Manipulating Hierarchies with UDFs
Exploit the power of UDFs for managing hierarchical data
December 19, 2000
Editor's Note: Send your experts-only T-SQL tips to Itzik Ben-Gan at [email protected]. If we use your tip in the magazine, you'll receive $100 and an exclusive T-SQL Black Belt shirt!
Several recent SQL Server Magazine articles have addressed hierarchical structures in a database environment. In "Maintaining Hierarchies," July 2000, I discussed solutions for managing hierarchical data and showed how to manipulate hierarchical data by adding columns to a table. In "Expanding Hierarchies," Muhammad Nadeem Akhter showed how to manipulate hierarchical data by using stored procedures and gave you a glimpse of how to use user-defined functions (UDFs) to manipulate hierarchies in SQL Server 2000. In this article, I show you how to use UDFs in hierarchical environments and how powerful UDFs can be for enhancing hierarchical data-manipulation capabilities. (For more information about UDFs, see Robert Vieira, "User-Defined Functions," November 2000.) In my examples, I use an Employees table in which each employee has an attribute called mgrid, which holds the manager ID of the employee. Each manager is also an employee, which creates hierarchical relationships. Listing 1 shows the script that creates and populates the Employees table.
Getting Ancestors
Let's jump right into the first example—retrieving a manager in a requested level. For example, let's say that you want to return the manager who is two levels above a particular employee. You can approach this task in two ways: by using recursion or by using loops. Let's start with the recursive solution. Essentially, recursion means that a routine calls itself. To avoid a situation in which a routine infinitely calls itself, you must always use recursion with a proper termination check. Correctly used, recursion can solve problems that would otherwise be too complex to solve. (The sidebar "Using Recursion to Solve the Hanoi Towers Puzzle," gives an example of a problem that recursion can solve.) However, in many cases, you can simply use loops instead of recursion. Using loops is often a better choice because recursive calls can consume more resources than loops.
Listing 2 shows the dbo.ufn_GetAncestor function, which uses recursion to return the correct manager. The dbo.ufn_GetAncestor function gets two arguments: @empid is the employee ID, and @lvl is a value that specifies how many levels above an employee the manager's level is. First, the function performs some safety checks to make sure that the supplied arguments are valid. If the arguments aren't valid, the function returns NULL. Next, the function performs the recursion termination check: If the argument supplied as the level is 0, the function simply returns the employee ID, which was also provided as an argument. Last, the function executes the recursive call, which is amazingly simple—the function calls itself and passes as arguments the manager of the specified employee and the level -1. In essence, the function is asking for the ancestor of the manager of the specified employee that is n -1 levels above the manager, where n is the level you supplied. The recursion aborts as soon as the level drops to zero or a manager in a higher level can't be found. In the latter case, the function returns NULL to show that the request can't be fulfilled.
Recursion has two limitations. One limitation is general and applies to recursion in most environments; the other is specific to SQL Server. The general limitation is recursion's significant resource consumption. Each function holds memory structures that contain the function's code and that keep track of the function's variables. As recursion continues, several occurrences of the function are active and this behavior consumes resources.
The SQL Server-specific limitation is SQL Server's design specification that limits the number of supported nesting levels to 32. This specification gives adequate flexibility in performing recursion and eliminates the possibility of infinite recursive calls in poorly written routines—for example, those with no termination checks. Therefore, the above function supports only requests for managers who are within 32 levels above the employee provided as an argument. In an organizational tree, you rarely need to support more than 32 levels. But if you're working with an organizational tree that has more than 32 levels, you have to provide a solution that doesn't involve recursion. In my example, I provide a simple, nonrecursive solution by using a loop. Listing 3 shows the code for the revised function. In each iteration, I retrieve the manager of the employee and decrement the @lvl argument. After the current level drops to zero, the function is complete.
You can now test either function. Both return the same result if you provide the same arguments. For example, to retrieve the manager who is 2 levels above David (whose empid is 11), you would execute the following statement:
SELECT dbo.ufn_GetAncestor(11, 2)
This statement returns the employee ID of Janet, which is 3. If you want to retrieve the full details of the manager who is 2 levels above David, you would issue the following query:
SELECT * FROM Employees WHERE empid =dbo.ufn_GetAncestor(11, 2)
To get all the employees and their managers who are two levels above, you can issue the following query:
SELECT E.empname AS employee, A.empname AS ancestorFROM Employees AS E LEFT OUTER JOIN Employees AS A ON A.empid = dbo.ufn_GetAncestor(E.empid, 2)
This query performs a self-join between the Employees table and itself. The join condition ensures that each employee is matched with her manager, who is two levels above. I used a LEFT OUTER JOIN in this query to ensure that all employees—including those who don't have a manager who is two levels above—are included in the result.
Calculating Aggregates and Subtree Depth
Now for the neat stuff—problems that aren't easy to solve without recursion. Suppose you want to get the sum of all salaries of employees in a certain subtree, starting with a given manager. Listing 4 shows how you can use the dbo.ufn_GetSubtreeSalary function to perform this task. Note that the function is surprisingly short, considering the complex task at hand; it contains only a RETURN clause. The function asks for the manager's salary plus the sum of the salaries of the subtrees of each manager's direct subordinates. Now, try to use this new function to calculate the total salary for the subtree that starts with Janet (whose empid is 3), and you'll get 20000:
SELECT dbo.ufn_GetSubtreeSalary(3)
You can similarly calculate the depth of a subtree, as Listing 5 shows. The dbo.ufn_GetSubtreeDepth function also returns the result of a CASE expression, but the function is slightly different in this listing. The CASE expression first checks whether the given manager has subordinates. If so, the function returns 1 plus the maximum depth of all the manager's direct subordinates—hence the recursion. If the given manager has no subordinates, the CASE expression determines whether that manager exists. If she does, the function returns 1 (a manager with no subordinates is 1 level deep). Finally, if the manager doesn't exist, the CASE expression returns NULL. Now, try this function to calculate the depth of the whole tree by supplying Nancy's employee ID (Nancy is the big boss) as the argument, and you'll get 5, which means that the tree is 5 levels deep.
SELECT dbo.ufn_GetSubtreeDepth(1)
Getting a Chain of Management
Until now, these examples have used scalar UDFs—UDFs that return a single value. Now, let's see how to use table-valued UDFs, which return a rowset or a table (i.e., for use in the FROM clause). For example, a common requirement when working with hierarchical structures is to return a whole subtree starting with a given manager. Listing 6 shows the script that creates the ufn_GetSubtree function. Note that the returned table has the same structure as the original Employees table, with two additional columns: lvl and path. The lvl column holds the level in the subtree starting with 0, and the path column holds the path of the employee in the form .id0.id1...idn. This string contains all the employee IDs, starting with the top-level employee in the subtree and ending with the current employee. All the employee IDs in the path are preceded and followed by dots.
The path column allows proper sorting of the subtree rows that the dbo.ufn_GetSubtree function in my example returns. Because the path values of all subordinates of a particular employee are prefixed with their managers' paths, the values are sorted right after their manager. The function first inserts into the @tree table variable the manager's row, which you provided as an argument. Next, the function starts a loop, which iterates as long as the previous insert operation has result rows. The code in the loop appends the direct subordinates of the previous insert operation to the @tree table variable. You can identify the level of the previously inserted employees by keeping the current level of the employees in the subtree in the variable @lvl.
Now you can test the ufn_GetSubtree function. To get details about Andrew (whose empid is 2) along with details about his subordinates in all levels, issue the following query:
SELECT * FROM ufn_GetSubtree(2)ORDER BY path
To get a hierarchical depiction of all the employees in the Employees table, issue the following query:
SELECT REPLICATE (' | ', lvl) + empname AS employeeFROM ufn_GetSubtree(1)ORDER BY path
Figure 1 shows the results. To get the leaf nodes (employees who aren't managers) in Andrew's subtree, you issue the following query:
SELECT * FROM ufn_GetSubtree(2) AS SWHERE NOT EXISTS(SELECT * FROM Employees AS EWHERE E.mgrid = S.empid)
The last task is to report the chain of management leading to a particular employee by using the ufn_GetMgmtChain function, which Listing 7 shows. Note that you implement the ufn_GetMgmtChain function in a way similar to the ufn_GetSubtree function, except that you go up the chain until you reach the top. To get the management chain leading to James (whose empid is 14), you can issue the following query:
SELECT * FROM ufn_GetMgmtChain(14)ORDER BY lvl DESC
In "Maintaining Hierarchies," I used the same lvl and path columns that I mention in this article, but I added them to the original Employees table and maintained their values with triggers. In terms of the SELECT queries' performance, adding such columns to the Employees table is more efficient than calculating the values of lvl and path in a UDF. The ufn_GetSubtree function needs to scan the whole table to calculate the values of lvl and path for each row before you can apply a filter that uses those columns. The solution that holds lvl and path as columns in the Employees table can exploit an index placed on the path column, thus not requiring a full table scan. However, using the ufn_GetSubtree function doesn't require any schema changes to the Employees table, and modifications to the Employees table don't incur the additional cost of maintaining the lvl and path columns, as in the triggers solution. The advent of UDFs has opened endless programmatic possibilities. UDFs are powerful tools for manipulating hierarchical data without having to alter a table's structure.
About the Author
You May Also Like