CTEs with Multiple Recursive Members
Learn about genealogy-related queries and the nested set model for representing graphs
January 22, 2013
Common table expressions (CTEs) were introduced in SQL Server 2005. I covered both non-recursive and recursive CTEs almost a decade ago in "Get in the Loop with CTEs."
As a reminder, a recursive CTE has at minimum one anchor member and one recursive member, using the following form:
WITH AS(UNION ALL)SELECT ...FROM ;
The anchor member is a query without a recursive reference to the CTE name; it gets invoked only once. The recursive member is a query with a recursive reference to the CTE name; it gets invoked repeatedly until it returns an empty set. The reference to the CTE name in the recursive member represents the previous result set. The outer reference to the CTE name represents the unified results of the single invocation of the anchor member and the multiple invocations of the recursive member. For more details about the fundamentals of recursive queries and examples, see "Get in the Loop with CTEs."
I've written a lot of recursive CTEs since their inception in SQL Server. But recently it occurred to me that all the CTEs I'd written had a single recursive member. Knowing that CTEs support multiple recursive members, I went in search of such examples. I found several interesting use cases that I'll share with you in this article. I'll cover two examples: genealogy and the nested set model for representing graphs. Next month, I'll provide another example.
Genealogy
Genealogy-related queries can be implemented using CTEs with multiple recursive members. For sample data, I'll use part of Bilbo Baggins' family tree. Run the code in Listing 1 (below) to create the table FamilyTree and fill it with sample data. (For a more complete family tree for Bilbo, see en.wikipedia.org/wiki/Bilbo_Baggins#Family_tree.)
Your task is to implement a solution that returns all ancestors of a given family member. For example, given as input @id = 16 (ID of Frodo Baggins), return the ID, name, and level (distance from input) of all ancestors of the input member. Figure 1 shows the desired output for Frodo.
You can implement a solution for this task by using a CTE with two anchor members and two recursive members. Remember that anchor members get invoked only once. One anchor member returns the mother of the input and another anchor member returns the father of the input. Remember that recursive members get invoked repeatedly until they return an empty set. The reference to the CTE name in the recursive member represents the previous result set. So you use one recursive member to return mothers of family members from the previous round and another recursive member to return fathers of family members from the previous round. In pseudo code, the solution CTE looks like this:
WITH C AS(anchor member returning mother of input family memberUNION ALLanchor member returning father of input family memberUNION ALLrecursive member returning mothers of family members fromprevious roundUNION ALLrecursive member returning fathers of family members fromprevious round)query C;
Listing 2 (below) contains the T-SQL solution code that implements this logic. Run the code in Listing 2 with Frodo's ID as input (16); you'll get the output shown in Figure 1.
[Figure 1 goes here: BenGanSQL555Fig1]
Note that it's possible to implement a solution to this task using a CTE with a single anchor member and a single recursive member. However, such a solution is less natural and more complex. Listing 3 (below) contains an example of such a solution. This solution uses a trick that prevents the necessity of two anchor members (for mother and father) and two recursive members (for mothers and fathers). Both anchor and recursive queries perform a cross join between the source data and a derived table that's made of two rows -- one representing the father (parent = 'F') and one representing the mother (parent = 'M'). Then, a CROSS APPLY operator uses a CASE expression to determine which column to return, based on the parental role; when parent = 'F' the CASE expression returns the father, and when parent = 'M' it returns the mother.
Again, this solution shows that it's possible to complete the task using a CTE with one anchor member and one recursive member. However, it's simpler and more natural to do so with multiple anchors and multiple recursive members.
Nested Sets
The nested set is a model for representing graphs that's based on graph theory. It works well in certain scenarios -- mainly, for the special case of a graph called tree. A tree is a graph that's directed, acyclic, and where only one path leads to each node.
In the nested set model you assign two integer values called left and right to each node, such that the left value of the node is greater than the left value of its parent node and the right value of the node is less than the right value of its parent node. You rely on the left and right values to address common requests against graphs. For example, to return all descendants of a given node you use a query that returns all nodes with a left value that's greater than the left value of the input node and with a right value that's less than the right value of the input node. To return all ancestors of a given node, you use a query that returns all nodes with a left value that's less than the left value of the input node and with a right value that's greater than the right value of the input node. (For more details about the nested set model, see en.wikipedia.org/wiki/Nested_set_model.)
In this article, I demonstrate how to compute the left and right values efficiently by using a CTE with multiple recursive members. As sample data, I use a table called Employees. You can use the code in Listing 4 (below) to create and populate the Employees table.
As you can see, the Employees table represents the tree of employees as an adjacency list. The empid and mgrid columns represent the child and parent IDs of the nodes in the tree, respectively. The task at hand is to query the Employees table and compute the left and right values for each node. Figure 2 is a graphical depiction of the tree of employees, including the left and right values that need to be computed.
Figure 2: Nested Set Tree
In terms of order among siblings, in this example the order is based on empname and empid as a tiebreaker. Therefore, for example, Seraph appears to the right of Jiru, and Steve appears to the right of Seraph.
Figure 3 shows another graphical depiction of the nested set representation of the employees. Each node spreads two arms around the contained nodes. After arranging the nodes in order based on the containment relationship (with siblings ordered by empname), you assign row numbers to the arms in sequential order from left to right. The row numbers you assign will become the left and right values of each node. This idea is the basis for computing the left and right values using a CTE with multiple recursive members.
Figure 3: Nested Set Arms
Using pseudo code, Listing 5 (below) shows how you compute the left and right values. Listing 6 contains the T-SQL solution code that implements this pseudo code. The solution first defines a CTE called EmpsRN. The inner query uses the ROW_NUMBER function to compute numbers (call the column n), starting with 1 and incrementing by 2 (n = 1, 3, 5, . . . ). These numbers will be used to represent the left arms of the nodes below their respective parent. The gaps will be filled by the right arms, which will be computed as n + 1.
The key to computing the left and right values is constructing the binary sortpath values. The CTE C1 generates two rows for each node representing the left and right arms. It starts with anchor members that return the left and right arms of the root node. The anchor members assign the sort path 0x01 to the left arm and the sort path 0x02 to the right arm. Then the recursive members join the left arms of the members from the previous round to the EmpsRN CTE to return children. One recursive member generates left arms for the children, and the other recursive member generates right arms. The recursive member handling the left arms concatenates the sortpath value of the parent's left arm with n (converted to binary). The recursive member handling the right arms concatenates the sortpath value of the parent's left arm with n + 1 (converted to binary). Note that in this solution I used one byte in sortpath for each node, assuming there are no more than 255 direct subordinates to each manager. If there can be more, you should use more bytes accordingly.
Next, the query defining the CTE C2 queries C1 and computes row numbers based on sortpath ordering, naming the result column sortval. The values in the sortval column will eventually become the left and right values of the nodes. The values in the sortval column correspond to the numbers assigned to the arms in Figure 3. Figure 4 shows the result of the query defining C2, with lines added to connect the left and right arms of each node.
[Figure 4 goes here: Sort Paths and Sort Values]
Finally, the outer query queries the CTE C2, groups the rows by empid, and returns the minimum sortval and maximum sortval as the left and right values, respectively. The code in Listing 6 generates the desired output shown in Figure 5.
[Figure 5 goes here: Left and Right Values]
Two Examples Down, One to Go
In this article, I covered two examples of CTEs with multiple recursive members: one for genealogy-related queries and one to compute left and right values for the nested set model for graphs. Next month, I'll continue with this topic and provide another example.
Also, see my article "Cycling with CTEs."
Listing 1: Code to Create and Populate the FamilyTree Table
SET NOCOUNT ON;USE tempdb;IF OBJECT_ID('dbo.FamilyTree','U') IS NOT NULL DROP TABLE dbo.FamilyTree;GOCREATE TABLE dbo.FamilyTree( id INT NOT NULL CONSTRAINT PK_FamilyTree PRIMARY KEY, name varchar(30) NOT NULL, mother INT NULL, father INT NULL);INSERT INTO dbo.FamilyTree(id, name, mother, father) VALUES( 1, 'Balbo Baggins' , NULL, NULL), ( 2, 'Berylla Boffin' , NULL, NULL), ( 3, 'Mungo Baggins' , 1, 2), ( 4, 'Laura Grubb' , NULL, NULL), ( 5, 'Bungo Baggins' , 3, 4), ( 6, 'Belladonna Took' , NULL, NULL), ( 7, 'Bilbo Baggins' , 5, 6), ( 8, 'Largo Baggins' , 1, 2), ( 9, 'Tanta Hornblower' , NULL, NULL), (10, 'Fosco Baggins' , 8, 9), (11, 'Ruby Bolger' , NULL, NULL), (12, 'Dora Baggins' , 10, 11), (13, 'Drogo Baggins' , 10, 11), (14, 'Dudo Baggins' , 10, 11), (15, 'Primula Brandybuck', NULL, NULL), (16, 'Frodo Baggins' , 13, 15);
Listing 2: Find Ancestors Using a CTE with Multiple Recursive Members
DECLARE @id AS INT = 16;WITH C AS( -- anchor queries -- father of input SELECT father AS id, 1 AS lvl FROM dbo.FamilyTree WHERE id = @idAND father IS NOT NULL UNION ALL -- mother of input SELECT mother AS id, 1 AS lvl FROM dbo.FamilyTree WHERE id = @idAND mother IS NOT NULL UNION ALL -- recursive queries -- fathers of those in previous level SELECT father AS id, C.lvl + 1 AS lvl FROM CJOIN dbo.FamilyTree AS P ON C.id = P.id WHERE P.father IS NOT NULL UNION ALL -- fathers of those in previous level SELECT mother AS id, C.lvl + 1 AS lvl FROM CJOIN dbo.FamilyTree AS P ON C.id = P.id WHERE P.mother IS NOT NULL)SELECT C.id, FT.name, lvlFROM C JOIN dbo.FamilyTree AS FTON C.id = FT.id;
Listing 3: Find Ancestors Using a CTE with a Single Recursive Member
DECLARE @id AS INT = 16;WITH C AS( -- parents of input SELECT parent AS id, 1 AS lvl FROM dbo.FamilyTree AS FTCROSS JOIN (VALUES('F'),('M')) AS A1(gender)CROSS APPLY (SELECT CASE A1.gender WHEN 'F' THEN FT.father WHEN 'M' THEN FT.mother END) AS A2(parent) WHERE id = @idAND parent IS NOT NULL UNION ALL -- parents of those in previous level SELECT parent AS id, C.lvl + 1 AS lvl FROM CJOIN dbo.FamilyTree AS P ON C.id = P.idCROSS JOIN (VALUES('F'),('M')) AS A1(gender)CROSS APPLY (SELECT CASE A1.gender WHEN 'F' THEN P.father WHEN 'M' THEN P.mother END) AS A2(parent) WHERE parent IS NOT NULL)SELECT C.id, FT.name, lvlFROM C JOIN dbo.FamilyTree AS FTON C.id = FT.id;
Listing 4: DDL and Sample Data for the Employees Table
SET NOCOUNT ON;USE tempdb;IF OBJECT_ID('dbo.Employees') IS NOT NULL DROP TABLE dbo.Employees;CREATE TABLE dbo.Employees( empid INT NOT NULL PRIMARY KEY, mgrid INT NULL REFERENCES dbo.Employees, empname VARCHAR(25) NOT NULL, salary MONEY NOT NULL, CHECK (empid <> mgrid));INSERT INTO dbo.Employees(empid, mgrid, empname, salary) VALUES(1, NULL, 'David', $10000.00), (2, 1, 'Eitan', $7000.00), (3, 1, 'Ina', $7500.00), (4, 2, 'Seraph', $5000.00), (5, 2, 'Jiru', $5500.00), (6, 2, 'Steve', $4500.00), (7, 3, 'Aaron', $5000.00), (8, 5, 'Lilach', $3500.00), (9, 7, 'Rita', $3000.00), (10, 5, 'Sean', $3000.00), (11, 7, 'Gabriel', $3000.00), (12, 9, 'Emilia' , $2000.00), (13, 9, 'Michael', $2000.00), (14, 9, 'Didi', $1500.00);CREATE UNIQUE INDEX idx_unc_mgrid_empid ON dbo.Employees(mgrid, empid);
Listing 5: Code to Compute Left and Right Values
WITH EmpsRN AS( query Employees and compute a row number (call it rownum)partitioned by mgrid ordered by empname, empid let n = rownum * 2 - 1 -- n starts with 1 and increments by 2 (1, 3, 5, 7, ...)),C1 AS( anchor member returning root's left arm; let sortpath = 0x01 UNION ALL anchor member returning root's right arm; let sortpath = 0x02 UNION ALL recursive member returning left arms of children of left members of previous round;let sortpath = + UNION ALL recursive member returning right arms of children of left members of previous round;let sortpath = + <(n+1) converted to binary>),C2 AS( query C1 and compute row numbers (sortval) ordered by sortpath)query C2, group the rows by empid;let lft = minimum sortval, rgt = maximum sortval
Listing 6: Code to Compute Left and Right Values in the Nested Set Model
WITH EmpsRN AS( SELECT *,ROW_NUMBER() OVER(PARTITION BY mgrid ORDER BY empname, empid) * 2 - 1 AS n FROM dbo.Employees),C1 AS( -- root's left arm SELECT empid, 1 AS arm, CAST(0x01 AS VARBINARY(8000)) AS sortpath FROM dbo.Employees WHERE mgrid is NULL UNION ALL -- root's right arm SELECT empid, 2 AS arm, CAST(0x02 AS VARBINARY(8000)) AS sortpath FROM dbo.Employees WHERE mgrid is NULL UNION ALL SELECT E.empid, 1 AS arm,CAST(M.sortpath + CAST(E.n AS BINARY(1)) AS VARBINARY(8000)) AS sortpath FROM C1 AS MINNER JOIN EmpsRN AS E ON E.mgrid = M.empid WHERE M.arm = 1 UNION ALL SELECT E.empid, 2 AS arm, CAST(M.sortpath + CAST(E.n + 1 AS BINARY(1)) AS VARBINARY(8000)) AS sortpath FROM C1 AS MINNER JOIN EmpsRN AS E ON E.mgrid = M.empid WHERE M.arm = 1),c2 AS( SELECT empid, ROW_NUMBER() OVER(ORDER BY sortpath) AS sortval FROM C1)SELECT empid, MIN(sortval) AS lft, MAX(sortval) AS rgtFROM c2GROUP BY empid;
About the Author
You May Also Like