Evaluate Logical Expressions Using Recursive CTEs and Reverse Polish Notation
A powerful combination
December 20, 2013
Suppose that you work for ABC Solutions, which has a set of business rules it uses to determine whether an employee is authorized to execute a particular operation. You've been tasked with evaluating the business rules for each employee. The database table named Employee stores the IDs of the employees, along with the attributes used to describe them. Each business rule is expressed through a Boolean proposition that involves the employee attributes. The proposition is codified as the number of rows stored in the database table named Rules.
Related: CTEs with Multiple Recursive Members
You can evaluate the business rules for each employee by running a T-SQL query against the Employee and Rules tables, without placing limits on proposition complexity. To do so, you need to use two components: recursive common table expressions (CTEs) and Reverse Polish notation (RPN).
Exploring Recursive CTEs
You can think of a CTE as a temporary result set defined within a T-SQL statement. Once defined, you can further reference the CTE in the statement. When the CTE definition references itself, you're in the presence of recursive CTE.
For example, the code in Listing 1 demonstrates how a recursive CTE works.
USE AdventureWorksDW2012GOWITH Emp_CTE AS ( -- Anchor SELECT EmployeeKey, FirstName, LastName, 1 as EmpLevel FROM dbo.DimEmployee WHERE ParentEmployeeKey IS NULL UNION ALL -- Recursive member SELECT e.EmployeeKey, e.FirstName, e.LastName, ecte.EmpLevel + 1 FROM dbo.DimEmployee e INNER JOIN Emp_CTE ecte ON ecte.EmployeeKey = e.ParentEmployeeKey)SELECT FirstName, LastName, EmpLevelFROM Emp_CTE
In this code, which uses the data from the AdventureWorksDW2012 database, the CTE is named Emp_CTE. It contains two SELECT statements. The first SELECT statement, generally known as the anchor, provides the base elements of the final result set. In this case, the base elements are employees who don't have hierarchy parents. The second SELECT statement, generally known as the recursive member, references the CTE itself. This statement joins the CTE (which contains the base elements) with the Employee table. The resulting set contains all employees having their parent in Emp_CTE. The union of these employees with first base set will build a new base set, extending the old one, ready to be used as input for the next recursive iteration. By iterating the recursive step, the entire Employee hierarchy will be explored, showing the power of recursive CTEs in dealing with hierarchical data.
Exploring the Input Tables at ABC Solutions
Now that you know how recursive CTEs work, let's look at the input tables that will be used for the recursive CTEs at ABC Solutions. As Listing 2 shows, the Employee table simply contains some employees described using the Title and Department attributes.
CREATE TABLE [dbo].[Employee]( [EmpId] [int] NOT NULL, [Title] [nvarchar](50) NULL, [Department] [nvarchar](50) NULL,CONSTRAINT [PK_Emp] PRIMARY KEY CLUSTERED ([EmpId] ASC))
With those attributes in mind, consider the Boolean expression:
((Department='Dep1') or (Department like 'Dep2%')) and (Title='d')
This expression contains three atomic propositions—(Department='Dep1'), (Department like 'Dep2%'), and (Title='d')—that will be named p1, p2, and p3, respectively. The expression also contains a complex proposition—(p1 or p2)—that will be named p4. Finally, the expression contains a root proposition, p5, defined as (p4 and p3). Without loss of generality, the expression considers only binary operators. (For example, the AND of the three propositions could be decomposed using the associative property of the AND operator.)
Taking all this into account, the Rules table is defined. As Listing 3 shows, the Rules table defines a hierarchical binary tree, where leaf nodes correspond to the atomic propositions. The parent nodes correspond to complex propositions built through logical AND, OR, and NOT operations involving their children. So, it's possible to label each parent node of the tree with a Boolean value, where the label is obtained by applying the node operator on the Boolean labels of the children. (The root label corresponds to the whole expression evaluation.)
CREATE TABLE [dbo].[Rules]( [RuleId] [int] NOT NULL, [Description] [nvarchar](50), [IsAtomic] [bit] NULL -- IsAtomic = 1 for atomic propositions -- Example: Department = 'Dep1') -- IsAtomic = 0 for complex propositions -- Example: (p4 AND p3) [FieldSelector] [int] NULL, -- FieldSelector = NULL for complex propositions -- FieldSelector = 0 selection of field Department -- Example: Department = 'Dep1'-->FieldSelector=0 -- FieldSelector = 1 selection of field Title -- Example: Title = 'd'-->FieldSelector=1 [Operator] [int] NULL, -- In complex propositions, Operator enumerates -- logical operators (0 NOT, 1 AND, 2 OR) -- In atomic propositions, Operator enumerates -- comparison operators (0 =, 1 LIKE) [Expression] [nvarchar](max) NULL, -- Expression value is taken into account only for -- atomic propositions -- Example: Department like 'Dep2%'-->FieldSelector=1, -- Operator=1, Expression='Dep2%' [LeftRule] [int] NULL, -- In complex propositions, LeftRule matches with the -- RuleId of the first child proposition [RightRule] [int] NULL, -- In complex propositions, RightRule matches with the -- RuleId of the second child propositionCONSTRAINT [FK_Rules] PRIMARY KEY CLUSTERED ([RuleId] ASC))
Considering this possibility, your first inclination might be to use the recursive CTE for expression evaluation. After all, it makes sense to define an anchor containing leaf nodes of the Rules table, followed by a recursive member that joins the CTE with the parent nodes. Each step of the recursion would evaluate the Boolean operator of the parent nodes applied to their children. To clarify the key points of this solution, consider this code:
WITH RecursiveCTETree(Evaluation,RuleId)( SELECT label(RuleId), RuleId from Rules WHERE is_leaf(RuleId)=true UNION ALL SELECT evaluate(parent.Operator,leftchild.Evaluation, rightchild.Evaluation), parent.RuleId FROM Rules AS parent INNER JOIN RecursiveCTETree AS leftchild ON (parent.LeftRule=leftchild.RuleId) Inner join RecursiveCTETree as rightchild ON (parent.RightRule=rightchild.RuleId))
Unfortunately, this solution doesn't work because the recursive member's FROM clause refers to the CTE twice. When defining a CTE's recursive member, one limitation is that the FROM clause can refer to the CTE only one time. There are also other limitations. For example, GROUP BY and HAVING clauses aren't allowed inside a recursive member. For a complete discussion of this topic, see the WITH common_table_expression (Transact-SQL) web page.
Exploring RPN
As you've seen, you can't use recursive CTE alone to solve the problem. You also need the second component: RPN. RPN is a mathematical notation in which every operator follows all of its operands. Also known as the postfix notation, it's parenthesis-free. For example, consider the infix expression:
p1 or ((p2 or p3) and p4) or p5
In RPN, it can be written like this:
p1 p2 p3 or p4 and or p5 or
One interesting feature of RPN is the ability to evaluate a Boolean expression through recursion of literal substitution. Consider the RPN expression 11or0or. The left substring 11or can be substituted using the lookup table shown in Table 1.
Left Substring |
---|
Table 1: Lookup Table |
11or |
10or |
01or |
00or |
After substitution, the string 11or0or will be replaced by 10or. Applying the lookup table again, 10or will be replaced by 1, which is the final evaluation of the whole initial expression.
Combining RPN and Recursive CTEs at ABC Solutions
To evaluate the business rules for each employee at ABC Solutions, you need to combine the RPN and recursive CTE components.
-- STEP 1WITH RecursiveCTETree(RuleId,StatementId,SequenceId) AS ( SELECT RuleId, StatementId, CAST(CAST(StatementId AS binary(4)) AS varbinary(1000)) AS SequenceId FROM -- Anchor containing root nodes ( SELECT r1.RuleId, r1.RuleId AS StatementId FROM dbo.Rules r1 LEFT OUTER JOIN dbo.Rules r2 ON ((r1.Ruleid=r2.RightRule) OR (r1.Ruleid=r2.LeftRule)) WHERE r2.RuleId IS NULL ) t UNION ALL -- First recursive member SELECT t1.LeftRule, StatementId, CAST(SequenceId+CAST(0x02 as binary(1)) AS varbinary(1000)) FROM RecursiveCTETree t0 INNER JOIN dbo.Rules t1 ON t0.RuleId=t1.RuleId WHERE t1.IsAtomic=0 UNION ALL -- Second recursive member SELECT t1.RightRule, StatementId, CAST(SequenceId+CAST(0x01 AS binary(1)) AS varbinary(1000)) FROM RecursiveCTETree t0 INNER JOIN dbo.Rules t1 ON t0.RuleId=t1.RuleId WHERE t1.IsAtomic=0 AND t1.RightRule IS NOT NULL)-- STEP 2,ExpressionElements as ( SELECT t1.StatementId, t1.Position, t0.* FROM -- Cross product of Rules and Employees ( SELECT r.RuleId, e.EmpId, CASE WHEN (IsAtomic=0) AND (Operator = 0) THEN 'n' WHEN (IsAtomic=0) AND (Operator = 1) THEN 'a' WHEN (IsAtomic=0) THEN 'o' WHEN (FieldSelector = 0) AND (Operator = 0) AND (Department = Expression) THEN '1' WHEN (FieldSelector = 0) AND (Operator = 0) THEN '0' WHEN (FieldSelector = 1) AND (Operator = 0) AND (Title = Expression) THEN '1' WHEN (FieldSelector = 1) AND (Operator = 0) THEN '0' WHEN (FieldSelector = 0) AND (Operator = 1) AND (Department like Expression) THEN '1' WHEN (FieldSelector = 0) AND (Operator = 1) THEN '0' WHEN (FieldSelector = 1) AND (Operator = 1) AND (Title like Expression) THEN '1' WHEN (FieldSelector = 1) AND (Operator = 1) THEN '0' END AS Value FROM dbo.Rules r CROSS JOIN dbo.Employee e ) t0 INNER JOIN( SELECT StatementId,RuleId ,RANK() OVER (PARTITION BY StatementId ORDER BY SequenceId desc) AS Position FROM RecursiveCTETree )t1 ON t0.RuleId=t1.RuleId)--STEP 3,RecursiveCTEEvaluation(StatementId,EmpId,Position,Value) AS( -- Anchor SELECT StatementId, EmpId, Position, CAST(Value AS varchar(max)) FROM ExpressionElements WHERE Position=1 UNION ALL -- Recursive member SELECT t0.StatementId, t0.EmpId, t1.Position, CASE WHEN t1.value='n' AND t0.Value='1' THEN '0' WHEN t1.value='n' AND t0.Value='0' THEN '1' WHEN t1.value='a' AND t0.Value LIKE '%00' THEN LEFT (t0.value,len(t0.value)-2)+'0' WHEN t1.value='a' AND t0.Value LIKE '%01' THEN LEFT (t0.value,len(t0.value)-2)+'0' WHEN t1.value='a' AND t0.Value LIKE '%10' THEN LEFT (t0.value,len(t0.value)-2)+'0' WHEN t1.value='a' AND t0.Value LIKE '%11' THEN LEFT (t0.value,len(t0.value)-2)+'1' WHEN t1.value='o' AND t0.Value LIKE '%00' THEN LEFT (t0.value,len(t0.value)-2)+'0' WHEN t1.value='o' AND t0.Value LIKE '%01' THEN LEFT (t0.value,len(t0.value)-2)+'1' WHEN t1.value='o' AND t0.Value LIKE '%10' THEN LEFT (t0.value,len(t0.value)-2)+'1' WHEN t1.value='o' AND t0.Value LIKE '%11' THEN LEFT (t0.value,len(t0.value)-2)+'1' ELSE t0.Value+t1.value END FROM RecursiveCTEEvaluation t0 INNER JOIN ExpressionElements t1 ON ((t0.Position+1=t1.Position) AND (t0.StatementId=t1.StatementId) AND (t0.EmpId=t1.EmpId)))SELECT Description AS [Rule], EmpId, ValueFROM( SELECT StatementId,EmpId,Value,Position,max(Position) OVER (PARTITION BY StatementId,Empid) AS pos FROM RecursiveCTEEvaluation) t0INNER JOIN dbo.Rules t1 ON t0.StatementId=t1.RuleIdWHERE Position=posORDER BY EmpId,StatementId
As Listing 4 shows, the solution involves three steps:
Define a recursive CTE that explores the expression tree contained in the Rules table. The recursive CTE will label each node with a number identifying the order of the node inside the expression rewritten in RPN.
Evaluate the atomic propositions for each employee. The resulting value (true or false) will replace the atomic propositions inside the RPN expression built in the previous step. For example, suppose you have the proposition p1p2and and two employees—p1,p2=0,0 for the first employee and p1,p2=1,1 for the second employee. The substitution process will duplicate the initial expression (p1p2and), after which it will replace the initial (i.e., original) expression with 00and and 11and.
Perform recursive literal substitution for all RPN expressions, as previously explained. Once again, a recursive CTE will be used to help reach the goal.
Let's look at each step in detail.
Examining Step 1
As previously mentioned, the Rules table stores the propositions. Suppose you want to populate the Rules table with two propositions. The first proposition is:
((Department = 'Dep1') or (Department like 'Dep2%')) and ((Title='t1') or (Title like 't2%'))
Expressed in RPN, it turns into:
(Department = 'Dep1') (Department like 'Dep2%') or (Title='t1') (Title like 't2%') or and
Listing 5 shows the code to populate the Rules table with this proposition.
-- First expression:-- ((Department = 'Dep1') or (Department like 'Dep2%'))-- and ((Title='t1') or (Title like 't2%'))-- Root AND node, RuleId=1INSERT [dbo].[Rules] ([RuleId], [Description], [IsAtomic], [FieldSelector], [Operator], [Expression], [LeftRule], [RightRule]) VALUES (1, 'First Rule',0, NULL, 1, NULL, 2, 5)-- Logical OR, RuleId=2INSERT [dbo].[Rules] ([RuleId], [IsAtomic], [FieldSelector], [Operator], [Expression], [LeftRule], [RightRule]) VALUES (2, 0, NULL, 2, NULL, 3, 4)-- (Department = 'Dep1'), RuleId=3INSERT [dbo].[Rules] ([RuleId], [IsAtomic], [FieldSelector], [Operator], [Expression], [LeftRule], [RightRule]) VALUES (3, 1, 0, 0, N'Dep1', NULL, NULL)-- (Department like 'Dep2%'), RuleId=4INSERT [dbo].[Rules] ([RuleId], [IsAtomic], [FieldSelector], [Operator], [Expression], [LeftRule], [RightRule]) VALUES (4, 1, 0, 1, N'Dep2%', NULL, NULL)-- Logical OR, RuleId=5INSERT [dbo].[Rules] ([RuleId], [IsAtomic], [FieldSelector], [Operator], [Expression], [LeftRule], [RightRule]) VALUES (5, 0, NULL, 2, NULL, 6, 7)-- (Title='t1'), RuleId=6INSERT [dbo].[Rules] ([RuleId], [IsAtomic], [FieldSelector], [Operator], [Expression], [LeftRule], [RightRule]) VALUES (6, 1, 1, 0, N't1', NULL, NULL)-- (Title like 't2%'), RuleId=7INSERT [dbo].[Rules] ([RuleId], [IsAtomic], [FieldSelector], [Operator], [Expression], [LeftRule], [RightRule]) VALUES (7, 1, 1, 1, N't2%', NULL, NULL)-- Second expression:-- not ((Department = 'Dep1') or (Title='t2'))-- Root NOT node, RuleId=8INSERT [dbo].[Rules] ([RuleId], [Description], [IsAtomic], [FieldSelector], [Operator], [Expression], [LeftRule], [RightRule]) VALUES (8,'Second rule', 0, NULL, 0, NULL, 9, NULL)-- Logical OR, RuleId=9INSERT [dbo].[Rules] ([RuleId], [IsAtomic], [FieldSelector], [Operator], [Expression], [LeftRule], [RightRule]) VALUES (9, 0, NULL, 2, NULL, 10, 11)-- (Department = 'Dep1'), RuleId=10INSERT [dbo].[Rules] ([RuleId], [IsAtomic], [FieldSelector], [Operator], [Expression], [LeftRule], [RightRule]) VALUES (10, 1, 0, 0, N'Dep1', NULL, NULL)-- (Title = 't2'), RuleId=11INSERT [dbo].[Rules] ([RuleId], [IsAtomic], [FieldSelector], [Operator], [Expression], [LeftRule], [RightRule]) VALUES (11, 1, 1, 0, N't2', NULL, NULL)
By reading the comment above each INSERT statement, it's possible to verify that the RPN expression (read from left to right) corresponds to the RuleId sequence of 3,4,2,6,7,5,1.
The second proposition is:
not((Department='Dep1') or (Title='t2'))
Expressed in RPN, it turns into:
(Department='Dep1') (Title='t2') or not
Listing 5 also includes this proposition. Once again, it's possible to verify that the RPN expression (read from left to right) corresponds to the RuleId sequence of 10,11,9,8.
To verify that step 1 is working properly, you can copy and paste the code for step 1 into the following query, then execute it:
-- Put code for step 1 hereSELECT StatementId,RuleIdFROM RecursiveCTETreeORDER BY StatementId, SequenceId DESC
You should get results like that in Figure 1.
Figure 1: Verifying the Results from Step 1
As you can see, StatementId discriminates the first expression from the second expression. For each StatementId, the order of RuleId matches the expected RPN order.
How does the code in step 1 work? The CTE's anchor query individuates the root nodes of all the propositions contained in the Rules table and labels each root node with a distinct StatementId. The CTE also contains two recursive members. The first member joins all left children to the base node set. The second member joins all right children to the resulting union of the base set with the first member. The resulting union (which consists of the root nodes, left children, and right children) is the new base set for the next recursion step. Please note that the SequenceId field is calculated in a cumulative way through the recursion steps. For root nodes, the SequenceId value is equal to the StatementId value. For the left child nodes in the first recursive member, the SequenceId value is equal to the concatenation of the parent value with 0x02. For the right child nodes in the second recursive member, the SequenceId value is equal to the concatenation of the parent value and 0x01. Applying these rules, the SequenceId values are calculated for the nodes in the first proposition. They are:
RuleId=1 is SequenceId=0x01
RuleId=2 is SequenceId=0x0102
RuleId=3 is SequenceId=0x010202
RuleId=4 is SequenceId=0x010201
RuleId=5 is SequenceId=0x0101
RuleId=6 is SequenceId=0x010102
RuleId=7 is SequenceId=0x010101
If you put the nodes in descending SequenceId order, the RuleId order is 3,4,2,6,7,5,1—the same as the RPN order.
Examining Step 2
As Listing 4 shows, step 2 starts by evaluating the cross product of the Rules and Employees tables. For each product couple (r,e), the statement evaluates the Boolean value of the atomic proposition, with key r applied to employee e. The evaluation is stored in the Value field. However, if r is associated to a complex proposition, the Value field is left null.
To reorder the propositions using RPN, you need to find, for each proposition r calculated during the cross product evaluation, the order of r inside the statement found at step 1. This is obtained through the join with RecursiveCTETree.
To verify the results of step 2, let's populate the Employee table by running the code:
INSERT [dbo].[Employee] ([EmpId], [Department], [Title]) VALUES (1, N'Dep1', N't1')INSERT [dbo].[Employee] ([EmpId], [Department], [Title]) VALUES (2, N'Dep3', N't3')
Next, copy and paste the code for step 1 and step 2 into the following query, then execute it:
-- Put code for step 1 here-- Put code for step 2 hereSELECT EmpId,StatementId,Position,ValueFROM ExpressionElementsORDER BY EmpId,StatementId,Position
You should get results like that in Figure 2.
Figure 2: Verifying the Results from Step 2
Please note that concatenation of last field partitioned by EmpId,StatementId is: 10o10oa, 10on, 00o00oa, 00on. These strings are RPN expressions ready for literal substitution.
Examining Step 3
Literal substitution is executed through a new recursive CTE. The anchor selects the first element of each sequence partitioned by EmpId,StatementId (as seen in step 2). The recursive member finds the next element of each sequence. If the element is a Boolean value (1 or 0), it's simply concatenated to the value of the previous step. If the element is an operator (o, a, or n), the Boolean values concatenated by the previous steps are replaced with the result of the Boolean operator evaluation.
After the recursive CTE cleans the results from the intermediate steps, the last statement presents the final results. Figure 3 shows those results.
Figure 3: Examining the Final Results
A Powerful Combination
As you've seen, it's possible to evaluate Boolean propositions in a single T-SQL query by combining recursive CTE and Reverse Polish notation. Although I showed you how to use this technique to evaluate business rules for employees, you can generalize and apply it to other situations in which a complex parenthesized expression needs to be evaluated.
*************************************************************************************
Tommaso Capasso is an MCSA: SQL Server 2012 certified professional. He currently works as a consultant for Avanade, a joint venture of Microsoft and Accenture. He specializes in Microsoft technologies but in general likes problem solving and algorithms.
Email: [email protected]
About the Author
You May Also Like