Logical Query Processing Part 5: The FROM Clause and UNPIVOTLogical Query Processing Part 5: The FROM Clause and UNPIVOT
This is Part 5 of a series. Part 1 provided a high-level overview. Part 2, Part 3 and Part 4 covered the FROM clause table operators JOIN, APPLY and PIVOT, respectively. In this article I cover the fourth and last FROM clause table operator—UNPIVOT.
May 11, 2016
This article continues the series covering logical query processing, which describes the conceptual interpretation of a query. Part 1 provided a high level overview. Part 2, Part 3 and Part 4 covered the FROM clause table operators JOIN, APPLY and PIVOT, respectively. In this article I cover the fourth and last FROM clause table operator—UNPIVOT.
For sample data I’ll use the same sample database TSQLV4 that I used in the previous parts. If you don’t have it installed already you can download the code that creates and populates it from here. Make sure to set your database context to this database before running the code samples in this article:
USE TSQLV4;
Besides describing the logical processing aspects of the UNPIVOT operator, I’ll also provide an alternative to unpivoting data using the APPLY operator. I’ll also provide an example for combining the UNPIVOT and PIVOT operator, and finally, and explain how you can dynamically unpivot data.
Unoivoting data
While pivoting data involves rotating data from a state of rows to columns, unpivoting rotates data from a state of columns to rows. As an example, run the following code to create a table called MyPivotedOrders and fill it with sample data:
IF OBJECT_ID(N'Sales.MyPivotedOrders', N'U') IS NOT NULL DROP TABLE Sales.MyPivotedOrders;-- In SQL Server 2016 use: DROP TABLE IF EXISTS Sales.MyPivotedOrders;GOWITH C AS( SELECT empid, YEAR(orderdate) AS orderyear, val FROM Sales.OrderValues WHERE custid = 5) SELECT *INTO Sales.MyPivotedOrdersFROM C PIVOT( SUM(val) FOR orderyear IN ([2014], [2015], [2016]) ) AS P;ALTER TABLE Sales.MyPivotedOrders ADD CONSTRAINT PK_MyPivotedOrders PRIMARY KEY(empid);SELECT *FROM Sales.MyPivotedOrders;
Table 1 shows the contents of MyPivotedOrders.
Table 1: Contents of Sales.MyPivotedOrders table
empid 2014 2015 2016------ -------- -------- --------1 NULL 6627.75 NULL2 613.20 NULL NULL3 2222.40 2804.19 1835.704 NULL NULL 709.555 NULL 601.83 1096.208 1488.80 NULL 2048.219 NULL 3815.25 1064.50
Unpivoting involves taking any number of source columns that is greater than one, and transforming them into two target columns—one that will hold the source column values (the values column) and another that will hold the source column names (the names column).
In this particular example the source has a row per employee and a column per order year, and each intersection of row and column holds the total order values for the current employee and year. It’s not convenient to perform data manipulation against the current state of the data. You want to unpivot it into a state that has a separate row per employee and year. Table 2 shows the desired results.
Table 2: Desired result for unpivoting task
empid orderyear val ------ ---------- --------1 2015 6627.75 2 2014 613.20 3 2014 2222.40 3 2015 2804.19 3 2016 1835.70 4 2016 709.55 5 2015 601.83 5 2016 1096.20 8 2014 1488.80 8 2016 2048.21 9 2015 3815.25 9 2016 1064.50
Similar to the PIVOT operator, T-SQL supports an operator called UNPIVOT, which you also use in the FROM clause of the query. Just like PIVOT, the UNPIVOT operator applies a number of logical steps. I’ll number the steps 1-UN, where 1 represents the fact that the FROM clause is the first major clause that is logically processed, U stands for UNPIVOT and N is the step number within the UNPIVOT operator.
The UNPIVOT operator applies the following three logical steps:
1-U1. Generate copies
1-U2. Extract element
1-U3. Remove NULLs
The following sections describe these steps and demonstrate them for our sample task.
1-U1. Generate copies
The first step in unpivoting data is to generate a separate copy of each source row for each unpivoted column. Our unpivoted columns represent the order years: 2014, 2015 and 2016, therefore this step generates a copy for each year. Table 3 shows the result of this step for our sample task.
Table 3: Result of step 1-U1
empid orderyear 2014 2015 2016 ------ ---------- -------- -------- --------1 2014 NULL 6627.75 NULL 1 2015 NULL 6627.75 NULL 1 2016 NULL 6627.75 NULL 2 2014 613.20 NULL NULL 2 2015 613.20 NULL NULL 2 2016 613.20 NULL NULL 3 2014 2222.40 2804.19 1835.70 3 2015 2222.40 2804.19 1835.70 3 2016 2222.40 2804.19 1835.70 4 2014 NULL NULL 709.55 4 2015 NULL NULL 709.55 4 2016 NULL NULL 709.55 5 2014 NULL 601.83 1096.20 5 2015 NULL 601.83 1096.20 5 2016 NULL 601.83 1096.20 8 2014 1488.80 NULL 2048.21 8 2015 1488.80 NULL 2048.21 8 2016 1488.80 NULL 2048.21 9 2014 NULL 3815.25 1064.50 9 2015 NULL 3815.25 1064.50 9 2016 NULL 3815.25 1064.50
As mentioned earlier, unpivoting transform or source columns into two target columns—the values column and the names column. As you can see, the first step already creates the names column (orderyear in our example).
1-U2. Extract element
The second step essentially creates the values column (which we will name val in our example). It extracts the element from the source columns that corresponds to the name that the current copy represents. In our example, it extracts the value from the source year column that corresponds to the current orderyear value. When orderyear is 2014, val returns the value from the column [2014], when orderyear is 2015, val returns the value from [2015], and when orderyear is 2016, val returns the value from [2016]. Table 4 shows the desired result of this step for our sample task.
Table 4: Result of step 1-U2
empid orderyear val ------ ---------- --------1 2014 NULL 1 2015 6627.75 1 2016 NULL 2 2014 613.20 2 2015 NULL 2 2016 NULL 3 2014 2222.40 3 2015 2804.19 3 2016 1835.70 4 2014 NULL 4 2015 NULL 4 2016 709.55 5 2014 NULL 5 2015 601.83 5 2016 1096.20 8 2014 1488.80 8 2015 NULL 8 2016 2048.21 9 2014 NULL 9 2015 3815.25 9 2016 1064.50
1-U3. Remove NULLs
The third and last step in unpivoting is to remove the rows with the NULLs in the result values column. You typically don’t want to keep those rows since they represent inapplicable cases. In our example those are cases where that employee didn’t have any activity in that year.
The output of this step in our example was shown earlier in Table 2 as the desired output of the unpivoting task.
The UNPIVOT Operator
As mentioned, the UNPIVOT operator is designed very similar to the PIVOT operator as a step in the FROM clause. Its syntax requires you to identify the three elements mentioned earlier that are involved in the unpivoting task:
1. Target name for column that will hold source column values (val in our case)
2. Target name for column that will hold source column names (orderyear in our case)
3. Names of source columns ([2014],[2015],[2016] in our case)
Once identified, you specify those elements in the UNPIVOT operator using the following syntax:
SELECT *FROM UNPIVOT( 1 FOR 2 IN (3) ) AS ;
Applied to our example, here’s the complete solution query with the UNPIVOT operator:
SELECT empid, orderyear, val
FROM Sales.MyPivotedOrders
UNPIVOT( val FOR orderyear IN ([2014],[2015],[2016]) ) AS U;
Figure 1 shows how the elements that are involved in the UNPIVOT operator correspond to both the operator’s input and output.
Figure 1: UNPIVOT operator
Figure 01 - UNPIVOT operator
Note that the names column (orderyear) is typed as NVARCHAR(128). If you need a different type like INT, cast it.
Figure 2 provides the flow chart describing the logical query processing of the FROM clause, with all four table operators, including the UNPIVOT operator.
Figure 2: Logical query processing flow chart - the FROM clause
Figure 02 - Logical query processing flow chart - the FROM clause_0
Alternative solution using APPLY
The UNPIVOT operator is designed very elegantly and is very concise, but it has its disadvantages. For one, it doesn’t support (at least in a simple way) unpivoting multiple sets of source columns into multiple target values columns. Imagine for example that you have one set of columns representing order values and another representing order quantities. For another, what if you’re not interested in the removal of the rows with the NULLs in the values column? Following is an alternative solution to unpivoting using the APPLY operator, handling our example task:
SELECT empid, orderyear, valFROM Sales.MyPivotedOrders CROSS APPLY ( VALUES(2014, [2014]), (2015, [2015]), (2016, [2016]) ) AS A(orderyear, val)WHERE val IS NOT NULL;
The solution uses the CROSS APPLY operator with a table value constructor to define three rows for the three order years, to achieve the effect of generating copies (step 1).
The second column in each of the three rows extracts the element from the source year column that corresponds to the current order year (step 2). You could achieve the first step with a CROSS JOIN, but not the second. Recall that a join perceives its two inputs as a set, and therefore you cannot refer from one side to the elements of the other. Conversely, CROSS APPLY evaluates the inputs from left to right and therefore allows the right input to refer to elements from the left input. Here, to support multiple measures (e.g., val and qty), simply specify more columns in each row.
Finally, since APPLY generated the values column alias (val in our case) in the FROM clause, you can refer to that alias in the WHERE clause if you wish to remove the rows with the NULLs. Here, clearly, the removal of the NULLs is optional.
To demonstrate using this technique to unpivot multiple sets of columns, I’ll use a table called MyPivotedOrders2, which you create and populate by running the following code:
IF OBJECT_ID(N'Sales.MyPivotedOrders2', N'U') IS NOT NULL DROP TABLE Sales.MyPivotedOrders2;-- In SQL Server 2016 use: DROP TABLE IF EXISTS Sales.MyPivotedOrders2;GOSELECT empid, SUM(CASE WHEN orderyear = 2014 THEN val END) AS val2014, SUM(CASE WHEN orderyear = 2015 THEN val END) AS val2015, SUM(CASE WHEN orderyear = 2016 THEN val END) AS val2016, SUM(CASE WHEN orderyear = 2014 THEN qty END) AS qty2014, SUM(CASE WHEN orderyear = 2015 THEN qty END) AS qty2015, SUM(CASE WHEN orderyear = 2016 THEN qty END) AS qty2016INTO Sales.MyPivotedOrders2FROM Sales.OrderValues CROSS APPLY ( VALUES(YEAR(orderdate)) ) AS A(orderyear)WHERE custid = 5GROUP BY empid;ALTER TABLE Sales.MyPivotedOrders2 ADD CONSTRAINT PK_MyPivotedOrders2 PRIMARY KEY(empid);SELECT *FROM Sales.MyPivotedOrders2;
Table 5 shows the contents of MyPivotedOrders2.
Table 5: Contents of Sales.MyPivotedOrders2 table
empid val2014 val2015 val2016 qty2014 qty2015 qty2016------ -------- -------- -------- -------- -------- --------1 NULL 6627.75 NULL NULL 235 NULL2 613.20 NULL NULL 62 NULL NULL3 2222.40 2804.19 1835.70 43 181 564 NULL NULL 709.55 NULL NULL 615 NULL 601.83 1096.20 NULL 38 678 1488.80 NULL 2048.21 64 NULL 759 NULL 3815.25 1064.50 NULL 27 92
You want to create two values columns—from val2014, val2015, val2016 you want to create a target column called val, and from qty2014, qty2015, qty2016 a target column called qty column. As mentioned, for each target measure, simply add in each row in the table value constructor, the column that corresponds to that measure, like so:
SELECT empid, orderyear, val, qtyFROM Sales.MyPivotedOrders2 CROSS APPLY ( VALUES(2014, [val2014], [qty2014]), (2015, [val2015], [qty2015]), (2016, [val2016], [qty2016]) ) AS A(orderyear, val, qty)WHERE val IS NOT NULL OR qty IS NOT NULL;
This query generates the output shown in Table 6.
Table 6: Result of unpivoting with multiple measures
empid orderyear val qty------ ---------- -------- ----1 2015 6627.75 2352 2014 613.20 623 2014 2222.40 433 2015 2804.19 1813 2016 1835.70 564 2016 709.55 615 2015 601.83 385 2016 1096.20 678 2014 1488.80 648 2016 2048.21 759 2015 3815.25 279 2016 1064.50 92
Combining operators
Remember that you can combine table operators within the FROM clause. From a logical query processing standpoint, they’re evaluated from left to right in written order. This means that one operator’s result because the input to the next. For example, suppose that you wanted to transpose the data from the Sales.MyPivotedOrders table, and instead of showing employees on rows and order years on columns, you want to return order years on rows and employee IDs on columns. Table 7 shows the desired output.
Table 7: Desired result for transpose task
orderyear 1 2 3 4 5 8 9---------- -------- -------- -------- -------- -------- -------- --------2014 NULL 613.20 2222.40 NULL NULL 1488.80 NULL2015 6627.75 NULL 2804.19 NULL 601.83 NULL 3815.252016 NULL NULL 1835.70 709.55 1096.20 2048.21 1064.50
To achieve this, you first unpivot the data to create a result with a row per employee and order year, and then pivot the result to create a row per order year and a column per employee, like so:
SELECT *FROM Sales.MyPivotedOrders UNPIVOT( val FOR orderyear IN ([2014],[2015],[2016]) ) AS U PIVOT( MAX(val) FOR empid IN([1], [2], [3], [4], [5], [8], [9]) ) AS P;
Dynamic UNPIVOT
Last month I showed how to use a technique based on the FOR XML option to construct a PIVOT query and execute it dynamically to avoid needing to hard code the spreading values. In a very similar way you can handle a dynamic unpivoting task. In our example with the N'Sales.MyPivotedOrders table, you can obtain the names of the columns that need to be unpivoted by querying the sys.columns view. You can select all columns besides the empid column. The rest is pretty much the same as what I showed in the dynamic pivoting task. You use the FOR XML technique to create the comma separated list of column names that should appear in the UNPIVOT operator’s IN clause, and concatenate the static parts of the query before and after the list of columns. Finally, you use sp_executesql to execute the code. Here’s the complete solution code:
DECLARE @cols AS NVARCHAR(1000), @sql AS NVARCHAR(4000);-- Construct the column list for the IN clauseSET @cols = STUFF( (SELECT N','+ QUOTENAME(name) AS [text()] FROM sys.columns WHERE object_id = OBJECT_ID(N'Sales.MyPivotedOrders', N'U') AND name NOT IN(N'empid') ORDER BY name FOR XML PATH(''), TYPE).value('.[1]','NVARCHAR(MAX)'), 1, 1, N'');-- Construct the full T-SQL statement-- and execute dynamicallySET @sql = N'SELECT empid, orderyear, valFROM Sales.MyPivotedOrders UNPIVOT(val FOR orderyear IN(' + @cols + N')) AS U;';EXEC sys.sp_executesql @stmt = @sql;
What’s next?
In this article I focused on the logical query processing of the UNPIVOT table operator. I described the three steps involved in the operator: generate copies, extract element and remove NULLs. I provided the syntax of the operator and tips about how to remember the elements involved in that syntax. I then explained the disadvantages of the operator and provided an alternative solution using the APPLY operator. I also demonstrated combining PIVOT and UNPIVOT operators, and finally showed how to handle unpivoting dynamically. This article concludes the coverage of the logical query processing aspects of the first major querying clause—the FROM clause. Next month I’ll move on to the second major clause—the WHERE clause.
About the Author
You May Also Like