Create Pivoted Tables in 3 Steps
Many people find the PIVOT operator syntax in SQL Server Books Online (BOL) hard to understand. The syntax in SQL Server Books Online uses a derived table as the basis for the PIVOT query. What SQL Server Books Online doesn't point out is that you can use a common table expression (CTE) instead. Kathi Kellenberger walks you through creating a PIVOT query that uses a common table expression.
June 22, 2009
The PIVOT operator, which was introduced in SQL Server 2005, lets you create results that are pivoted, essentially using the data from one of the columns as column headers. For example, suppose you want to create a report that breaks down sales by year and month so that you can compare sales months for different years. Using the 2005 or 2008 version of the AdventureWorks database, you can create a query summarizing the data with the code in Listing 1. Table 1 shows an excerpt from the results. As you can see, looking for trends by month isn't easy.
SELECT SUM(TotalDue) TotalDue, YEAR(OrderDate) AS YearOrdered, MONTH(OrderDate) AS MonthOrderedFROM Sales.SalesOrderHeaderGROUP BY YEAR(OrderDate),MONTH(OrderDate)ORDER BY YEAR(OrderDate),MONTH(OrderDate)
TotalDue |
---|
Table 1: Partial Results from the Query in Listing 1 |
1172359.4289 |
2605514.9809 |
2073058.5385 |
... |
3781879.0708 |
... |
Table 2 shows the results as you would like to see them. In Table 2, the data is summarized and displayed so that the months can be easily compared from year to year. I'll explain how to write queries that use the PIVOT operator to produce the results shown in Table 2.
Year |
---|
Table 2: The Pivoted Results |
2001 |
2002 |
2003 |
2004 |
Note that I won't be using the PIVOT syntax shown in SQL Server Books Online (BOL) because that syntax can be difficult to understand at first glance. The syntax in BOL uses a derived table as the basis for the PIVOT query. What BOL doesn't point out is that you can use a common table expression (CTE) instead. Listing 2 shows the syntax for a PIVOT query that uses a CTE. As you can see, there are two main parts: a base query (callout A) and a PIVOT expression (callout B). This syntax might look intimidating, but I'll guide you through it step by step.
BEGIN CALLOUT A-- The base queryWITH AS ( )SELECT , [] AS , [] AS , ... [] AS FROM END CALLOUT ABEGIN CALLOUT B-- The PIVOT expression( ()FOR[] IN ( [], [], ... [])) AS END CALLOUT B
Step 1. Write the Base Query
Writing the base query takes a bit of planning. It's important that this query include only the columns that will be needed in the final results. Any columns not pivoted or aggregated will end up as groupings, so any unnecessary columns will cause extraneous grouping levels and unexpected results.
In this case, the Sales.SalesOrderHeader table has more columns than what is needed in the results. All you need is the OrderYear, OrderMonth, and TotalDue columns. You're going to group by the OrderYear column, pivot by the OrderMonth column, and aggregate the TotalDue column, so the base query is
SELECT TotalDue, YEAR(OrderDate) AS OrderYear, DATENAME(MONTH,OrderDate) AS MonthNameFROM Sales.SalesOrderHeader
The code in Listing 3 shows this base query as a CTE. You should execute the CTE to make sure that the necessary columns are present and that there aren't any extraneous columns. (The results will not be aggregated or pivoted at this point.)
WITH BaseQuery AS( SELECT TotalDue, YEAR(OrderDate) AS OrderYear, DATENAME(MONTH,OrderDate) AS OrderMonth FROM Sales.SalesOrderHeader)SELECT * FROM BaseQuery
Step 2. Create the PIVOT Expression
The next step is to create the PIVOT expression. The first element in the PIVOT expression is an aggregate function. Often this function will be SUM. The parameter of the aggregate function is the name of the column to be aggregated. The function's results will show up under the pivoted columns. In this example, you want to compare the sum of the TotalDue values by month, so TotalDue is the aggregated column. The PIVOT expression with the aggregate function is then
PIVOT(SUM(TotalDue)
After the aggregate function, you must type the keyword FOR followed by the name of the pivoted column. To determine the pivoted column, you need to figure out which column contains the values that you want displayed as column headers. In other words, which values that are currently displayed vertically do you want to display horizontally? In this example, the pivoted column is OrderMonth, so the code looks like
PIVOT(SUM(TotalDue) FOR OrderMonth
The pivoted column's name is followed by an IN list that's similar to one found in a WHERE clause. This IN list serves two purposes. First, it restricts the rows that are pivoted. Second, it supplies the pivoted column names. If the values that will end up as column names don't follow the rules for regular identifiers, they must be surrounded by brackets ([ ]). For example, if this example used month numbers instead of month names, you'd need to place each month number inside brackets.
One limitation of PIVOT queries is that they aren't dynamic, so all the column headers need to be hard-coded. If the pivoted column has values that vary over time, this part of the expression must be modified each time the data in the pivoted column changes. In this example, the column headers aren't likely to change because they're the months of the year. However, if a query compared sales by territories or sales by salespeople, the query would probably have to be modified frequently. Another option would be to write a stored procedure using dynamic SQL to determine the column headings. If you're interested in learning how to create a dynamic pivot query, see Itzik Ben-Gan's web-exclusive article "Dynamic Pivoting" or Inside Microsoft SQL Server 2005 T-SQL Programming (Microsoft Press, 2006) by Ben-Gan, Dejan Sarka, and Roger Wolter.
The IN list needs to be enclosed in parentheses so the PIVOT expression now looks like
PIVOT(SUM(TotalDue) FOR OrderMonth IN(January,February,March,April,May, June,July,August,September,October, November,December)
Next, you must give an alias to the PIVOT expression. An alias is required because the PIVOT function's results are treated as a table. The alias goes after the final closing parenthesis, so in this case, the PIVOT expression looks like
PIVOT(SUM(TotalDue) FOR OrderMonth IN(January,February,March,April,May, June,July,August,September,October, November,December)) AS PVT
The PIVOT expression is now complete. As callout B in Listing 2 shows, it goes after the FROM clause and before the ORDER BY clause if there is one.
Step 3. Add the Column Names to the SELECT List
At this point, you need to add the column names to the SELECT list in the main query. In this case, the columns are OrderYear and the pivoted columns. You should not list the aggregated column, TotalDue, so the SELECT list looks like
SELECT OrderYear,January,February, March,April,May,June,July,August, September,October,November,December
The pivoted columns will display in the order that they're listed in the SELECT clause. Listing 4 shows the completed PIVOT query, which created the pivoted results in Table 2.
WITH BaseQuery AS( SELECT TotalDue, YEAR(OrderDate) AS OrderYear, DATENAME(MONTH,OrderDate) AS OrderMonth FROM Sales.SalesOrderHeader)SELECT OrderYear,January,February,March,April,May, June,July,August,September,October,November,DecemberFROM BaseQueryPIVOT(SUM(TotalDue) FOR OrderMonth IN (January,February,March, April, May,June,July,August,September,October,November, December)) AS PVT
Variations
There are many ways the PIVOT query can vary. Take, for example, the PIVOT query in Listing 5.
WITH BaseQuery AS( SELECT TotalDue, YEAR(OrderDate) AS OrderYear, MONTH(OrderDate) AS OrderMonth FROM Sales.SalesOrderHeader)SELECT OrderYear,[1] AS [January],[2] AS [February], [3] AS [March],[4] AS [April],[5] AS [May],[6] AS [June], [7] AS [July], [8] AS [August],[9] AS [September], [10] AS [October], [11] AS [November],[12] AS [December]FROM BaseQueryPIVOT(SUM(TotalDue) FOR OrderMonth IN ([1],[2],[3],[4],[5], [6],[7],[8],[9],[10],[11],[12])) AS PVTORDER BY January DESC
This query varies from the one in Listing 4 two ways:
It uses aliased column names. In this query, the numeric month of the order date is used in the base query. The pivoted column names are aliased in the SELECT list so that the column headers are month names rather than month numbers. To save typing, it's advantageous to produce the desired column headings in the base query, which in this case is the CTE.
It uses an ORDER BY clause to order the returned data. The ORDER BY clause needs to go after the PIVOT expression. The only columns allowed in the ORDER BY clause are those that actually show up as columns in the results. You can include the columns used for grouping and those specified in the IN list. For example, this query uses ORDER BY January DESC to display the results in order of highest to lowest sales in January. If you want to display the year in descending order, you would put ORDER BY OrderYear DESC after the PIVOT expression.
A Useful Tool
Although the PIVOT operator might look intimidating, writing a PIVOT query isn't that difficult if you take it step by step. The PIVOT operator is perfect for pivoting results when the pivoted columns aren't likely to change. Although it's disappointing that the PIVOT operator isn't dynamic, the PIVOT operator is still a very useful tool. (A dynamic PIVOT operator is on my wish list for the next release of SQL Server since the feature didn’t make it into SQL Server 2008.)
About the Author
You May Also Like