PIVOT on Steroids
Encapsulate PIVOT workarounds into a dynamic, flexible procedure
December 17, 2006
SQL Server 2005’s new PIVOT operator simplifies the process of rotating data from rows to columns for flexible analysis. However, as I described last month, PIVOT has some limitations (see “Enhancing PIVOT for Crosstab Queries”). Not only are the grouping columns implicit, increasing the potential for writing code that produces undesired results, but the PIVOT IN list (which contains rotation elements) isn’t dynamic. In addition, grouping, rotation, and aggregation columns must be base columns in the table you provide as input, and you can’t rotate more than one column or specify more than one aggregation.
Last month, I showed you how to circumvent most of these limitations by providing a table expression (a derived table or common table expression—CTE) to the PIVOT operator as input and “preparing” columns that result from expressions. I also explained how to use dynamic SQL to make the PIVOT column dynamic.
This month, I show you how to encapsulate all this workaround logic into a stored procedure to provide a flexible and more complete solution for dynamic pivoting needs. Because this solution uses dynamic SQL, which can expose your system to security risks, I also share tips for helping you safely use the procedure. The sp_pivot and usp_pivot stored procedures that I describe are from Inside Microsoft SQL Server 2005: T-SQL Programming (Microsoft Press, 2006), which I wrote with Dejan Sarka and Roger Wolter.
Powerful Pivoting Logic
The sp_pivot stored procedure delivers powerful, dynamic pivoting logic. Run the code in Listing 1, to create sp_pivot in the master database. As I noted, the stored procedure uses dynamic SQL, which opens the door to serious SQL injection risks. (If you aren’t familiar with SQL injection, I urge you to read the “SQL Injection” section in SQL Server 2005 Books Online—BOL.) After I explain sp_pivot’s pivoting logic, I’ll cover measures you can take to mitigate these security risks.
Before I dig into sp_pivot’s interface and implementation, I want to discuss the stored procedure’s name and the fact that I created the stored procedure in the master database. A stored procedure whose name you prefix with the characters sp_ and that you create in the master database has special behavior. You can invoke the procedure while connected to any database without having to database-qualify the procedure name:
USE Northwind;EXEC dbo.sp_pivot ;
Thus, the execution context of dynamic SQL isn’t that of the master database but rather the context of your connection (Northwind, in this case). Furthermore, you can be connected to any database and force the context of the procedure’s execution (in terms of dynamic SQL) to be whatever database you qualify the procedure name with. For example, the following statement forces the execution context of dynamic SQL to that of the Northwind database, regardless of which database you’re connected to:
EXEC Northwind.dbo.sp_pivot ;
Note that Microsoft doesn’t recommend creating stored procedures in the master database using the sp_ prefix and doesn’t guarantee any support if you decide to do so. To adhere to Microsoft’s recommendations, you might prefer to create the stored procedure in all user databases in which you might need it and use a different prefix in its name. This approach, of course, would require you to maintain multiple copies of the stored procedure. Now, let’s see how I implemented sp_pivot.
5 Parameters
The sp_pivot stored procedure accepts five input parameters: @query, @on_rows, @on_cols, @agg_func, and @agg_col. The @query parameter represents the input table to the PIVOT operator. The parameter can be a table or view name or even a query that produces a table result. The stored procedure’s code determines whether @query contains a valid table or view name and, if so, constructs a SELECT query against the object:
SET @query = N’SELECT * FROM ‘+ @query;
If @query doesn’t contain a valid table or view name, the stored procedure assumes it contains a query to begin with.
The @on_rows parameter represents the grouping columns. The stored procedure requires you to specify the grouping columns explicitly, unlike the PIVOT operator. You can specify one or more columns separated by commas, or you can specify expressions such as
@on_rows = N’EmployeeID, MONTH(OrderDate) AS OrderMonth’
The @on_cols parameter represents the rotation column. Here, you can specify a column name or expression; you don’t specify the actual rotation elements. The stored procedure uses dynamic SQL to determine the rotation elements. If you want to rotate order years, for example, you can specify the expression
@on_cols = N’YEAR(OrderDate)’
If you want to rotate multiple columns, you can provide an expression that concatenates elements. So, if you want to rotate shippers and order years, you can specify the expression
@on_cols = N’CAST(ShipVia AS VARCHAR(10)) + ‘’_’’ + CAST(YEAR(OrderDate) AS VARCHAR(10))’
The @agg_func parameter represents the aggregate function you want to use (e.g., SUM, MIN, MAX, COUNT):
@agg_func = N’SUM’
You can even specify a user-defined aggregate (UDA) function name.
Last, the @agg_col parameter represents the aggregation column that you’ll provide to the aggregate function as input. You can specify a column name, an expression (e.g., N'Quantity * UnitPrice'), or even N'*' if you want to count rows. Note that sp_pivot doesn’t work around PIVOT’s inability to calculate multiple aggregations; you’re still limited to one aggregation.
You’re now ready to use the sp_pivot stored procedure to provide flexible, dynamic pivoting functionality. Let’s first look at a few examples of using the procedure and then walk through its implementation. In the following example, sp_pivot returns the sum of freight for each employee and shipper, generating the results that Table 1 shows:
EXEC Northwind.dbo.sp_pivot @query = N’dbo.Orders’, @on_rows = N’EmployeeID’, @on_cols = N’ShipVia’, @agg_func = N’SUM’, @agg_col = N’Freight’;
Table 2 shows the results of the next example, which returns the count of orders for each employee and shipper:
EXEC Northwind.dbo.sp_pivot @query = N’dbo.Orders’, @on_rows = N’EmployeeID’, @on_cols = N’ShipVia’, @agg_func = N’COUNT’,@agg_col = N’*’;
Table 3 shows the results of running the third sp_pivot example code, which returns the sum of value (Quantity * UnitPrice) for each month and year:
EXEC Northwind.dbo.sp_pivot @query = N’SELECT OrderDate, Quantity, UnitPriceFROM dbo.Orders AS OJOIN dbo.[Order Details] AS OD ON O.OrderID = OD.OrderID’, @on_rows = N’MONTH(OrderDate) AS OrderMonth’, @on_cols = N’YEAR(OrderDate)’, @agg_func = N’SUM’, @agg_col =N’Quantity * UnitPrice’;
And Web Table 1 (which you can view at http://www.sqlmag.com, InstantDoc ID 94268) shows the results of the last example, which returns the sum of freight for each employee and shipper_year:
EXEC Northwind.dbo.sp_pivot @query = N’dbo.Orders’, @on_rows = N’EmployeeID’, @on_cols = N’CAST(ShipVia AS VARCHAR(10)) + ‘’_’’ + CAST(YEAR(OrderDate) AS VARCHAR(10))’, @agg_func = N’SUM’,@agg_col = N’Freight’;
Now that you’ve seen how you can use sp_pivot to rotate data from rows to columns, let’s look at how the stored procedure encapsulates the logic of the PIVOT workaround techniques I covered last month. At callout A in Listing 1, the code determines whether the input is a valid table or view name, as I mentioned earlier. If it is, the code constructs a SELECT query against the table or view; if it isn’t, the stored procedure assumes you provided a query to begin with. The code then creates a derived table called Query out of the query string stored in @query.
At callout B, the code circumvents the limitation that * isn’t allowed as input to the COUNT function by substituting * with the constant 1. And at callout C, the procedure uses dynamic SQL to query the list of rotation elements and store the list in the @cols local variable. The code constructs a FOR XML PATH query string that queries the distinct rotation elements (expressed by @on_cols) from the derived table stored in @query, concatenates them, and returns the concatenated string by using the @result output parameter. You can use other techniques besides FOR XML PATH to concatenate strings, such as using a cursor, but those approaches require more code and are much slower.
At callout D, the procedure constructs the PIVOT query and invokes it using dynamic SQL. Note two important points here. First, the derived table PivotInput contains only the columns that need to be involved in the pivot operation: grouping columns (@on_rows), the rotation column (@on_cols), and the aggregation column (@agg_col). This technique lets you prevent implicit grouping by columns that you don’t want to take part in the grouping activity. Second, the list of rotation elements stored in @cols that the procedure constructed dynamically is part of the concatenation (in the parentheses following the IN clause), letting you rotate a dynamic number of elements that are unknown ahead of time. In the query string you’re constructing, the IN list appears to be a list of known elements.
Mitigating SQL Injection Risks
As I’ve noted, sp_pivot’s implementation doesn’t guard against SQL injection attacks. Coverage of SQL injection is outside the scope of this article, but you can learn about SQL injection and how to prevent it by reading “SQL Injection: The Hacker’s Gold Mine” (May 2006) and “Preventing SQL Injection Attack” (August 2004). The following example shows how easy it is to inject code by using the sp_pivot stored procedure:
EXEC Northwind.dbo.sp_pivot @query = N’dbo.Orders’, @on_rows = N’1 AS c) AS D;PRINT ‘’SQL Injection!This could have been much worse than a PRINT command!’’;SELECT * FROM (selectEmployeeID’,@on_cols = N’ShipVia’,@agg_func = N’COUNT’,@agg_col = N’*’;
This invocation of sp_pivot injects a PRINT statement, but you could inject much more destructive code, such as DROP commands and xp_cmdshell.
You can mitigate the risks of SQL injection, but bear in mind that hackers continue to find creative ways to inject code into your systems. If you use dynamic SQL, your operating premise should be that hackers will be able to inject their own code through it. The only way to prevent injection is to not use dynamic SQL. With that in mind, here are a few ways you can mitigate the risks of SQL injection in sp_pivot:
Limit the size of your input parameters. Sp_ pivot defines all parameters as NVARCHAR (MAX), giving hackers a lot of leeway for injecting code. You can define all pivoting elements (grouping column, rotation column, aggregation column) as sysname, for example, to limit the input string size and the opportunities for injecting code. Also, query the metadata to verify that specified columns exist in the input table.
Do not accept a query as input. Instead, accept only a table or view name as input, and check to be sure that the object exists. If you need to perform manipulation to prepare pivoting elements that have expressions, you can always create a view by using such manipulation and then provide the view as input.
Limit the aggregate function name size. For example, define the size as NVARCHAR(12), and check that the function appears in a list of your supported functions.
Use the QUOTENAME function to quote object and column names with square brackets. For example, QUOTENAME (N'col1') becomes [col1], while QUOTENAME (N'col1] DROP TABLE T1 --') becomes [col1]] DROP TABLE T1 --]. The closing square bracket, which is often used as an attempt to inject code, is doubled by the QUOTENAME function, neutralizing injection attempts.
Check the contents of the @cols variable, which contains the concatenated list of rotation elements. Remember that SQL injection can be achieved by injecting code into your data. Look for known strings used in SQL injection, such as --, sp_, xp_, DROP, and so on.
Create the procedure in a specific database and by using the usp_ prefix. To conform to Microsoft’s recommendations and to prevent use of the stored procedure globally in the instance, create sp_ pivot only in the database where you want to allow usage and by using the usp_ prefix instead of sp_.
To implement these security measures, run the code in Web Listing 1 (which you can download at InstantDoc ID 94268) to create the usp_pivot stored procedure in the Northwind database. The code in Listing 2 invokes the procedure to return the sum of freight for each employee and shipper; Table 1 shows the results.
As always, tightening security comes at the cost of decreased flexibility. And although these measures help you mitigate the risk of SQL injection attack, when you use dynamic SQL, it’s hard—if not impossible—to guarantee that you can block all SQL injection attempts. Use dynamic SQL with caution. Still, with these safety measures in place, usp_pivot provides a powerful tool for pivoting data and giving users information as they need to see it.
About the Author
You May Also Like