Custom Aggregations: Specialized Solutions
Go for optimal performance
June 21, 2006
This article is the last in my four-part series about solutions to custom-aggregate requests. SQL Server's built-in aggregate functions are useful, but what if you need other types of aggregations that SQL Server doesn't provide? In the series' first three articles—"Custom Aggregations: The CursorTechnique"(http://www.sqlmag.com, March 2006, InstantDoc ID 49038),"Custom Aggregations: The Pivoting Technique" (May 2006, InstantDoc ID 49675), and "Custom Aggregations: User-Defined Aggregates" (June 2006, InstantDoc ID 49983), I discussed cursor-based, pivot-based, and UDA-based solutions, respectively. This month, I'll present a fourth solution: specialized solutions.
Specialized solutions are unique to specifically requested aggregate calculations.The goal is to achieve optimal performance for the given task by forsaking the generality of the solution. In other words, you obtain optimal performance by compromising other aspects of the solution. Specialized techniques wouldn't necessarily apply to other custom-aggregate calculations, so you won't be able to use the solution as a template for other calculations.You'll have to "invent" a different solution for each custom-aggregate calculation.
As with the rest of the columns in this series, I'm using string concatenation as my example of a customaggregation task. Here's a quick summary of the two sample concatenation tasks featured in the first three articles: The first task is to use the Northwind database to return each customer and, for each, a concatenated list of order IDs expressed as a comma-separated list of values and sorted by order ID, as Table 1 shows.The second task is to aggregate distinct, concatenated employee IDs for each customer, as Table 2 shows.
Remember that we're evaluating the usefulness of each solution based on several factors: Is the solution generic? Does the solution support an unlimited number of elements in a group, and does it have any special limitations? Can you apply the aggregation to the elements in a desired order? How well does it perform? Finally, is the solution set-based or is it iterative/procedural? Let's take a look at two specialized solutions: one based on assignment SELECT and another based on the FOR XML PATH option.
Custom Aggregates Based on Assignment SELECT
T-SQL supports a non-standard feature that lets you use a SELECT statement to query data and assign values from the source table to variables.I refer to this capability as an assignment SELECT. If the source table contains more than one row, the assignment will take place multiple times—once per row. You can achieve string concatenation by concatenating the current content of the variable with a character string stored in the current row:
SELECT @variable = @variable + FROM ;
You can rely on this capability to provide solutions for the string-concatenation requests I presented earlier in the article. I'll start with the request to return the concatenated order IDs for each customer. First, you create a function that accepts a customer ID as input and returns the concatenated order IDs for the given customer. Run the code in Listing 1 to create the fn_concatorders function, which accomplishes this task.
The function's code declares a variable called @ArrOrders and initializes it with an empty string. The code then invokes an assignment SELECT that—for each order belonging to the input customer (@cid)— concatenates the current content of @ArrOrders with a comma and the current OrderID. When the assignment SELECT finishes, @ArrOrders contains the concatenated order IDs for the input customer, with an extra comma at the beginning. The RETURN command returns the content of @ArrOrders after using the STUFF function to remove the first comma.To apply the function to each customer, simply query the Customers table and invoke the function in the SELECT list, providing it with the CustomerID column as input, as follows:
SELECT CustomerID, dbo.fn_concatorders(CustomerID) AS ArrOrders FROM dbo.Customers;
If you have an index on the Orders table's CustomerID and OrderID columns, this solution will run quickly. For each customer, only the rows belonging to the customer will be scanned at the leaf level of the index. This solution's drawback is that you can't guarantee the concatenation will be applied in any particular order.You can specify an ORDER BY clause in the query within the function, and you won't get a syntax error. However, SQL Server doesn't guarantee that the sorting will take place before the concatenation, so you simply can't rely on the ORDER BY clause. If you don't care about the concatenation order, this solution is valid and will give you the result quickly.
To perform distinct string concatenation, you'll need to make minor revisions to the function's logic. For example, suppose you want to return the concatenated employee IDs for each customer. Instead of querying the Orders table directly, you'd query a derived table that contains the distinct employee IDs for the given customer. Run the code in Listing 2 to create the fn_concatemps function,-which implements this logic. To get the desired result, query the Customers table and invoke the function in the SELECT list, providing it the EmployeeID column as input, as follows:
SELECT CustomerID, dbo.fn_concatemps(CustomerID) AS ArrEmps FROM dbo.Customers;
Custom Aggregates Based on FOR XML PATH
Another specialized solution uses the PATH mode of the FOR XML query option. SQL Server 2005 introduced the PATH mode, which provides an easier way to mix elements and attributes than SQL Server 2000's EXPLICIT directive provides. By using the PATH mode with an empty string as input (PATH('')) and proving the result column the alias [text()], instead of getting an XML value with tagged elements, you effectively get a single string that contains the concatenation of all input values. Michael Rys, a program manager with the Microsoft SQL Server development team in charge of SQL Server XML technologies, and Eugene Kogan, a technical lead on the Microsoft SQL Server Engine team, devised this technique.
Listing 3 shows the query that returns the concatenated order IDs for each customer. The query is invoked against the Customers table and applies the concatenation of order IDs for the outer customer in a subquery.The subquery filters the orders belonging to the outer customer and concatenates all order IDs based on OrderID ordering, with a comma in front of each. The STUFF function simply removes the superfluous first comma. With this technique, you have full control of the order of concatenation. I tested this technique against large tables, and it proved to be extremely fast: It concatenated a million values in a matter of a couple of seconds.
To apply this technique for distinct string concatenation, simply query a derived table with the distinct values instead of the base table directly. For example, to return the concatenated employee IDs for each customer, use the query that Listing 4 shows.
Evaluating Specialized Solutions
Starting with the advantages, the solution based on assignment SELECT is pretty fast, and the solution based on FOR XML PATH is extremely fast.You can fully control the order of concatenation with the FOR XML PATH solution. Neither solution is limited to a small number of elements; rather, they support the maximum string size (2GB) that SQL Server supports. The disadvantages of this solution are that you can't control the order of concatenation with the assignment SELECT solution, and neither solution is generic. Also, you can't treat them as templates and use similar code when you need to implement other custom aggregates.
Over the past several months, I've presented four types of solutions to custom aggregates. Each solution has its advantages and disadvantages. All solutions except the specialized ones are generic, whereas the specialized ones are specific to the given task. But the specialized solutions typically yield optimal performance.
About the Author
You May Also Like