GROUP BY ALL
Include groups that have no members in your query results
July 19, 2006
Many programmers continue to overlook helpful SQL Server features that have been available for years. Most of these overlooked features can simplify your queries, optimize their performance, and improve your productivity. One such feature is T-SQL's GROUP BY ALL option. Ask around, and I bet you'll find few T-SQL programmers who can explain what this option does.
By using the GROUP BY ALL option, you can create groups even if they're empty. In other words, your query returns groups of zero rows even when the WHERE clause filters out all the group's rows. Have I lost you yet? Never fear—read on, and you'll soon understand the differences between using the GROUP BY clause with and without the ALL option and learn some different ways to use GROUP BY ALL. And for more TSQL challenges, see the Logical Puzzle and the Web-exclusive sidebar "Catch That Bug."
GROUP BY and GROUP BY ALL
First, consider the following GROUP BY query, which doesn't use the ALL option:
SET NOCOUNT ON; USE Northwind;SELECT City, COUNT(*) AS NumEmpsFROM dbo.EmployeesGROUP BY City;
This query groups employees by city and returns the city and count of employees for each group, as Table 1 shows. Suppose you want to consider employees hired only in or after 1993. You just add a simple filter on the HireDate column, as the WHERE clause in the following query shows:
SELECT City, COUNT(*) AS NumEmpsFROM dbo.EmployeesWHERE HireDate >= '19930101'GROUP BY City;
This query filters out employees hired before 1993, groups the remaining employees by city, then returns the city and count of employees for each group, as Table 2 shows.
Notice that the query doesn't return Kirkland and Tacoma, because no employees from these cities were hired in or after 1993.Typically, the GROUP BY clause operates only on rows that remain after you apply the WHERE filter. However, by adding the ALL option to the GROUP BY clause, you can generate empty groups—or groups of zero rows—for cities that the WHERE clause filters out. Table 3 shows the results of adding the ALL option to the GROUP BY clause:
SELECT City, COUNT(*) AS NumEmps FROM dbo.EmployeesWHERE HireDate >= '19930101'GROUP BY ALL City;
Kirkland and Tacoma now appear in the query results, each with a count of 0, showing that no employees from these cities were hired in or after 1993. Including these cities in the result set indicates that the Employees table contains employees from these cities—just not employees hired in or after 1993.
The Best Option
You could get the same result without using the GROUP BY ALL option, but the alternatives are more complex. For example, you can use the UNION ALL set operation to return all cities, including those with zero employees that match the filter criteria, as the code in Listing 1 shows. In this example, two different queries return sets of data as input to the UNION ALL operation. The first query returns cities and counts of employees hired in or after 1993—without empty groups. The second query uses a NOT EXISTS predicate to return only cities that have no employees hired in or after 1993.
In SQL Server 2005, you can simplify this query by using the new EXCEPT set operation instead of the NOT EXISTS predicate, as the code in Listing 2 shows. You can also achieve the desired result by using a CASE expression that yields a 1 if the hire date is in or after 1993 or a NULL otherwise.You can then use the COUNT function on the CASE expression to consider employees hired only in or after 1993:
SELECT City, COUNT(CASE WHEN HireDate >= '19930101' THEN 1 END) AS NumEmpsFROM dbo.EmployeesGROUP BY City;
However, all these alternatives are more complex than using GROUP BY ALL.
Join In
The GROUP BY ALL option also has interesting applications in join queries. To demonstrate these uses, first run the code that Listing 3 shows to add two employees to the Northwind database's Employees table—Itzik Ben-Gan and Lilach Ben-Gan from Middle Earth, Hobitton—and to add an order for Lilach in the Orders table. (Some code in this article wraps to several lines because of space constraints.)
Now, examine the query in Listing 4, which returns US employees and the count of orders for each of those employees, as Table 4 shows. Because the query applies an inner join, the ON filter eliminates employees who have no orders. And the WHERE clause filters out non-US employees. So the result set doesn't include employees who made no orders (Itzik) or employees from outside the United States (the United Kingdom, Middle Earth).
Suppose you want to see all employees who placed orders, but you want the count of orders only for US employees. To add non-US employees to the result, you simply add the ALL option to the GROUP BY clause that Listing 5 shows. Table 5 shows this query's result set. The WHERE clause filters out non-US employees, but the ALL option in the GROUP BY clause adds those groups of employees back to the result. So Steven, Michael, Robert, Anne, and Lilach (who aren't in Table 4) now appear in Table 5, with order counts of 0.These employees handled orders, but they aren't located in the United States. Notice, however, that Itzik doesn't appear in the result because he didn't handle any orders.
Remember that GROUP BY ALL adds empty groups for rows eliminated in the WHERE clause—it doesn't add empty groups for rows eliminated in the ON clause. If you put the country filter in the ON clause and don't use a WHERE clause, the ALL option becomes meaningless. For example, the following query generates the same result as Table 4 shows, whether you use the ALL option or not:
SELECT E.EmployeeID, E.FirstName, E.LastName, E.Country, COUNT(*) AS NumOrdersFROM dbo.Employees AS E JOIN dbo.Orders AS O ON O.EmployeeID = E.EmployeeID AND E.Country = 'USA' GROUP BY ALL E.EmployeeID, E.FirstName, E.LastName, E.Country;
You might think you can produce the desired result by using an outer join without using the GROUP BY ALL option at all, as Listing 6 shows. Remember that you want to see all employees who made orders, but you want the count of orders only for US employees. However, as you can see in the result that Table 6 shows, this query includes Itzik even though he handled no orders. So this query returns all employees, regardless of whether they made orders, with the count of orders only for US employees.
An outer join adds an outer row for each row from the preserved table (in our case, the Employees table) that the ON filter eliminates. So the ON filter now contains both the correlation between the Employees and Orders tables and the country filter, and you end up getting back all employees— regardless of whether they handled orders or are from the US. Also note that with an outer join, you must provide an input from the nonpreserved table (Orders) to the COUNT() aggregate function so that you don't consider the outer rows. GROUP BY ALL generates empty groups, so COUNT() produces a 0 for those groups and you don't need special treatment for them.
As another example of how you can use GROUP BY ALL in a join query, suppose you want only employees from Middle Earth and their order counts, including employees who didn't handle any orders. Because you want to keep only employees from Middle Earth, you put the country filter in the ON clause. And because you want to also create groups for Middle Earth employees who handled no orders, you need the correlation between Employees and Orders in the WHERE clause. The query in Listing 7 generates the desired result, as Table 7 shows.
Of course, you can achieve the same result in other ways, including using an outer join. But the GROUP BY ALL option gives you a deeper level of control and, in some cases, lets you simplify your join solutions. When you're done experimenting with these queries, you can run the following cleanup code to delete the employees and the order you added to the Northwind database:
DELETE FROM dbo.Orders WHERE OrderID = 12345;DELETE FROM dbo.Employees WHERE EmployeeID IN(10, 11);
GROUP BY ALL's Future
GROUP BY ALL lets me develop simpler solutions than its alternatives. And although this option isn't part of the ANSI SQL standard, it fits well in the phases of logical query processing and meets the "spirit" of the standard. However, GROUP BY ALL apparently complicates matters for Microsoft's SQL Server development team as it adds new language elements to T-SQL. As a result, Microsoft might not support the option in the future—something to consider before deciding to use the feature in production code. The good news is that Microsoft hasn't formally marked GROUP BY ALL for deprecation yet. Typically, Microsoft removes a feature from SQL Server two versions after the version in which the feature formally enters a deprecation process. So, GROUP BY ALL should be around quite a while longer, giving you time to tap into its benefits.
About the Author
You May Also Like