Mode Puzzle Solutions
Itzik provides solutions to the Mode puzzle.
October 8, 2007
On September 27, 2007 I provided a puzzle where you were asked to
calculate the Mode of the distribution (the value that occurs most frequently).
You can find the details of the puzzlehere.
I got solutions to the puzzle from Alejandro Mesa, Steve Kass,
Umachandar Jayachandran, Michael Mac, Dieter Noeth, paPai and rlahoty.
Here I’ll describe a sample of the solutions which I find most elegant.
Most of the solutions would benefit from an index on the columns
CustomerID, EmployeeID:
USE Northwind;CREATE INDEX idx_cid_eid ON dbo.Orders(CustomerID, EmployeeID);
Several people solved the two variations of the puzzle by using ranking
calculations and a table expression. This was also the approach I used in my
first attempt at solving the puzzle:
Solution based on ranking, Variation 1
WITH C AS( SELECT CustomerID, EmployeeID, COUNT(*) AS Cnt, RANK() OVER( PARTITION BY CustomerID ORDER BY COUNT(*) DESC) AS Rnk FROM dbo.Orders GROUP BY CustomerID, EmployeeID)SELECT CustomerID, EmployeeID, CntFROM CWHERE Rnk = 1;
The code groups the orders by customer and employee calculating the count
of orders for each group. The RANK function is used to rank the
employees based on their order count in descending order within the
customer (PARTITION BY CustomerID). Remember that the RANK
function will assign the same rank value to all employees with the same order
count, and the highest will be assigned with a rank of 1. The rest is just a
matter of defining a table expression based on the query that calculates the
ranking values and filtering only the rows with a rank value of 1.
Solution based on ranking, Variation 2
WITH C AS( SELECT CustomerID, EmployeeID, COUNT(*) AS Cnt, ROW_NUMBER() OVER( PARTITION BY CustomerID ORDER BY COUNT(*) DESC, EmployeeID DESC) AS RowNum FROM dbo.Orders GROUP BY CustomerID, EmployeeID)SELECT CustomerID, EmployeeID, CntFROM CWHERE RowNum = 1;
To solve Variation 2 of the puzzle you need to make a minor change to the
previous solution—use the ROW_NUMBER function instead of RANK
and add EmployeeID DESC to the ranking calculation’s ORDER BY
clause. In case of ties, the employee with the highest EmployeeID will be
chosen. BTW, it is not necessary to use ROW_NUMBER instead of
RANK since both are logically equivalent when the ORDER BY list is
unique. I just think that it is more natural to use ROW_NUMBER in case
I’m after only one row per customer.
Steve Kass also provided very elegant solutions that avoid the need to use a
table expression. He achieved this by using TOP (1) WITH TIES and
specifying the ranking calculation in the query’s ORDER BY clause. Here
are his solutions to the two variations of the puzzle:
Solution based on ranking and TOP 1 WITH TIES, Variation 1
SELECT TOP (1) WITH TIES CustomerID, EmployeeID, COUNT(*) AS CntFROM dbo.OrdersGROUP BY CustomerID, EmployeeIDORDER BY RANK() OVER( PARTITION BY CustomerID ORDER BY COUNT(EmployeeID) DESC);
Solution based on ranking and TOP 1 WITH TIES, Variation 2
SELECT TOP (1) WITH TIES CustomerID, EmployeeID, COUNT(*) AS CntFROM dbo.OrdersGROUP BY CustomerID, EmployeeIDORDER BY ROW_NUMBER() OVER( PARTITION BY CustomerID ORDER BY COUNT(EmployeeID) DESC, EmployeeID DESC);
Another approach that several people used in their solutions was to query
the Customers table and use the APPLY operator to apply a table
expression to each customer. The table expression uses a TOP (1) query to
pull the employee that occurs most frequently. Here are the solutions to the
two variations of the puzzle based on the APPLY operator:
Solution based on APPLY, Variation 1
SELECT C.CustomerID, D.EmployeeID, D.CntFROM dbo.Customers AS C CROSS APPLY (SELECT TOP (1) WITH TIES EmployeeID, COUNT(*) AS Cnt FROM dbo.Orders AS O WHERE O.CustomerID = C.CustomerID GROUP BY EmployeeID ORDER BY Cnt DESC) AS D;
Solution based on APPLY, Variation 2
SELECT C.CustomerID, D.EmployeeID, D.CntFROM dbo.Customers AS C CROSS APPLY (SELECT TOP (1) EmployeeID, COUNT(*) AS Cnt FROM dbo.Orders AS O WHERE O.CustomerID = C.CustomerID GROUP BY EmployeeID ORDER BY Cnt DESC, EmployeeID DESC) AS D;
As you can see, the first solution specifies only the order count in the
ORDER BY clause and uses the WITH TIES option in order to return
multiple employees in case of ties. The second solution specifies both the
order count and employee id in the ORDER BY clause and does not use
the WITH TIES option since only one row is required per customer.
The previous solutions are elegant and fairly simple in their logic. But if
performance is of the highest priority, there is a solution that is a bit more
complicated, but is faster than all others. This solution will only work for the
second variation of the puzzle where you are after only one employee per
customer.
The solution first groups the data by customer and employee and calculates
the order count for each such group. The solution then groups the result by
customer, concatenates the order count and the employee id (call it
CntEmpID), and calculates the MAX CntEmpID value. You can use
various techniques to concatenate the values. The technique I used is to
multiply the order count by 2147483648 (max supported four-byte integer
plus 1) and adding the employee id. Once the max value is calculated (call it
MX), you can extract the order count by dividing MX by 2147483648
using integer division, and the employee id with a modulo (MX %
2147483648). Here’s the complete solution:
Solution based on concatenation, Variation 2
SELECT CustomerID, MX % CAST(2147483648 AS BIGINT) AS EmployeeID, MX / CAST(2147483648 AS BIGINT) AS CntFROM (SELECT CustomerID, MAX(CntEmpID) AS MX FROM (SELECT CustomerID, CAST(2147483648 AS BIGINT) * COUNT(*) + EmployeeID AS CntEmpID FROM dbo.Orders GROUP BY CustomerID, EmployeeID) AS D1 GROUP BY CustomerID) AS D2;
Cheers,
--
BG
About the Author
You May Also Like