Virtual Auxiliary Table of Numbers
Itzik provides code to generate a virtual auxiliary table of numbers efficiently.
October 28, 2009
An auxiliary table of numbers is a helper table that contains a sequence of integers from 1 and on. It is a very handy helper table that I use to solve many different types of problems. In case you cannot, or do not want to generate a permanent table, you can produce a virtual one on the fly very efficiently. The trick is to use cross joins. You start with a virtual table with two rows:
WITH
L0 AS(SELECT 1 AS c UNION ALL SELECT 1)
Then, you perform a cross join between two instances of this table:
WITH
L0 AS(SELECT 1 AS c UNION ALL SELECT 1),
L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B)
Then, you perform a cross join between two instances of the last table:
WITH
L0 AS(SELECT 1 AS c UNION ALL SELECT 1),
L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B)
And after applying such cross joins five time, you have a table with 2^2^5 (4,294,967,296) rows:
WITH
L0 AS(SELECT 1 AS c UNION ALL SELECT 1),
L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B)
To produce the actual sequence of numbers, you can use the ROW_NUMBER function with ORDER BY (SELECT NULL), letting the optimizer know that it doesn’t really need to sort the data.
You can create a table function based on this code, and request a certain number of numbers by passing an input parameter. The tricky part is to come up with a solution that stops processing the Cartesian products as soon as the requested number of numbers was produced, and not always try to generate all four billion of those. Until recently, I used the following solution:
IF OBJECT_ID('dbo.GetNums') IS NOT NULL DROP FUNCTION dbo.GetNums;
GO
CREATE FUNCTION dbo.GetNums(@n AS BIGINT) RETURNS TABLE
AS
RETURN
WITH
L0 AS(SELECT 1 AS c UNION ALL SELECT 1),
L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L5)
SELECT n FROM Nums WHERE n <= @n;
GO
To return 10 numbers you would simply pass the value 10 as input:
SELECT n FROM dbo.GetNums(10) AS Nums;
This generates the following output:
n
---
1
2
3
4
5
6
7
8
9
10
This code finishes fast and does stop processing after 10 rows are produced. If you look at the execution plan produced for this query, you will notice a Top operator that is in charge of stopping the processing at the right point. However, I noticed that in certain cases the optimizer removes the Top operator, and actually tries to generate all four billion rows, before the filtering part. Here’s one such example:
WITH Primes(p) AS
(
SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 5
UNION ALL SELECT 7
)
SELECT *
FROM Primes
CROSS APPLY dbo.GetNums(Primes.p) AS Nums;
Recently I revised the solution to use the TOP option instead of the WHERE filter like so:
IF OBJECT_ID('dbo.GetNums') IS NOT NULL DROP FUNCTION dbo.GetNums;
GO
CREATE FUNCTION dbo.GetNums(@n AS BIGINT) RETURNS TABLE
AS
RETURN
WITH
L0 AS(SELECT 1 AS c UNION ALL SELECT 1),
L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L5)
SELECT TOP (@n) n FROM Nums ORDER BY n;
GO
And for this solution the optimizer always seems to use the Top operator and stop processing in the right place. Try the following after recreating the function using the new version:
WITH Primes(p) AS
(
SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 5
UNION ALL SELECT 7
)
SELECT *
FROM Primes
CROSS APPLY dbo.GetNums(Primes.p) AS Nums;
And this time it should finish quickly.
Cheers,
BG
About the Author
You May Also Like