Nondeterministic Row Numbers

Itzik explains how to calculate nondeterministic row numbers.

Itzik Ben-Gan

November 28, 2007

2 Min Read
ITPro Today logo in a gray background | ITPro Today

The row_number function is an extremely powerful tool that I use to simplify and optimize solutions to many problems.

Occasionally, I needed to calculate nondeterministic row numbers where order didn’t matter.

 

For example, suppose you have the following t1 table:

set nocount on;use tempdb;goif object_id('dbo.t1') is not null  drop table dbo.t1;gocreate table dbo.t1(col1 varchar(10) not null);insert into dbo.t1(col1) values('c');insert into dbo.t1(col1) values('a');insert into dbo.t1(col1) values('b');insert into dbo.t1(col1) values('a');insert into dbo.t1(col1) values('b');insert into dbo.t1(col1) values('b');insert into dbo.t1(col1) values('c');insert into dbo.t1(col1) values('a');insert into dbo.t1(col1) values('b');insert into dbo.t1(col1) values('a');go

You need to return all rows from t1 along with unique incrementing row numbers in no particular order. If order doesn’t matter, of course you can always specify an existing column from the table in the row_number function’s order by clause:

select col1, row_number() over(order by col1) as rownumfrom dbo.t1;

However, the execution plan for the query would involve sorting (or an index order scan if an index exists on the sort column). Here’s the plan you get for the above query:

 |--Sequence Project(DEFINE:([Expr1004]=row_number))       |--Compute Scalar(DEFINE:([Expr1006]=(1)))            |--Segment                 |--Sort(ORDER BY:([tempdb].[dbo].[t1].[col1] ASC))                      |--Table Scan(OBJECT:([tempdb].[dbo].[t1]))

If the order of the row numbers doesn’t matter to you, you’d probably rather not pay the sort penalty. The problem is that SQL Server doesn’t allow a constant in the row_number function’s order by clause. Try running the following query:

select col1, row_number() over(order by 0) as rownumfrom dbo.t1;


And you will get the following error:

Msg 5309, Level 16, State 1, Line 1Windowed functions do not support constants as ORDER BY clause expressions.

Until recently, in order to avoid sorting I defined a table expression (CTE or derived table) based on a query that returns the table rows along with a constant (call it const). In the outer query I invoked the row_number function with order by const:

with c as(  select col1, 0 as const from dbo.t1)select col1, row_number() over(order by const) as rownumfrom c;


Here, the optimizer is smart enough to realize that sorting is not required. Here’s the plan I got for this query (notice there’s no sort operation):

 |--Sequence Project(DEFINE:([Expr1005]=row_number))       |--Compute Scalar(DEFINE:([Expr1007]=(1)))            |--Segment                 |--Table Scan(OBJECT:([tempdb].[dbo].[t1]))

Even though this technique avoids sorting, it is a bit awkward. Recently I got a tip from a T. Wong how to achieve the same thing without the need for a table expression—simply specify order by (select 0)!

Here’s the solution query with the new technique:

select col1, row_number() over(order by (select 0)) as rownumfrom dbo.t1;


It is much more elegant than the previous technique, and also here the plan shows that the optimizer realized that sorting is not needed:

  |--Sequence Project(DEFINE:([Expr1006]=row_number))       |--Compute Scalar(DEFINE:([Expr1008]=(1)))            |--Segment                 |--Compute Scalar(DEFINE:([Expr1005]=(0)))                      |--Table Scan(OBJECT:([tempdb].[dbo].[t1]))


Cheers,

BG

Sign up for the ITPro Today newsletter
Stay on top of the IT universe with commentary, news analysis, how-to's, and tips delivered to your inbox daily.

You May Also Like