Change in Behavior of RAND and NEWID in SQL Server 2005

Itzik talks about a change in behavior of RAND and NEWID in SQL Server 2005 that you may have overlooked.

Itzik Ben-Gan

September 9, 2007

17 Min Read
ITPro Today logo in a gray background | ITPro Today

I’m writing this blog entry as a result of a customer query regarding a change in behavior related to invocations of the RAND and NEWID functions in SQL Server 2005.

Suppose you need to write an expression in a query invoking the RAND or NEWID function (say for randomization purposes) and you need the function to be invoked only once. For example, suppose you need to make a random choice out of three options (call them ‘option one’, ‘option two’ and ‘option three’), and you write the following code:

select  case rnd    when 1 then 'option one'    when 2 then 'option two'    when 3 then 'option three'    else 'oops'  endfrom (select cast(rand()*3 as int)%3 + 1 as rnd) as d;

Remember that the RAND function returns a float value in the range 0 through 1, inclusive (0 and 1 are possible result values). Casting the expression rand()*3 as an integer will truncate the fraction part of the value.

Even though the probability to get exactly 3 back after casting is very low, it is still a possibility. This is why I used %3 (modulo 3)—to ensure the expression will return an integer in the range 0 through 2. By adding 1, the expression is guaranteed to return an integer in the range 1 through 3.

Internally, SQL Server rearranges a simple form of a CASE expression such as the above to the searched form, namely, it expands the WHEN clauses to incorporate the full predicates. As an example, the above CASE expression is internally evaluated as follows:

select  case     when rnd = 1 then 'option one'    when rnd = 2 then 'option two'    when rnd = 3 then 'option three'    else 'oops'  endfrom (select cast(rand()*3 as int)%3 + 1 as rnd) as d;

SQL Server 2000 evaluates each reference to the alias rnd assigned in the table expression d separately; therefore, it actually invokes the RAND function three times. You realize that this means that in SQL Server 2000 it is possible that none of the WHEN clauses of the CASE expression will evaluate to TRUE, and you might end up getting ‘oops’ back. Try running this code several times in SQL Server 2000 and you will be able to verify this.

SQL Server 2005 changes the behavior of outer references to aliases assigned in table expressions, where the aliased expression invokes the RAND or NEWID function. SQL Server 2005 will invoke the function only once, therefore it is guaranteed that one of the WHEN clauses in the above query will evaluate to TRUE, and you will never get ‘oops’ back.

You can test this with a similar example that invokes the NEWID function. To return a random value in the range 1 through n, instead of using the expression:

cast(rand()*n as int)%n + 1 

You can use the expression:

abs(checksum(newid()))%n + 1

CHECKSUM(NEWID()) returns a random integer. Applying ABS on top ensures you get a nonnegative integer. Applying %n (modulo n) ensures that the value is >= 0 and < n. By adding 1 you ensure that the value is >= 1 and <= n. In short, this is just another way to get a random integer value in the range 1 through n. So the above query can be rewritten as follows:

select   case rnd    when 1 then 'option one'    when 2 then 'option two'    when 3 then 'option three'    else 'oops'  endfrom (select abs(checksum(newid()))%3 + 1 as rnd) as d;

Try running this query several times in both SQL Server 2000 and in SQL Server 2005. In SQL Server 2000 you will occasionally get ‘oops’ back, while in SQL Server 2005 you will never get ‘oops’ back. This is due to the same change in behavior I described earlier.

A simple way to test the difference in behavior between the versions is by running the following code:

select rnd, rndfrom (select rand() as rnd) as d;

Run it in SQL Server 2000 and you will get two different invocations of RAND, hence most probably two different values back. Run it in SQL Server 2005, and you’re guaranteed to get the same value back twice since RAND will be invoked only once.

This change is described in SQL Server 2005’s Books Online under the section “Behavior Changes to Database Engine Features in SQL Server 2005,” but it’s very easy to overlook it.

Note that the change in behavior has nothing to do with multiple invocations of RAND or NEWID in the same query, as opposed to being invoked once in a table expression and then referenced multiple times in the outer query. For example, the following code can return ‘oops’ in both SQL Server 2000 and in SQL Server 2005:

select   case abs(checksum(newid()))%3 + 1    when 1 then 'option one'    when 2 then 'option two'    when 3 then 'option three'    else 'oops'  end;

In SQL Server 2005 you can now use a table expression as demonstrated earlier as a workaround. In SQL Server 2000 (and also in 2005), you can use a variable as a workaround:

declare @rnd as int;set @rnd = abs(checksum(newid()))%3 + 1;select   case @rnd    when 1 then 'option one'    when 2 then 'option two'    when 3 then 'option three'    else 'oops'  end;

You can experience similar problems in less obvious scenarios; for example, consider the predicate:

where abs(checksum(newid()))%3 + 1 between col1 and col2

Also here, the predicate is expanded internally and NEWID is invoked multiple times:

where col1 >= abs(checksum(newid()))%3 + 1   and col2 

So you end up getting two independent invocations of the NEWID function. If you want to rely on a single invocation, as demonstrated earlier in SQL Server 2005 you can use a table expression, and in both versions you can store the result of the expression in a variable and then refer to the variable.

As the last example of problematic use of RAND and NEWID, I recently saw code written by a programmer that was supposed to populate a temporary table with a set of unique random integers (say, 50 random integers in the range 1 through 100). The code looked similar to the following:


set nocount on;create table #random_values(rnd int not null);create index idx1 on #random_values(rnd); declare @i as int;set @i = 1;while @i <= 50begininsert into #random_valuesselect abs(checksum(newid()))%100 + 1where abs(checksum(newid()))%100 + 1not in (select rnd from #random_values);if @@rowcount = 1 set @i = @i + 1;end select rnd from #random_values order by rnd; drop table #random_values; 

You realize that the two invocations of the NEWID function (in the SELECT and WHERE clauses) are independent of each other. Therefore, this code may very well populate the temporary table with duplicate values in all versions of SQL Server. For example, here’s a subset of the output I got after running this code:

rnd-----------2344...

I’m not saying that this particular solution is the optimal way to get a set of unique random values rather just explaining the logical problems with this solution and the workarounds.

In SQL Server 2005 you can now use a table expression as a workaround, revising the SELECT query to the following:

...  insert into #random_values    select rnd    from (select abs(checksum(newid()))%100 + 1 as rnd) as d    where rnd not in (select rnd from #random_values);...

In both SQL Server 2000 and 2005 you can use a variable as a workaround:

...  set @rnd = abs(checksum(newid()))%100 + 1;  insert into #random_values    select @rnd    where @rnd not in (select rnd from #random_values);...

To summarize, care should be taken when using functions such as RAND and NEWID in queries. If you need to rely on a single invocation of the function, SQL Server 2005 will give you the desired behavior as long as you

encapsulate the invocation of the function in a table expression and then in the outer query refer to the alias of the expression as many times as you like.

In SQL Server 2000 (or any other version), you can use a variable as a workaround.

Learn more from my article "Exploring Changes Made to SQL Server 2005's UDF."

Cheers

--

BG

 

set nocount on;

create table #random_values(rnd int not null);

create index idx1 on #random_values(rnd);

 

declare @i as int;

set @i = 1;

while @i <= 50

begin

  insert into #random_values

    select abs(checksum(newid()))%100 + 1

    where abs(checksum(newid()))%100 + 1

      not in (select rnd from #random_values);

  if @@rowcount = 1 set @i = @i + 1;

end

 

select rnd from #random_values order by rnd;

 

drop table #random_values;

Sign up for the ITPro Today newsletter
Stay on top of the IT universe with commentary, news analysis, how-to's, and tips delivered to your inbox daily.

You May Also Like