More TOP Troubles: Using TOP with INSERT, UPDATE, and DELETE

Using the SQL Server 2005 TOP option with data-modification statements produces ambiguous results. Fortunately, you have some choices for circumventing these TOP limitations.

Itzik Ben-Gan

January 30, 2006

5 Min Read
ITPro Today logo in a gray background | ITPro Today

As you've no doubt surmised from my previous articles in this series ("TOP Troubles," January 2006, InstantDoc ID 48616 and "Making TOP Better," February 2006, InstantDoc ID 48937), Microsoft's implementation of the TOP option in SQL Server 2005 has some significant shortcomings that hinder its usefulness to DBAs. For example, when using TOP with an ORDER BY clause, ORDER BY serves two different sorting functions—determining the order of calculation and determining the order of records in the output cursor—but you can't separate the two functions. In "Making TOP Better," I describe alternatives to TOP that give you full control over the different sorting functions and even let you request the TOP n rows for each group. I'll conclude the series here by describing some additional problems that TOP poses when it's used with data-modification statements (i.e., INSERT, UPDATE, and DELETE) and alternative techniques that avoid these problems.

TOP and Data Modifications


In SQL Server 2005, Microsoft has added support for using the TOP option with data-modification statements. (For more information about using TOP with modification statements and the applications that might benefit from it, see "A More Versatile TOP," September 2005, InstantDoc ID 47690.) But using TOP with modification statements is even more ambiguous than using TOP with SELECT queries because with modifications, you can't specify an ORDER BY clause at all. Essentially you end up modifying the TOP (n) rows that SQL Server happens to access first.

To demonstrate using TOP with modifications, first run the following code, which copies the contents of the Orders table from the Northwind database to the MyOrders table:

SET NOCOUNT ON;USE Northwind;SELECT *INTO dbo.MyOrdersFROM dbo.OrdersORDER BY RAND(CHECKSUM(NEWID()));

Now, consider the following DELETE statement:

DELETE TOP (3) FROM dbo.MyOrders;

Which three rows will be deleted? You can’t specify an ORDER BY clause that would tell SQL Server, for example, to delete the three orders with the minimum order dates, using the order ID as a tiebreaker. This DELETE statement simply deletes the first three orders that SQL Server finds. By the way, if you’re curious about which rows were actually deleted, remember that you can now specify an OUTPUT clause in modification statements to request the affected rows as output, as this sample statement does:

DELETE TOP (3) FROM dbo.MyOrdersOUTPUT deleted.OrderID, deleted.OrderDate;

Similarly, you can’t dictate the order in which the following UPDATE statement will modify rows:

UPDATE TOP (3) dbo.MyOrders  SET Freight = 123.456;

Again, SQL Server will modify the first three rows that it finds first.

If you want to see which rows were modified, use the OUTPUT clause like this:

UPDATE TOP (3) dbo.MyOrders  SET Freight = 123.456OUTPUT inserted.*;

As I describe in “A More Versatile TOP,” certain applications can benefit from the ability to use the TOP option with modifications, when you don’t care about the order in which rows will be chosen—for example, deleting or updating large volumes of data in batches, to avoid log-explosion and lock-escalation problems. However, obviously the use of the term TOP in such cases is a misnomer.

The Ideal TOP with Modifications


Imagine how nice it would have been if TOP supported an OVER clause with modification statements, where you could specify the order in which rows were chosen. For example, you could delete the three orders with the minimum order dates, by using the order ID as the tiebreaker, as this example shows:

DELETE TOP (3) OVER(ORDER BY OrderDate, OrderID) FROM dbo.MyOrders;

Or, you could even use the PARTITION BY clause to choose rows per each group. For example, you could use a statement like this to delete the three oldest orders for each customer:

DELETE TOP (3) OVER(PARTITION BY CustomerID ORDER BY OrderDate, OrderID)FROM dbo.MyOrders;

Alas, T-SQL 2005 doesn't support this syntax. Fortunately, by using common table expressions (CTEs) and the ROW_NUMBER function, you can gain full control of the order in which SQL Server will choose rows for modification purposes and still be able to limit the number of rows. For example, the following code deletes the three orders with the minimum order dates, using the order ID as the tiebreaker:

WITH OrdersRN AS(  SELECT *, ROW_NUMBER() OVER(ORDER BY OrderDate, OrderID) AS RowNum  FROM dbo.MyOrders)DELETE FROM OrdersRNWHERE RowNum <= 3;

If you want to see the OrderID and OrderDate values of the deleted rows, specify OUTPUT deleted.OrderID, deleted.OrderDate right after OrdersRN in the outer DELETE statement (…DELETE FROM OrdersRN OUTPUT deleted.OrderID, deleted.OrderDate WHERE…). The OrdersRN CTE query expression generates row numbers based on OrderDate, OrderID order. The outer DELETE simply filters only those rows that have a row number that's less than or equal to 3.

Additionally, the ROW_NUMBER function supports a PARTITION BY clause, so that you can request to delete TOP (n) rows for each group. For example, the following code deletes the three oldest orders for each customer:

WITH OrdersRN AS(  SELECT *,    ROW_NUMBER() OVER(PARTITION BY CustomerID                      ORDER BY OrderDate, OrderID) AS RowNum  FROM dbo.MyOrders)DELETE FROM OrdersRNWHERE RowNum <= 3;

Similarly, you can use the CTE and ROW_NUMBER technique to achieve ordered updates. For example, the following code updates the Freight values of the three most recent orders:

WITH OrdersRN AS(  SELECT *,    ROW_NUMBER() OVER(ORDER BY OrderDate DESC, OrderID DESC) AS RowNum  FROM dbo.MyOrders)UPDATE OrdersRN  SET Freight = 123.456WHERE RowNum <= 3;

The following query updates the Freight values of the three most recent orders for each customer:

WITH OrdersRN AS(  SELECT *,    ROW_NUMBER() OVER(PARTITION BY CustomerID                      ORDER BY OrderDate DESC, OrderID DESC) AS RowNum  FROM dbo.MyOrders)UPDATE OrdersRN  SET Freight = 123.456WHERE RowNum <= 3;

In short, you have full control over the order in which rows are selected for modification.

TOP Without TOP


We've finally reached the end of our discussion about the SQL Server 2005 TOP option, its ambiguous nature, and alternatives for overcoming its limitations. You've seen how using CTEs and the ROW_NUMBER function can give you full control of the two different sorting functions that you might need with TOP queries as well as provide partitioning capabilities and even let you perform ordered data modifications. Ironically, you've gained all the missing functionality in TOP by not using TOP at all!

Learn more at "Jump Start: Using Order By."

 

Sign up for the ITPro Today newsletter
Stay on top of the IT universe with commentary, news analysis, how-to's, and tips delivered to your inbox daily.

You May Also Like