Using Joins to Modify Data
In the debut T-SQL Black Belt article, SQL Server MVP Itzik Ben-Gan demonstrates how you can use joins and subqueries to modify a table based on data in another table.
May 22, 2000
Editor's Note: This is the debut of T-SQL Black Belt, a series of articles that demonstrate practical, advanced tips for using T-SQL. Send your experts-only T-SQL tips to SQL Server MVP Itzik Ben-Gan at [email protected]. If we use your submission, you'll receive $100 and an exclusive T-SQL Black Belt shirt.
Sometimes you need to modify data, but the criteria that define which rows will be affected are based on data that doesn't exist in the modified table but in another table. You could use subqueries to solve the problem; or you could use a syntax that originated from Sybase and uses joins in the DELETE and UPDATE statements. This syntax isn't ANSI-compliant and might look strange at first glance. But if you're comfortable writing join statements, you'll find the syntax convenient, especially for inside triggers, in which you usually need to join the base table to the inserted or deleted tables.
These examples use Northwind sample database tables. Let's start with an abbreviated form of the DELETE statement syntax:
DELETE [FROM] [FROM JOIN ON ][WHERE ]
Suppose you want to delete from the Order Details table all rows for orders that the customer VINET places. The problem is that the Order Details table doesn't have information about the customer who made the order; this information is in the Orders table. The following DELETE statement will delete the appropriate rows:
DELETE FROM [Order Details]FROM [Order Details] AS OD JOIN Orders AS O ON OD.orderid = O.orderidWHERE customerid = 'VINET'
Note that this statement contains two FROM clauses, which might seem strange. The first FROM clause is optional, so you might prefer not to use it, but a DELETE query that doesn't perform a join is more readable with a FROM clause. Also, the Order Details table appears twice. The first occurrence (after the first FROM clause) specifies which table the statement modifies, and the second occurrence (after the second FROM clause) is used for the JOIN operation. This syntax doesn't let you specify more than one table after the first FROM clause. If it did, you wouldn't be able to determine which table the statement modified. Now, write the code inside a transaction, and use a ROLLBACK statement so the change won't commit in your database:
BEGIN TRAN-- check the rows you intend to deleteSELECT *FROM [Order Details] AS OD JOIN Orders AS O ON OD.orderid = O.orderidWHERE customerid = 'VINET'-- delete rowsDELETE FROM [Order Details]FROM [Order Details] AS OD JOIN Orders AS O ON OD.orderid = O.orderidWHERE customerid = 'VINET'-- check to see that all rows are deletedSELECT *FROM [Order Details] AS OD JOIN Orders AS O ON OD.orderid = O.orderidWHERE customerid = 'VINET'ROLLBACK TRAN
Even though this syntax isn't ANSI-compliant, it looks similar to ANSI SQL-92 joins. SQL Server also supports old-style syntax that looks similar to ANSI SQL-89 joins, in which you place the join condition in the WHERE clause:
BEGIN TRANSELECT...DELETE FROM [Order Details]FROM [Order Details] AS OD, Orders AS OWHERE OD.orderid = O.orderid AND customerid = 'VINET'SELECT...ROLLBACK TRAN
SQL Server also supports an abbreviated form of the old-style syntax, in which you don't need to specify the modified table a second time:
BEGIN TRANSELECT...DELETE FROM [Order Details]FROM OrdersWHERE [Order Details].orderid = Orders.orderid AND customerid = 'VINET'SELECT...ROLLBACK TRAN
Note that in this case, you can't provide a qualifier to the modified table. Following are a few alternatives that use subqueries:
BEGIN TRANSELECT...DELETE FROM [Order Details]WHERE orderid IN(SELECT orderid FROM Orders WHERE customerid = 'VINET')SELECT...ROLLBACK TRANBEGIN TRANSELECT...DELETE FROM [Order Details]WHERE EXISTS(SELECT customerid FROM Orders WHERE [order details].orderid = Orders.orderid AND customerid = 'VINET')SELECT...ROLLBACK TRAN
Note that in the first example, the subquery isn't related to the outer query. In other words, for each outer row, the subquery always returns the same results. The second example uses a correlated subquery, which means that the value the subquery returns depends on the outer query. You can't run a correlated subquery in isolation because the subquery refers to a table that the FROM clause doesn't reference; in this example, that table is Order Details.
Suppose that the Northwind database tracks orders by store, and the orderid column alone isn't unique. The primary key of the Orders table is expanded to orderid, storeid; and the primary key of the Order Details table is expanded to orderid, storeid, productid. Now, if you want to delete all rows from the Order Details table for orders that the customer VINET places, you need to perform a composite key join:
DELETE FROM [Order Details]FROM [Order Details] AS OD JOIN Orders AS O ON OD.orderid = O.orderid AND OD.storeid = O.storeidWHERE customerid = 'VINET'
You can't use the IN operator to implement this DELETE statement, but you can use the EXISTS operator to implement it:
DELETE FROM [Order Details]WHERE EXISTS(SELECT customerid FROM Orders WHERE [order details].orderid = Orders.orderid AND [order details].storeid = Orders.storeid AND customerid = 'VINET')
The UPDATE statement contains a similar syntax:
UPDATE SET col1 = [, col2 = ...][FROM JOIN ON ][WHERE ]
Suppose you want to add a 5 percent discount to order items in the Order Details table in which the supplier Exotic Liquids, whose supplierid is 1, supplies the parts. The problem is that the supplierid column appears in the Products table. Write this UPDATE statement:
BEGIN TRANSELECT...UPDATE ODSET discount = discount + 0.05FROM [Order Details] AS OD JOIN Products AS P ON OD.productid = P.productidWHERE supplierid = 1SELECT...ROLLBACK TRAN
Note that you can use either the table name or the table qualifier to specify the table you're modifying. These alternatives use subqueries:
BEGIN TRANSELECT...UPDATE [Order Details]SET discount = discount + 0.05WHERE productid IN (SELECT productid FROM Products WHERE supplierid = 1)SELECT...ROLLBACK TRANBEGIN TRANSELECT...UPDATE [Order Details]SET discount = discount + 0.05WHERE EXISTS(SELECT supplierid FROM Products WHERE [Order Details].productid = Products.productid AND supplierid = 1)SELECT...ROLLBACK TRAN
You can use joins and subqueries to modify a table based on data in another table. I demonstrated INNER joins, but you can also use OUTER joins. The optimizer tends to produce efficient plans for joins, but in many cases, the optimizer produces the same plan for the subquery equivalent. For example, when you use the EXISTS() function, the optimizer might come up with a better plan than the join query equivalent. To check the number of logical reads, SET STATISTICS IO ON, and to examine the execution plan, use SET SHOWPLAN_TEXT ON or the Graphical Execution Plan in the Query Analyzer. Try this with the first three variations of the DELETE statement that I've presented in this article, and you'll get the same plan for the JOIN query and the two subqueries. Whatever method you choose, you need to test, test, and test.
About the Author
You May Also Like