Single-Table Joins in Query Plans
Learn how the query optimizer uses a join for a single-table query with a WHERE clause
December 17, 2006
You might not realize that in a SQL Server 2005 query plan, the use of a nonclustered index on a table shows a join operation. Almost all the SQL Server Books Online (BOL) descriptions that discuss the JOIN keyword in a query refer to joining two tables. However, in the BOL sections on types of joins under “query tuning,” you’ll see references to the join operator in a query plan and a description that refers to two inputs, not two tables. For our purposes, we can consider an input to a join to be a set of rows; it could be an entire table or a set of rows after specified filters are applied. We’ll examine inputs to joins that are sets of index rows and learn about two types of query plans that indicate that SQL Server is performing a join, although only one table is in the query and no JOIN operator is specified in the T-SQL code.
Query Plan Using a Nonclustered Index
In the first type of query, the plan uses a nonclustered index to find the desired rows—that is, rows meeting a condition specified in the WHERE clause. You probably know that the index rows in the leaf level of a nonclustered index contain the index key columns and a bookmark, or pointer, to where the actual corresponding data row is located. As I discuss in “Inside Optimization,” October 2003, InstantDoc ID 39822, a bookmark can take one of two forms. If the underlying table has a clustered index, the bookmark is the clustered index key. If the table is a heap, the bookmark is a row ID, indicating the file number, page number, and slot number on the page where the row is located.
In SQL Server 2000, the query plan for a single-table query that uses a nonclustered index to find the data rows shows the bookmark lookup operation. The execution plan looks the same whether or not the table has a clustered index. To see the plan, first run the code in Listing 1 on a SQL Server 2000 instance to create a copy of the Orders table in the Northwind database, called Orders2.
The only difference between the two tables is that the original Orders table has a clustered index on OrderID and the new Orders2 table is a heap (i.e., a table with no clustered index). I created the two queries in Listing 2, then used SQL Server Management Studio (SSMS) to run these queries against a SQL Server2000 instance. The execution plans for the two queries look the same. The plan shows that SQL Server uses the non-clustered index to find the index rows meeting the desired condition (in this case, CustomerID = 'VINET'), then uses the bookmarks in the leaf-level index row to find the corresponding rows in the table.
In SQL Server 2005, the plan looks different. If you think about it, the bookmark lookup operation actually takes a set of rows from the index (all rows meeting the specified condition) and finds all rows in the underlying table that match those rows (i.e., that have rows with the same bookmark). These actions are the same as a join operation, so SQL Server 2005’s query plans show this operation as a join. SQL Server 2005 isn’t really doing anything differently; it’s just that the query plan describes SQL Server’s execution steps slightly more precisely.
To see a query plan containing an index seek of a nonclustered index in SQL Server 2005, use the Sales.Sales-OrderHeader table in the Adventure-Works database, run the query in Listing 3, and examine its query plan, which Figure 1, shows. The index seek at the top right is the seek from the index on the SalesPersonID column, which returns a set of index rows, each containing a clustered index key as a bookmark. The plan shows a nested-loop join that takes each bookmark (clustered index key) from the rowset provided from the index on SalesPersonID and finds the matching rows in the table, by seeking on the clustered index.
If Sales.SalesOrderHeader didn’t have a clustered index, the plan would look similar but not exactly the same. The plan still shows a nested-loop join operation, but instead of the seek in the clustered index, the plan shows a row ID (RID)lookup operation into the base table because the bookmark returned from the nonclustered index is the row ID.
Query Plan Using Index Intersection
The second type of query plan that can show a join operator even when the query contains one table occurs when the query optimizer decides to use two or more indexes on a table. (I describe this behavior— called index intersection—in “Inside Optimizer Enhancements,” November 2003, InstantDoc ID 39906.) If a query has multiple search conditions, and more than one has a potentially useful nonclustered index, the optimizer considers index intersection.
Consider the query in Listing 4. SalesPersonID and CustomerID each have a non-clustered index. If either search condition in the query were selective and returned only a few rows, the optimizer could simply decide to use one index. The optimizer would use a query plan like that in Figure 1 to find the rows in the underlying table based on one of the conditions. Then, once the optimizer had the rows from the table, it would filter them according to the second condition. However, if neither condition were selective, the optimizer might decide to use both indexes. The qualifying index rows from each index contain the nonclustered keys and bookmarks, and SQL Server could join the two rowsets together, matching index rows with the same bookmark.
In the query in Listing 4, neither condition is selective enough to use a nonclustered index. If you look at the plans for each query in Listing 5, you’ll see that each uses a clustered index scan, which is really the same as a table scan.
However, if you combine the two queries, you’ll get a plan like the one in Figure 2. SQL Server performs index seeks by using each nonclustered index, then sorts the rowset of index rows from the index on CustomerID by SalesOrderID, which is the clustered key. The index rows from the SalesPersonID index are already sorted by SalesOrderID, because the index’s leading column has only one value (278). After sorting the rows from the CustomerID index by SalesOrderID, SQL Server can perform a merge join. The result of the merge join is still just a set of index keys and bookmarks, so another join operation combines that rowset with the rows in the clustered index by using a nested-loop join. If the underlying table were a heap, the plan would look similar, but the base table would use a RID lookup instead of a clustered index seek.
Theoretically, the optimizer could choose to use index intersection on more than two nonclustered indexes on the same table. However, in addition to the cost of seeking through the index to find the relevant index rows, the optimizer must consider the added cost of managing the rowsets from each index and possibly performing sort operations and joining the rowsets together. Performing these actions is usually much more expensive than using a single index, joining to the base table to get the data rows, then filtering the rows to apply the additional search conditions.
An Obscure Join
Before SQL Server 7.0, SQL Server used at most one index per table in a query. Index intersection, introduced in SQL Server 7.0, lets SQL Server join sets of rows retrieved from multiple nonclustered indexes. SQL Server 2005’s query plans show us that using the index information from a nonclustered index to find rows in the underlying table is really a type of joining—between an index and a table. There are other ways of using multiple indexes on the same table in a single query, and I’ll explain these options in an upcoming column.
About the Author
You May Also Like