n-Table Outer Joins
Strategies for conquering multiple-table joins
April 3, 2002
In "n-Table Joins," the third article in my series about joins (February 2002), I discussed writing an n-table inner join. No matter how many tables are involved in an inner join, intersections determine the join, I pointed out; the table order in the FROM clause is irrelevant. Conversely, an outer join always requires direction—LEFT or RIGHT. Yet, when I discussed outer joins between two tables in "The Outer (Join) Limits," November 2001, I noted that order can be irrelevant in two-table outer joins as well because LEFT and RIGHT directions are based only on the two tables that surround the JOIN keyword. (Remember that you always define joins two tables at a time.) Because you could reverse the order of the two tables—and switch the outer join type from LEFT to RIGHT or from RIGHT to LEFT—outer joins between two tables are seemingly straightforward. You'd expect that adding a couple of additional tables to an outer join would be simple. However, when you add a third, fourth, or nth table to an outer join without following certain rules, each additional table might interfere with the earlier outer joins by excluding the outer rows. Let's look at using Venn diagrams as a strategy for helping you write and read complex n-table outer joins with speed and accuracy.
First, a Little Housekeeping
All the examples in this T-SQL Tutor column use the TSQLTutorJoins sample database. If you've kept your TSQLTutorJoins database up-to-date through my three previous columns, you need to execute only the TSQLTutorJoinsUpdate.sql script for this month's additions. If your TSQLTutorJoins database is out of date or needs to be refreshed, run the complete sample database script called TSQLTutorJoinsPartIV.sql to create or recreate the database. By default and by design, the script will fail if the database already exists. To recreate the database, you have two options: drop the TSQLTutorJoins database manually or modify the script to include the drop. If you want to use the script to perform the drop, be sure to read the comments in the first IF section. After you modify the script, you can use osql.exe to execute this script. For information about why using osql.exe is important for scripts that create large databases, see "All About RAISERROR," December 2001.
A Recap of Venn Diagrams and Outer Joins
As you do in any query, you typically begin a JOIN query by asking a question about related tables. Because SQL Server performs joins two tables at a time, you can illustrate the data that the query returns from each part of the two-table join or the n-table join by using simple circles that overlap to show intersection—a Venn diagram. By using a Venn diagram, you can determine the data set that a particular type of join returns by looking at the data that exists within the various circles and at their intersection. For example, if you want to answer questions about the Customer and Order tables, you can illustrate those two tables as overlapping circles whose intersection represents an inner join. The complete circle on the side that you define in your query as outer represents an outer join.
As Figure 1 shows, the CustomerIDs on the Customer and Order tables have been positioned appropriately to show whether a customer has made a purchase. The placement of CustomerIDs 3, 4, and 7 in the outer portion of the Customer table shows that they're customers who haven't made a purchase. CustomerIDs 1, 2, 5, and 6 appear in the overlapping section, meaning that these customers have placed at least one order. In fact, CustomerIDs 2 and 6 have placed more than one order (i.e., the intersection includes a row for every match). Finally, the NULL values in the outer portion of the Order table represent over-the-counter orders, which don't require customer information. Using this diagram, you can anticipate which rows and the number of rows each type of query returns.
To answer the question, Which customers have placed an order? you'd use an inner join between the Customer and Order tables. The overlapping section in Figure 1 represents this query, which returns CustomerIDs 1, 2, 5, and 6. Listing 1 shows the INNER JOIN query that answers the question, Which customers have placed an order?
To answer the same question (Which customers have placed an order?) yet include all customers whether or not they've placed an order, you'd use an OUTER JOIN query in which Customer is the outer table. In the Venn diagram that Figure 2 shows, the circle on the left represents this query, which returns the same CustomerIDs (1, 2, 5, and 6) as the query in Figure 1 returned. But the query also returns CustomerIDs 3, 4, and 7, which appear in the outer portion of the left circle. Listing 2 shows the LEFT OUTER JOIN query that returns all customers—whether or not they've placed an order—as well as all customers who have placed an order. The left outer join returns nine rows—the six rows from the intersection as well as the three customers who've made no purchases—CustomerIDs 3, 4, and 7. (Note: If you've returned only eight rows, you're probably missing CustomerID 7, the new customer. See the housekeeping section near the beginning of this article to add this new customer or update your TSQLTutorJoins database.) As a result of this modification, the sample database has seven customers—four of whom have made purchases and three of whom have not.
The question, Which orders have valid customers placed? would yield the same data set as the answer to the question, Which customers have placed an order? Yet, if you want to see all orders whether or not a current customer placed them, you would need to make the outer table the Order table. As Figure 3 shows, the entire circle on the right (the Order table) represents this data set, which returns the same information for CustomerIDs 1, 2, 5, and 6 as in Figures 1 and 2 but also includes the NULL value rows in the outer portion of the right circle. Customers without a valid CustomerID placed order numbers 2, 8, and 9 over the counter. Listing 3 shows the RIGHT OUTER JOIN query that answers the question, Which orders have valid customers placed?
Finally, you can derive one last result set from this Venn diagram when you construct a query that returns all orders that valid customers have placed, displays all orders, and displays all customers. In this query, customer ID and customer order history are irrelevant. Figure 4, which represents the entire data set, returns the same information for CustomerIDs 1, 2, 5, and 6 as Figures 1 through 3, includes the NULL-value rows in the outer portion of the right circle, and includes CustomerIDs 3, 4, and 7 in the outer portion of the left circle. Listing 4 shows the FULL OUTER JOIN query that returns the complete data set.
Writing Complex n-Table Outer Joins
So far, all the joins and Venn diagrams I've illustrated have joined two tables. But what if you want more details (i.e., you want to join more than two tables—n tables)? For example, suppose you want to see in your Customer-Order query the product name from the Product table instead of just the ProductID stored in the Order table? To add the product-name column to the query, you need to join the query to the Product table. As a first attempt, you might add the Product table to the query, then add the JOIN condition between the Order table and the Product table. Listing 5 shows the syntax of this attempt to add the JOIN condition between the Order and Product tables.
Unfortunately, this attempt fails to deliver the desired data: The result set shows only six rows, whereas the LEFT OUTER JOIN query between Customer and Order returned nine rows. If you wanted the same result set from Listing 5's query as you obtained from Listing 2's query and you needed to add the product name instead of the ProductID, you'd expect the query to return all nine rows and keep the NULL value for customers who haven't purchased anything. However, when you added the Product table to the outer join, it was defined as an inner join. This inner join again lets the query return only the intersection—this time between the Product table and the result set that the LEFT OUTER JOIN between Customer and Order defined. You can think of the result set of the Customer-Order join as the left circle in the bottom half of Figure 5. You add this new circle to the Venn diagram, then join the Product table to it. Figure 5 shows the graphical representation of flow from the outer join to the inner join to Product. The intersection of this new circle and the Product table's circle define the final result set. In this intersection, you can see clearly why CustomerIDs 3, 4, and 7 are excluded from the output—they're outside the intersection.Those CustomerIDs are outside the intersection because the final join to the Product table is based on ProductID, and because these customers didn't make a purchase, their ProductID is NULL.
The solution to writing this query is to continue to honor the outer portion of the join in the final join—the section where the result set of the Customer-Order join is joined to the Product table. When you look at the Venn diagram that Figure 6 shows, you can see that you actually need a left outer join to join the Product table to the result set that the Customer-Order join returns. And when you look at the Venn diagram in Figure 5, you can easily see why only the six rows that contain a product are returned.
Let's look at two ways you can successfully join n tables while preserving the outer rows. To achieve the results that Figure 6 depicts and continue to show the outer-join results from the Customer table, you can make the join to the Product table an outer join as well. To determine whether to use LEFT or RIGHT, consider the result set from the Customer-Order join as the LEFT circle (because in the query, the Customer-Order join precedes the keyword JOIN where Product is added and is on the left side in the Venn diagram). And consider Product to be the RIGHT table. If you want to preserve all rows on the left (in the left result set as well as in the left circle in the diagram), this join will be a left outer join as well. Listing 6 shows the LEFT OUTER JOIN query that works.
At this point, you can continue to add tables (as long as you preserve the outer join by adding additional outer joins), or you can throw away the rows that the outer join returns by adding inner joins. Optionally, you can also rearrange the table order. Instead of showing Customer first, you can perform the Order-Product join first, then join the result set from the Order-Product join to the Customer table. Listing 7 shows the query that adds Product to the JOIN query by rearranging the table order.
A second way to preserve the rows that result from inner joins when the inner joins are combined with an outer join is to use parentheses to group the join combinations. The Order-Product join is really an inner join, whereas the join to Customer is the only outer join. Customer is the only table whose rows outside the intersection must be preserved. To successfully group the join between Order and Product, you need to treat the join as if it were a nested table joined to the Customer table. To create this nested table, you have to define the Order-Product join as an inner join, as the following code snippet shows:
dbo.[Order] AS O INNER JOIN dbo.Product AS P ON P.ProductID = O.ProductID
Then, you need to place this Order-Product inner join within parentheses and to the left or right of the join with the Customer table. Listing 8 shows how to write this JOIN query by using LEFT OUTER JOIN (where Customer is placed on the left of the Order-Product join), and Listing 9 shows how to write this JOIN query by using RIGHT OUTER JOIN (where Customer is placed on the right side of the Order-Product join).
You can write an n-table outer join in many different ways to join as many tables as necessary, but the following guidelines apply to all approaches. First, think in terms of two tables at a time. This guideline helps you concentrate on one subset of tables, simplifying the complexity of large joins. Second, use Venn diagrams to help you understand the flow of data from one join to the next. Venn diagrams are helpful because they show you the spots where you lose or gain records in the outer joins. When you're using Venn diagrams, simply draw empty circles to remind you what you're producing at each join—an intersection or a full circle—then use that result set as the left circle into the next join. You don't need to list the rows (i.e., the CustomerIDs in these examples), but determining where rows have been removed and which rows are going to be preserved is helpful. Finally, consider grouping to simplify complex joins by keeping the inner joins grouped, then adding them to the outer join as if they were just one table.
Writing complex n-table outer joins should never be a daunting task again. To prove it, try the homework assignment in the T-SQL Tutor Project sidebar.
About the Author
You May Also Like