The Outer (Join) Limits
When you work with relational databases, you sometimes need to bring your tables together in a join. Explore the world of outer joins in the T-SQL classroom.
October 31, 2001
When working with relational databases, you sometimes need to bring your related tables together in a single resultset in an operation called a join. In "Joins 101," October 2001, I demonstrated how to create an inner join, which displays only the rows the two tables have in common (based on the JOIN condition). For example, an inner join between the Customer table and the Order table shows only the customers who've placed an order and only the orders placed by valid customers. Often, your interest goes beyond those customers who purchased something. For example, you might want to see all customers—with their orders, if applicable—regardless of whether they've purchased anything. For this, you need an outer join.
The Venn diagram in Figure 1 shows the relationship between customers and orders. The intersection (the inner join) returns only the rows where the circles overlap—in this case, CustomerIDs 1, 2, 5, and 6. Customers 3 and 4 have made no purchases. If you're looking for all customers, regardless of whether they've purchased anything, you need more than just the intersection. In effect, you're looking for the entire Customer circle, showing the orders for those who've placed them (the intersection) and a NULL value for those who haven't (the outer part of the Customer circle, which doesn't overlap).
To begin writing an outer join, I recommend that you write the FROM clause first and create a Venn diagram to illustrate what you're doing. The order in which you draw the circles in the Venn diagram will help you write your FROM clause. In Figure 1, the Customer table is on the left and the Order table on the right. Start writing your FROM clause by using an alias to list the two tables (see "Aliases in T-SQL" for information about how to create and use aliases) and the keyword OUTER for the join type:
FROM dbo.Customer AS C OUTER JOIN dbo.[Order] AS O
At this point, you haven't specified the outer table or the direction of the outer join. The direction defines which table all rows should be returned from—the outer table—regardless of whether a matching row exists in the joined table. In this case, you want all customers. The Customer table is on the left in the diagram and, more important, left of the JOIN keyword in the FROM clause; therefore, you're defining a left outer join. Add the join type LEFT just before the OUTER JOIN definition, and the FROM clause will be almost complete. You just need to add join criteria and your SELECT list. The complete syntax of this outer join is
SELECT C.FirstName AS CustomerFirstName, O.ProductIDFROM dbo.Customer AS C LEFT OUTER JOIN dbo.[Order] AS O ON C.CustomerID = O.CustomerID
This statement returns the results that Table 1 shows. The rows I've highlighted in bold are included only because this is an outer join. Rows not bolded are present in both an inner and an outer join.
Could you rewrite this query as a right outer join, and if so, would it be any different? You can indeed change this query to a right outer join by simply switching the order of the tables and changing the direction of the join to right. In a right outer join, the FROM clause would read as follows:
FROM dbo.[Order] AS O RIGHT OUTER JOIN dbo.Customer AS C ON C.CustomerID = O.CustomerID
You can use my TSQLTutorJoins sample database (see "Joins 101" Download the Code zip file) to test this join and verify that the results are the same regardless of their direction. Make sure you switch the table order when testing.
There is no difference in how these queries perform or gather data; the only difference is in the order in which the tables appear in the FROM clause. There is no difference in processing or performance between the left outer join and its equivalent right outer join. When you're working with two tables, the join type you choose is solely a matter of preference. I prefer to write two-table outer joins as left outer joins because I tend to draw Venn diagrams with the more interesting table (relative to what I'm looking for in my queries) on the left. However, when you perform outer joins on more than two tables, you need to combine left and right outer joins to more accurately answer questions about your data, rather than choose based on preference. (I'll talk more about that in a future column.)
You've listed all customers regardless of purchase, but what if you want to see all orders, regardless of whether a valid customer placed the order? The TSQLTutorJoins database supports over-the-counter sales by allowing NULL values for CustomerID in the Order table. Therefore, to see all orders regardless of whether they're associated with a valid CustomerID, you need an outer join, with Order as the outer table. In the SELECT list, let's ask for the product ID and customer's first name:
SELECT C.FirstName AS CustomerFirstName, O.ProductIDFROM dbo.[Order] AS O LEFT OUTER JOIN dbo.Customer AS C ON C.CustomerID = O.CustomerID
This code example returns the results that Table 2 shows.
Another Way
As I mentioned in "Joins 101," you often need to be able to recognize different syntax types and know the pitfalls of each. You can rewrite many joins another way—either with a subquery or possibly by using different syntax. In SQL Server, you can write outer joins in alternative syntax that's not always identical to the ANSI standard. A T-SQL syntax for outer joins existed even before ANSI had defined a standard specification for outer joins. I can't cover all the potential problems that this older syntax creates, but let's look at the syntax and a few shortcomings.
For a simple two-table query with no search conditions, the old T-SQL syntax compares well to the ANSI style. Using the old T-SQLstyle outer join to list all customers—with their orders, if applicable—regardless of whether they've purchased anything, you include only the tables and their aliases in the FROM clause:
FROM dbo.Customer AS C, dbo.[Order] AS O
and the join conditions in the WHERE clause:
WHERE C.CustomerID = O.CustomerID
However, this join isn't an outer join yet. In fact, it's an inner join. To make it an outer join, you add an asterisk to the join condition, before or after the equal sign. You put the asterisk on the side of the equal sign that the outer table is on. In this case, you want to list all customers, so Customer is the outer table. The asterisk goes on the left side of the equal sign because Customer is on the left as C.CustomerID. The WHERE clause now reads
WHERE C.CustomerID *= O.CustomerID
and the full syntax is
SELECT C.FirstName AS CustomerFirstName, O.ProductIDFROM dbo.Customer AS C, dbo.[Order] AS OWHERE C.CustomerID *= O.CustomerID
Although AS isn't required, this query becomes more difficult to read if you use the old-style syntax with AS removed, as the following example shows:
SELECT C.FirstName CustomerFirstName, O.ProductIDFROM dbo.Customer C, dbo.[Order] OWHERE C.CustomerID *= O.CustomerID
I want to stress that this old-style syntax isn't recommended and that not every join can be rewritten. For writing outer joins, I recommend that you always use the ANSI-92 standard. In fact, I recommend the ANSI-92 standard for all joins—inner and outer.
Full Outer Joins
The outer join between Customer and Order with Customer as the outer table returns all customers—even those who haven't made a purchase. The outer join between Customer and Order, where Order is the outer table, shows all orders—including the over-the-counter purchases, where Order.CustomerID is NULL. Now, what if you want to see both sets of data combined? For this situation, you need a full outer join.
Full outer joins are the ANSI-92 mechanism you use to show all the data in two related tables. For rows that meet the join conditions, you get the related data. For rows that don't meet the join conditions, you get a NULL value. To write a full outer join, you must use the ANSI-92 syntax; no comparable direct T-SQL alternative exists. Using ANSI-92 syntax, you can find all customers and orders by replacing the direction in the previous join query with the word FULL. For example, the following query
SELECT C.FirstName AS CustomerFirstName, O.ProductIDFROM dbo.[Order] AS O FULL OUTER JOIN dbo.Customer AS C ON C.CustomerID = O.CustomerID
returns the results that Table 3 shows. Note that the table displays 10 rows: the 6 rows that the two tables have in common, plus the 2 rows from each set that are common only to the outer joins.
Understanding Join Results
Now, let's review each type of join's resultset and look at how they differ. Understanding these differences will help you clearly see the results of a join, especially an outer join. For a join between the Customer table and the Order table, start by looking at the number of rows each table has. The Customer table has six rows; the Order table has eight rows. With an inner join, the maximum number of rows possible is always the number of rows in the larger table—in this case, eight. If every order were placed by a current customer or if the Order table's CustomerID column didn't allow NULL values, the inner join would return exactly eight rows. An inner join is simply the intersection between the two tables; direction is irrelevant.
The resultset for an outer join is determined first by the type of relationship that exists between the joined tables. For a one-to-one (1:1) relationship, the result of an outer join is easy to predict: The resultset always has the same number of rows as the outer table. If every row has a match defined by the join condition, you'll see the matching information exactly as you would for an inner join. For example, in the Pubs database, the publishers and pub_info tables have a 1:1 relationship. If the pub_info table contained information about every publisher, the outer join and inner join would produce the same resultset. However, the outer join (using publishers as the outer table) would still display any publisher that wasn't listed in the pub_info table. Remember that the point of the outer join is to include the rows for which no match exists. For the rows that don't have a match, you'll get a NULL value for the columns from the inner table. But no matter how you look at it, in a 1:1 relationship, no more than one row can match from each table, so the resultset is easy to predict and understand.
For an outer join based on a one-to-many (1:M) relationship, the resultset is slightly more challenging to predict, but it still follows a formula. In a 1:M relationship, look at which table is defined as the outer table and whether it's the primary key table or the foreign key table, based on the JOIN condition. The relationship between Customer and Order is a 1:M relationship in which Customer is the primary key table and Order uses a foreign key to reference it. Regardless of which table was the outer, the join returned eight rows. However, that's only a coincidence; the number of rows returned is always based on the join criteria and which table is the outer. When the outer table is the foreign key table, the formula is easy—the join always returns the number of rows in the foreign key table. In this case, the Order table has eight rows; therefore, an outer join with the foreign key table (Order) as the outer table returns eight rows.
When the outer table is the primary key table (usually the smaller of the two, but that's not a requirement), the formula changes. Imagine that one customer had placed all the orders. An outer join of Customer and Order would return first the intersection (all eight rows in this case), then all remaining customers. With one customer who placed all eight orders and five who placed none, the maximum number of rows returned in this case is 13. The formula to determine the maximum number of rows in an outer join when the outer table is the primary key table is the total number of rows in the two tables, minus 1.
Outer joins based on many-to-many (M:N) relationships can become quite complex; drawing Venn diagrams can help you write and understand them. Because M:N relationships are usually described by at least three tables, I'll leave those for my next column, which will cover complex joins between n tables.
About the Author
You May Also Like