Data Warehousing: Degenerate Dimensions
Degenerate dimensions are control numbers that are stored in the fact table of the data warehouse. Control numbers enable you to see which items in a fact table originate from the same order or invoice. Although control numbers are mapped to the fact table like other attributes from the OLTP database, they aren't treated as keys.
May 27, 2008
A degenerate dimension is not one that lacks moral structure or integrity. Instead, a degenerate dimension is a dimension that doesn’t exist as a table but is represented in the data warehouse.
Data warehouse dimensional design requires you to include control documents such as invoices, orders, and warranties. Each of these control documents has a control number such as the invoice number, the order number, or the serial number of the item under warranty. Degenerate dimensions are simply control numbers that are stored in the fact table of a data warehouse. These control numbers look like keys, but they don’t act like keys; they have no associated dimension to join with. Control numbers provide a way to identify which line items in the fact table were generated as a part of the same order or invoice. Let’s take a look at how to map control numbers from the OLTP database to the fact table in the data warehouse and associate them with each line item.
How to Map Control Numbers to the Fact Table
Control numbers originate in the OLTP database. For instance, in an ordering system, the order data is stored in two OLTP tables, the Order Header table and the Order Detail table. These two tables have a one-to-many (1:M), parent-to-child relationship, as shown on the left side of Figure 1. For a detailed explanation of this two-level structure, see “Metamodel for Retail Sales,” May 2001. The Order Header (parent) table contains data such as the date and time of the order, who took the order, who placed the order, where the order was placed, and the control number. The Order Detail (child) table contains the line items, such as what was ordered, the number of each item ordered, and the associated costs. When you transfer these tables from the OLTP database to a data warehouse, these two tables become a single fact table that’s populated with keys and measures (i.e., attributes that you can add up), such as the number of items ordered and the associated costs (e.g., the cost of the items ordered, discounts, various taxes, shipping and handling by item), as shown on the right side of Figure 1.
In Figure 1, the dashed lines between the OLTP database and the data warehouse indicate how the attributes in the OLTP database are mapped to the fact table in the data warehouse. The two red mapping lines identify the control numbers from the Order Header and the Order Detail tables. Note that the control numbers are mapped to the fact table, but unlike the other attributes from the OLTP database, they’re not treated as keys once they’re in the data warehouse.
The granularity of the fact table should typically be at the line-item level; every record in the fact table will be a line item from the actual order, which is represented by a record in the OLTP Order Detail table. This level of granularity enables you to query the data warehouse in multiple ways. For example, you can query which product sold best on a Monday morning between the hours of 9 a.m. and 10 a.m. But do you really need control numbers, such as the OrderID or the OrderDetailID, to respond to such queries?
The short answer is “no,” but you do need these control numbers in the data warehouse. Without these control numbers, you wouldn’t be able to reassemble the order as it was originally placed, track the history of the order (e.g., when it was picked, packed, shipped, delivered, returned, refunded), or even do something as simple as calculate the average number of items per order. The control numbers are valuable, so you can’t throw them away.
Handling Control Numbers in the Data Warehouse
So how do you handle these control numbers in the data warehouse? As Figure 1 shows, strip the identifiers from the Order Header table and include them as keys in the Order fact table. These identifiers are OrderDate, which maps to TimeKEY in the fact table; StoreID, which maps to StoreKEY; and SalesPersonID, which maps to Employee-KEY. The customer identifiers, BillToCustomerID and ShipToCustomerID, both map to fact table keys (i.e., BillToCustomer-KEY and ShipTo-CustomerKEY) that are associated with the Customer dimension. Each of these keys is joined to an associated Customer dimension, which is shown in abbreviated form in Figure 1.
Next, you'll want to follow the same procedure for the Order Detail table. Then map ProductID to ProductKEY in the fact table, Quantity to Quantity, and Price to Price. What’s left over from the OLTP tables are the control numbers, OrderID and OrderDetailID.
Now place the control numbers in the fact table and associate them with every line item that’s a part of each order. I recommend positioning the control numbers immediately below the fact table key columns, and just above the typical numeric facts (e.g., Quantity), which are also called measures. Although the control numbers will look like foreign keys, they won’t have any corresponding dimensions to join with, thus the name degenerate.
If, by chance, there are other attributes in the OLTP Order tables that don’t break down logically into dimensions in the data warehouse, and these attributes are associated with a specific order or the details of an order, then create a real dimension that contains the order number, the order detail number, these attributes, and a surrogate dimension key. In that case, the dimension is no longer degenerate; it's just another set of foreign keys in the fact table.
Efficiently Store Control Numbers in the Data Warehouse
Control numbers serve a very important function in the OLTP databases in which they originate. When the OLTP database is imported into a data warehouse, the control numbers also have to be carried forward; otherwise, you risk losing meaningful information. The most efficient and effective way to handle these control numbers is to embed them in a fact table as a degenerate dimension.
About the Author
You May Also Like