The Data Warehouse Bus Architecture
A bus architecture is composed of a set of tightly integrated data marts that get their power from conformed dimensions and fact tables. A conformed dimension is defined and implemented one time, so that it means the same thing everywhere it's used. Fact tables that should be conformed include those that derive revenue, profit, standard prices, and standard costs.
October 24, 2007
If you’ve been reading my articles about data warehousing the past few months, you know something about facts and dimensions and how they form a dimensional model. But you’re probably wondering how you can create a data warehouse from facts and dimensions.
I believe it was Ralph Kimball (or someone in his organization) who coined the phrase bus architecture for the enterprise data warehouse. Microsoft uses the term Unified Dimensional Model (UDM) for a concept that’s similar to the bus architecture, but the UDM goes beyond the star schema.
A bus architecture uses top-down planning and a grid of business functions and dimensions to deliver a set of tightly integrated data marts. To get a better idea of how a bus architecture works, picture an electrical bus—a structure that everything else connects to and draws power from—such as a big electrical cable.
Data Warehouse 101
The data warehouse bus architecture is composed of a set of tightly integrated data marts that draw their “power” from a common set of conformed dimensions and facts. A dimension table is the “lookup” table of a dimensional model; it contains textual data that decodes an identifier in associated fact tables. A conformed dimension is defined and implemented one time and used throughout the multiple star schemas that make up the enterprise data mart. Dimensions define the who, what, where, when, why, and how of a situation, and are laid out for the benefit of business users. Figure 1, is an example of a portion of a bus architecture. It contains two fact tables (i.e., RETAIL SALES, SHIP FROM MANUFACTURING) that might be created from the manufacturing value chain, and the conformed dimensions that modify both fact tables.
To conform a dimension, every stakeholder must agree on a common definition for the dimension, so that the dimension means the same thing no matter where it’s used. For example, instead of having a time dimension whose granularity is expressed in weeks, quarters, and years for some dimensional models and in days, months, quarters, and years for other models, you would have one conformed dimension that’s used wherever a time dimension is required in the enterprise data warehouse. Keep in mind that the conformed dimension’s structure must reflect the finest grain that might be needed for any fact table.
So how do you associate a dimension to a fact table more than once, such as in Arrive Time and Depart Time in Figure 1? The answer is by using views. Make an Arrive Time view that contains as many or as few of the attributes in the TIME dimension as necessary to express arrival times for the SHIP FROM MANUFACTURING fact table. You can see that I’ve used views of dimensions (i.e., Depart Time, Arrive Time, To Location, From Location, Buyer, Seller) quite extensively in Figure 1. When a single dimension needs to appear several times in a fact table, we use data warehouse roles.
Once the dimensions have been conformed, you can conform the fact tables, which you’ll want to do immediately after conforming the dimensions while your understanding of the meaning and use of these fact tables is fresh. Conforming fact tables makes more sense if you picture a manufacturing value chain. For example, say you want to write a report that flows down from the manufacturing data mart to the retail point of sale (POS) data mart. A fact table stores measures; the best way to measure product at the manufacturing point is by cases shipped, and the best way to measure product at the POS is by units scanned (aka sold). How do you reconcile these two seemingly incompatible measures?
To do so, conform the fact tables by selecting and storing the finest grain of a measure. If the POS fact table measures product by individual unit, then the manufacturing fact table should also measure product by individual unit, as well as by cases shipped, which Figure 1 shows. It’s OK to store multiple measures of a product in the base-level fact table. Note that you can always preaggregate the fact tables by rolling the measures into sales by day, sales by week, and cases shipped per year per warehouse, which creates additional, summarized fact tables in the process. The granularity of the base-level fact tables should be consistent across the entire value chain, so that your report can flow freely and easily down the line, producing relevant, accurate results. You can select the granularity of the fact tables by matching the finest granularity of the associated dimensions that are part of the star schema. If the finest granularity of the Time dimension is an individual day, then the granularity of the associated base-level fact table(s) should also be the day.
Should every fact table be conformed? It depends. You might have a fact table that isn’t part of a value chain; in that case, there’s nothing to conform the fact table to except its associated dimensions. Typically, the types of fact tables that should be conformed are those that can be used to derive revenue, profit, standard prices, and standard costs.
When you’re conforming fact tables, a fact has to be the same everywhere it appears. If you have a fact called net quarterly revenue, the same equation and algorithm must be used to produce the numbered results everywhere it’s used. If you can’t conform net quarterly revenue across the board, then you need to name the variations accordingly, such as fiscal year net quarterly revenue, calendar year net quarterly revenue, and academic year net quarterly revenue.
Why Use a Bus Architecture?
There are many reasons to use a bus architecture. Here a just a few of those reasons.
Efficiency. A single copy of a dimension (e.g., time, geography) involves less maintenance—for example, you won’t have to synchronize changes to multiple time dimension tables.
Consistency. By definition, a conformed dimension means the same thing everywhere it’s used.
Ease of use. Once you get used to what the conformed dimension looks like, how it’s structured, what it represents, and its content, writing code against it is almost effortless.
Expandability. A conformed dimension decodes every fact table in the enterprise data warehouse that it’s associated with. When a new fact table is introduced to the data warehouse, linking the fact table to the existing conformed dimensions to form a new cube is straightforward and expands and extends the reporting capabilities of the enterprise data warehouse.
Standards enforcement. Once everyone has come to consensus on the architecture, meaning, and data content of a conformed dimension, the rest of the organization will have to use the design, and through frequent use these conformed dimensions will become the de facto standard.
Reporting enhancement. When you write reports against a data warehouse that’s built on a bus architecture, you can readily access manufacturing, inventory, distribution, and sales data and know that you’re getting the same information across the board.
Tightly Integrated Data Marts
Conformed dimensions and facts are the backbone architecture of the enterprise data warehouse. When you build an enterprise data warehouse of separately implemented but tightly integrated data marts, you can refactor each data mart as necessary. You can also add a new data mart to the “bus” when it’s needed, knowing that it will work with all the preexisting data marts.
About the Author
You May Also Like