Building a Cube from a Dimensional Database
Preaggregate data for fast query performance
September 21, 2000
Need a solution that lets you answer business questions flexibly and extensively? You can design, build, and populate a dimensional database, or data mart, from a relational database to support basic reporting requirements. In "Relational to Dimensional," June 2000, I created a dimensional database called Northwind_MD, using SQL Server's Northwind sample database as the source for operational data and a Data Transformation Services (DTS) package to populate the dimensional database. Although you can query the dimensional database directly, Northwind_MD stores facts at the transaction level, which means that at runtime, the query processor needs to aggregate transactional data—consuming resources and slowing response time. To improve the solution's performance and make the dimensional data available to your users for online analysis and reporting, you can use Microsoft OLAP Services and Microsoft Excel. OLAP Services uses OLAP Server on the server and PivotTable Services on the client to provide powerful cube design, processing, calculation, and caching capabilities that give users quick responses to their queries. And using Excel as an interface to the cube lets users easily work with the dimensional data.
New in SQL Server 2000
Before delving into how to extend the dimensional database solution with OLAP and a user-friendly interface, let's look at a new SQL Server 2000 Analysis Services feature that can improve the underlying dimensional database design. Figure 1 shows my original dimensional design.
The Employee dimension table in Figure 1 shows a straightforward hierarchy between manager and employee. But in the operational system, the employee table has a recursive relationship in which every row has a ReportsTo column that holds the employee ID of the employee's manager. The DTS package I used to populate the dimensional database captures the employee hierarchy as the query result in Figure 2 shows. Notice that Fuller is the top-level employee (employee ID 1) and has several people reporting directly to him, but Buchanan also has direct subordinates. In addition, both Fuller and Buchanan are responsible for their own sales figures.
My Employee dimension can break down this hierarchy only one way, as Figure 3 shows. The dimension shows Fuller and his subordinates, Buchanan and his subordinates, and Buchanan reporting to Fuller, but the hierarchy is more complicated than that. Buchanan is also responsible for all sales in his group, so Buchanan's group's sales (including his own sales) should roll up to him. And Buchanan's total sales should, in turn, roll up to Fuller.
To capture the full breakdown of the Employee hierarchy in your dimensional database, you need to pull apart the parent-child nesting relationship. Using SQL to break down a table with a recursive relationship is difficult. The pre-SQL Server 2000 techniques to handle multilevel hierarchies have two problems. First, they might require more information about the hierarchy than the source Northwind system has (see Joe Celko's SQL for Smarties: Advanced SQL Programming—Morgan Kaufmann—which discusses approaches for breaking down hierarchies in SQL). Second, they might require an involved set of T-SQL statements (see Kalen Delaney's Inside Microsoft SQL Server 7.0—Microsoft Press—for a T-SQL example that breaks down a hierarchy in a table with a recursive relationship). However, a new SQL Server 2000 Analysis Services feature breaks down hierarchies more completely, as Figure 4 shows.
I broke down the Northwind Traders Company employee hierarchy by using Analysis Server Dimension Designer's option to create a dimension from a single table with a parent-child relationship. Notice that Fuller is the only person at the top level, and Buchanan is subordinate. Employees who report directly to Fuller (including Buchanan) appear one level below Fuller. And Buchanan's subordinates and his own sales appear below him. This view represents the complete hierarchy of Northwind Traders Company employees, including how information rolls up the hierarchy and how a user can drill down to greater detail. (For more information about SQL Server 2000's Analysis Services, see Russ Whitney, Mastering OLAP, "Analysis Services 8.0," May 2000; "Analysis Services Drillthrough," June 2000; "Analysis Services Actions," July 2000; "Ragged Hierarchies," August 2000; and "Analysis Services Data Mining," September 2000.)
Building Cubes for Fast Response
SQL Server 2000's Analysis Services capability saves me from having to revise the DTS package to manage the employee hierarchy. I don't have to write additional code and use intermediate tables to fully expand the hierarchy; Analysis Services expands the hierarchy for me. Although Analysis Services could connect directly to the operational data to build dimensions and facts in a cube, I want to keep the base transaction data in my dimensional database in case I need to query the data directly. Thus, I redesigned the Employee dimension in the dimensional database to mirror the employee table's recursive relationship in the source database. Figure 5 shows the revised dimensional data model.
I made a minor change to the DTS package that I introduced in "Relational to Dimensional"; the change loads the revised Employee dimension table. (Web Listing 1 shows the new script for creating the dimensional database; click on the "Download the Code" icon at the top of the page. Also available online is the DTS package you use to populate the database; you can download this package by via the "Download the Code" icon at the top of the page.)
After you create the dimensional database, you can connect SQL Server 2000 Analysis Server to preaggregate data in cubes for fast query responses. First, you need to create a new database in Analysis Server by using Analysis Manager from Microsoft Management Console (MMC). You can then build a multidimensional cube. The following steps walk you through the process. (These steps are the same for SQL Server 2000 Analysis Server and SQL Server 7.0 OLAP Server, except that OLAP Server's Dimension Wizard doesn't offer the Parent-child hierarchy option; if you use OLAP Server, you'll have to rely on the less sophisticated hierarchy breakdown in the dimensional database's Employee dimension table.)
Step 1: Create an Analysis Server database. Analysis Manager lists all the Analysis Servers, including the local server. To create a new database on a server, right-click the server name to bring up a pop-up menu that lets you name a new database. I named the new dimensional database Northwind_Star. After you create a database, you can expand its view to see subordinate folders for Data Sources, Cubes, Shared Dimensions, Mining Models, and Database Roles. When creating a database and its subordinate components, Analysis Server creates meta data in an .mdb file that you can export to a SQL Server Repository. (The Repository is a meta data store for data warehouse applications that allows for more flexible, extensible, and sophisticated management of data warehouse components.)
Step 2: Add a data source. To create a data source that lets the Analysis Server access the Northwind_Star database, right-click the Data Sources folder that appears just below the new Northwind_Star database, and select New Datasource. In the Microsoft Data Link dialog box, select the OLE DB Provider for SQL Server and provide the server name, login information, and database name (Northwind_Star) for the Analysis Server connection.
Step 3: Create shared dimensions. Next, you create shared dimensions for use in a cube. Shared dimensions are available to any cube in the database. For example, to avoid creating duplicate dimensions and to provide common business metrics among cubes, a Time dimension is often shared. Alternatively, you could create private dimensions within and unique to a cube, but changes to private dimensions force you to rebuild the cube the dimensions belong to. To create a shared dimension for the employee data in the Northwind_Star database, select New Dimension and Wizard from the pop-up menu. (The wizard leads you through the same process whether you're using SQL Server 2000 or SQL Server 7.0; you'll just have fewer steps if you're using SQL Server 7.0.) Click Next on the first screen, and select Parent-Child: Two related columns in a single dimension table on the second screen. Click Next, then select the employee table on the resulting screen. Click Next, and select the columns that define the parent-child relationship, as Figure 6 shows. I used the EmpLastName Member name to let users navigate this hierarchy by employee last name.
On the next screen, select the Members with data check box (because members at all hierarchy levels have sales). On the following screen, select the Nonleaf members have associated data and Data members are visible check boxes. Sales reps and their bosses have their own sales, and the nonleaf option lets higher-level members have their own data. The second option displays the bosses' sales numbers and aggregates this data with their subordinates' sales. On the final screen, name the dimension Employee and click Finish. The Dimension Editor will open; close it, and you've created the dimension. To browse dimension data from Analysis Manager, simply right-click the dimension name.
You can follow the wizard to create another shared dimension that uses the Time table from the Northwind_Star database. Choose the Star Schema dimension type, then the time dimension option and the Date column. Select year, quarter, month, and day as parts of the time hierarchy. Accept the defaults for the other dialog boxes, then name the dimension Time. You can now create another shared dimension from the Northwind_Star Product table. Select the Star Schema dimension type and the Product Category and Product Name columns, and name the dimension Product.
Step 4: Build a cube. With your dimensions defined, you can use the dimensions and a fact table to create a cube. First, right-click the Cubes folder, and select New Cube. Select Wizard, and choose Order_Fact table as the fact table. On the next screen, select Quantity and Price columns as measures. Select all three dimensions in the next screen, and click Yes in the Fact Table Row Count message box. Name the cube Product Sales, and click Finish. The Cube Editor will open and show the dimension and fact table relationships.
Now, close the Cube Editor, and click Yes when the wizard asks whether you want to save the cube and design storage for it. Select MOLAP (Multidimensional OLAP) as the storage option to store the cube data in a proprietary compressed file format on Analysis Server. The next screen asks you to design aggregations; select Until I click Stop, and let the Aggregation Designer run. The Aggregation Designer should show approximately 17 aggregations under the graph in the right pane. On the final screen, select Process Now, and click Finish. A dialog box will open showing the SQL statements that Analysis Server runs, based on the aggregation design, to populate the cube. To view cube data, right-click the cube name and select Browse Data. The cube is now ready to use.
A Pretty Face
The next step in making your dimensional data accessible to users is to hook a user interface (UI) to the populated cube. PivotTable Services (Analysis Services' client component) provides connectivity, client caching, and calculation capabilities. Excel 2000 and third-party OLAP products use PivotTable Services, which installs with Microsoft Office 2000.
You can easily access this cube through Excel 2000, which is already on most users' desktops. To connect to Analysis Server from Excel, open a new Excel workbook and choose PivotTable and PivotChart Report from the Data menu. Select Get External Data from the PivotTable and PivotChart Wizard's first screen. From the second screen, click Get Data, then select the OLAP Cubes tab. To create a new data source, highlight New Data Source, then click OK. Name the data source, select the OLE DB Provider for OLAP Services (if you're using SQL Server 2000, select Microsoft OLE DB Provider for OLAP Services 8.0), then click Connect. The next dialog box asks you to enter your Analysis Server's name; you don't need to enter a user ID and password for this example. (Note that you can also connect here to a cube saved as a file.) Click Next, and select Northwind_Star Analysis Server database. Click OK through the following dialog boxes, and click Finish.
Figure 7 shows the resulting empty Excel worksheet. Notice the areas for inserting dimensions and measures for tabular display. The PivotTable dialog box contains the dimensions and measures available for viewing. You can drop dimensions and measures onto the worksheet. For example, you could place the Employee dimension on the row fields, the Product dimension on the column fields, the Time dimension on the page fields, and measures in the middle data items section.
Figure 8 shows a completed worksheet. Drop-down controls let you choose levels to include in the view. To change to a chart view of the data, click the Chart Wizard button (showing a multicolored bar graph) on the right side of the standard toolbar.
Stay Tuned for More
You could build this article's example cube directly against the relational Northwind database. But the primary advantage of building a dimensional database is that it gives you the ability to redirect the target of users' data analysis and reporting activities from the operational system to the data mart.
As you consider how to bring data quicky and easily to users, remember that SQL Server 2000 Analysis Services offers a wealth of functionality and can provide flexible, high-performance reporting and analysis for transaction-processing systems. PivotTable Services lets users access data easily and cost-effectively. And many third-party vendors offer full-featured products that have powerful graphics and reporting tools for online analysis. In addition, the Analysis Services Dimension Wizard's parent-child hierarchy option can help you create more sophisticated hierarchies.
We're not finished with the relational-to-dimensional example yet. In an upcoming article, I'll show you how to use DTS to incrementally update your dimensional database and cube.
About the Author
You May Also Like