OLAP: Resistance Is Futile!
OLAP Services provides a firm foundation for building business intelligence solutions
March 31, 1999
SQL Server 7.0 OLAP Services offers powerful tools that help businesses use data to analyze business activity. Consisting of a middle-tier server called the SQL Server OLAP Server and a client-side caching and calculation engine called the Microsoft PivotTable Service, OLAP Services lets businesses quickly identify trends to support decisions that optimize business processes. For example, in about 20 minutes I used OLAP Services to create a data cube to analyze billable hours for the people I manage, by connecting to my company's time-and-billing system running on SQL Server 6.5. With little effort, I built a useful data mart that helps me do my job.
Business intelligence, executive information systems, and decision-support systems are approaches that use data to analyze business activity. For example, retail chains have huge amounts of raw transactional data. When aggregated properly at the right intervals, this transactional data can reveal customer buying habits and drive marketing decisions. By including OLAP Services in SQL Server 7.0, Microsoft gives unprecedented data analysis and reporting power to organizations that previously couldn't justify the cost of such a sophisticated business intelligence tool. This article examines the foundation of data warehousing and dimensional databases, and the capabilities Microsoft brings to SQL Server with OLAP Services.
Data Warehousing and Dimensional Data
Data warehousing is the technology that fuels a business intelligence engine. The data warehousing environment is complex because it can contain all of an organization's data and can use various methods to get data to executive-level business users for decision support. The biggest challenge is getting data from operational relational databases optimized for transaction processing to end users in a readable format with fast query capability. Extracting, cleansing, and transforming data on a scheduled basis to populate dimensional databases is a formidable task. One way to break up this task is to create data marts, which are subsets of a data warehouse. You can target a data mart for a specific business unit to satisfy narrowly defined requirements. This approach simplifies the design and implementation of a business intelligence solution. If managed carefully, data marts can serve as building blocks for the enterprise data warehouse.
You use dimensional databases to build data warehouses. Dimensional databases store facts about business activity, such as sales in dollars and units, in the context of dimensions such as time, geography, customer demographics, and products. By describing and storing the data in this manner, companies can execute high-performance ad hoc queries against the data. (Acceptable query performance in large data warehouses usually requires dimensional data modeling, a topic that's beyond the scope of this article. For more information on dimensional data modeling, study The Data Warehouse Lifecycle Toolkit by Ralph Kimball, John Wiley & Sons, 1998.)
A dimensional database stores relevant facts (for example, sales in dollars) in a fact table. The fact table is dependent on dimension tables, and the fact table's primary key consists of foreign keys from the dimension tables. Figure 1, page 26, shows a simple dimensional database (a star schema), in which the Sales table is the fact table, and the Time, Store, and Product tables are dimension tables. The dimension table data values, or members, are textual and serve as the row and column headers in reports of query results. These members let you filter, or slice, the data. For example, you can create views of dollar and unit amounts sorted by store, product, and time period. You can use the dimension members (for example, product category and month) to define the data slices to examine sales performance. You use the values in the dimension tables to browse the fact data. The fact table has an index on each column in its primary key to facilitate joins with the dimension tables and to increase query speed.
A star schema, such as the one in Figure 1, supports building three-dimensional data cubes, such as the one in Figure 2.The term star schema comes from the physical appearance of the dimensional model. The fact table is the center of the star and the dimensions are the points of the star. A snowflake schema is a model in which dimension tables are normalized and related dimension tables are attached to the points of the original star. The cube in Figure 2 represents a slice of the data from the dimensional database in Figure 1. This slice lets you look at total annual product sales in the clothing, toy, and housewares categories in Virginia, Maryland, and Washington, D.C., stores for 1993, 1994, and 1995. As the figure shows, you can zoom in on the part of the cube representing sales measures for toys in Virginia stores in 1994.
Each cube cell contains measures unless no values exist. If no stores were in Washington, D.C., in 1993, for example, the set of cells in the bottom-left corner of Figure 2 will be empty or contain zero values.
The Three OLAPs
The OLAP Server supports three types of dimensional data storage: Multidimensional OLAP (MOLAP), Relational OLAP (ROLAP), and Hybrid OLAP (HOLAP). MOLAP uses compressed indexes and a query engine optimized for the relevant storage format. MOLAP stores aggregations to facilitate fast data access.
ROLAP stores aggregations in an operational relational database in summary tables for fast data access. This storage approach can become costly in terms of disk storage space and generally isn't as fast as MOLAP. However, ROLAP lets an organization leverage existing investments in database management hardware and software and offers better scalability than MOLAP.
HOLAP leaves the facts in the source database and stores aggregates in MOLAP fashion. Some people consider this solution to be the best-of-both-worlds approach. However, if all the necessary aggregations aren't stored in the data warehouse or data mart, queries back to the transaction-level detail in the source database will be relatively slow.
Microsoft OLAP Services
Microsoft brings to the OLAP arena tools to intelligently build, manage, and query data cubes and to implement data marts. SQL Server 7.0's OLAP Server supports MOLAP, ROLAP, and HOLAP; variable degrees of aggregation to optimize query performance against storage space; source data in dimensional or relational schemas; cube partitioning to allow queries against distributed, heterogeneous data sources; usage analysis that lets you focus on problem queries and rebuild aggregations to tune those queries; write-back for what-if analysis; incremental update capability; and an OLE DB interface extended for OLAP.
Client applications can access the OLAP Server features via the PivotTable Service (for details about the PivotTable Service, see r "Navigating the PivotTable Service"), with the OLE DB for OLAP specification and ActiveX Data Object Multi-Dimensional (ADO MD) object model. Another object model, Decision Support Objects (DSO), provides a programming interface to manage the OLAP Server that is similar to the Distributed Management Objects (DMO) that SQL Server provides. The OLAP Server contains databases, which in turn contain objects to support cube construction. Screen 1 shows the OLAP Manager, the built-in administrative user interface for OLAP Services.
An OLAP Services database consists of cubes, virtual cubes that are joins of cubes similar to a view in SQL Server, data sources, dimensions, and roles (which define privileges for users, based on NT security). After you identify data sources and define dimensions, you can build cubes. The OLAP Server includes the FoodMart sample database. Screen 2 shows the sales cube of the FoodMart database. It's a snowflake schema, which refers to the product_class table relationship with the product table.
After you create the cube and choose the storage technique, the OLAP Server designs the aggregations and processes the cube. If you choose the ROLAP storage technique, the OLAP Server will create the summary tables in the source database after the cube is processed. Otherwise, aggregations are stored in OLAP Server native format. You can choose the degree of aggregation by considering the level of query optimization you want versus the amount of disk space required. Screen 3 shows the Storage Design Wizard. In the FoodMart Sales cube example, I chose 80 percent performance, which produced 124 aggregations and required 22.5MB of storage space for MOLAP. The aggregations roll up, so if you choose low performance in favor of conserving disk space, the OLAP Server query engine will satisfy queries by summing existing aggregations. Cubes can be partitioned to access data from distributed, heterogeneous sources. This is a powerful feature for scalability because the OLAP Server can determine what data it needs from various sources to satisfy a query.
The FoodMart sample database is already in a dimensional schema, so the facts and dimensions easily map to the OLAP Server objects. You can build cubes directly against relational databases if the data is clean and consistent, which is usually a rare case. You can use the pubs sample database in SQL Server to build a sample cube. You can use the Sales table as the fact table and the Titles and Stores tables as dimensions. Then derive a time dimension from the ord_date column in the Sales table. The dimension wizard will ask whether the dimension is a standard or a time dimension. The pubs Sales table contains only the sales quantities. To determine the sales values, you can create a view in SQL Server that joins Sales and Titles, multiply sales.qty and titles.price, then use this view as the fact table. The fact table must also include ord_date, title_id, and stor_id.
You can use the OLAP Manager to browse the cube and dimension data. Screen 4 shows data from the Sales cube in the FoodMart database. This cube has 10 dimensions: Nine dimensions appear in the top pane, and the Time dimension is the column header in the grid display. You can rearrange the grid columns and rows by dragging the dimension title buttons to the axis you want them on. The replaced dimensions then appear in the top pane, and you can expand these dimensions for more detail. The six measures in the cube contain the business facts.
The OLAP Manager shows the cube data in a two-dimensional display. Third-party vendors offer powerful end-user analysis and presentation tools that work with the OLAP Server and PivotTable Service. These tools deliver sophisticated tabular and graphical presentation capabilities. For a list of vendors with front-end tools for the OLAP Server, visit Microsoft's Web site at http://www.microsoft.com/industry/bi/solutions/olap/olap.stm.
Screen 5 shows the Usage-Based Optimization Wizard, a feature you can use to adjust aggregations to meet specific needs. You can select queries based on a time frame, time to execute, user, or frequency. The OLAP Server will create additional aggregations based on the focus of the usage analysis to boost performance for critical or slow-running queries.
Write-back is an important capability for what-if analysis with cube data. By write-enabling a cube, you can look at the effect of changes to certain measures to forecast the results of such changes. Screen 6 shows the dialog box for write-enabling the Sales cube in the FoodMart database.
Rich Set of Tools
Microsoft has assembled a feature-rich set of decision-support tools in OLAP Services. OLAP is a compelling technology for organizations with complex data analysis and reporting requirements. SQL Server 7.0 OLAP Services and third-party tools let you easily build data mart solutions to get useful information to the business users who need it.
About the Author
You May Also Like