Getting to Know OLAP and MDX
Microsoft's new multidimensional database tools
Welcome to Mastering OLAP. This column will help you solve real OLAP problems using Microsoft's new Multidimensional Expression (MDX) language. OLAP enables users to work online interactively to analyze their company's data in multi-dimensional databases, and MDX is a query language for multidimensional databases. This column will be a hands-on tutorial. Each month we'll introduce an MDX tip and present a simple MDX puzzle to help hone your OLAP skills. This knowledge will be indispensable for database administrators' (DBAs') tasks, such as designing a data warehouse. In this first column, we define MDX and discuss basic OLAP concepts and terminology. In following columns, we'll explore the language in more detail. Let's start by discussing the question: What is OLAP?
Defining OLAP
Companies store volumes of valuable company information in OLTP systems and other corporate databases. Decision-makers use this valuable information to support business decisions. The difference between OLTP systems and OLAP is that OLTP systems help users capture the transaction information necessary to run their business operations, but OLAP systems analyze transaction information at an aggregate level to improve the decision-making process.
Traditional OLTP reporting systems produce reams of printed reports to aid in decision-making. These reports provide a lot of data (i.e., pages of numbers in grids), but they don't always convey much information. Traditional reports are usually static and can take days to generate. More important, traditional reports do not allow for what-if analysis. When you find an exception in the data, you usually need to run another report for more information. For example, suppose an OLTP system produces a report that shows sales and margins for all product lines in various European regions. You notice that product line B margins dropped in the past few months. You immediately want more detail about product line B: the cost of goods sold, average selling price, and discounts, for example. However, the OLTP system might take several days to produce a report that contains this information—and that report might lead to another question that requires another report. At this rate, you might need several weeks to find the source of the problem.
Don't fear, OLAP is here! OLAP systems let you view data in multiple ways. Thus, if you need answers sequentially, they are available in seconds rather than days. Also, OLAP systems are interactive, so users can ask follow-up questions and analyze their business data rapidly. In a 10-minute session with an OLAP query tool, a user might generate more than 30 queries to the underlying data source. OLAP systems greatly increase the quality of the information that you can use in decision-making. (For a quick tutorial about the vocabulary of OLAP, see the sidebar, "OLAP Terminology.")
Defining MDX
MDX adheres to Microsoft's new OLE DB MD data-access specification. OLE DB MD is the multidimensional database extension to Microsoft's new OLE DB data-access layer. Microsoft hopes that OLE DB MD and MDX become standards in the data warehousing market, the same way ODBC rules the OLTP and relational database market.
OLAP is the first step toward this goal. Currently, OLAP Services for SQL Server (formerly code-named Plato), Microsoft's new data warehousing extensions, is one of the few OLAP engines that natively speak MDX. In addition, vendors such as the SAS Institute and Applix have added MDX support to their products.
You can understand MDX by comparing it with Transact SQL (T-SQL), SQL Server's native SQL dialect (for more information about T-SQL, see Mike Reilly's "Writing Simple Select Statements."). Front-end application developers use a variety of object models such as ADO and APIs such as ODBC to communicate with back-end SQL Server systems. However, the client system translates the object model and API calls into T-SQL, which SQL Server understands. Similarly, OLAP application developers can use ADO MD, a new set of ADO extensions, and the client system translates the object model or API calls into MDX so that OLAP Services can process them.
MDX and OLAP
Do you need to learn MDX before you run queries against OLAP Services? Absolutely not. Most Microsoft OLAP query tools are visually oriented and generate MDX behind the scenes when you run queries. If you don't know T-SQL, can you use a smart front end, such as Microsoft Access, to program a traditional SQL Server application? Absolutely. Will you get better data in a more effective way if you have a good understanding of T-SQL? Absolutely. The same idea holds true in the OLAP world.
You can use MDX in two basic ways. First, you can specify an entire MDX query: Like an SQL query, an MDX query uses a FROM clause to specify a data source (i.e., a cube), a WHERE clause to filter the data, and a SELECT clause to project the resulting data into rows and columns.
Alternatively, you can use MDX as an expression language. Most OLAP query tools provide a method to create custom numeric expressions. For example, suppose you want to find the percent change between this year's sales data and last year's sales data. You can use an MDX expression to perform this calculation. When you use an MDX expression in this way, the expression is similar to a formula in an Excel spreadsheet. We'll explore MDX's role as an expression language in future columns when we discuss calculated members.
An MDX Query
Let's look at an MDX query. Microsoft designed the following query to run against the data in the FoodMart sample database, which installs by default when you install OLAP Services. You can use the MDX sample application included with SQL Server 7.0 OLAP Services to view and edit the sample MDX query. (We used the sample data and cubes in FoodMart to create a common ground for us and you to work from. If you're new to OLAP, you'll get more out of this column if you run the queries against the live FoodMart cubes and work through the results as we discuss them.) Also, you can experiment using other OLAP client applications such as Knosys' ProClarity (http://www.knosysinc.com), which includes an MDX editor for creating complete MDX statements. Check out http://www.microsoft.com/industry/ bi/solutions/olap/olap.stm for a full list of OLAP-compatible front ends.
The following query is the first sample MDX query included with OLAP Services:
SELECT {[Measures].[Unit Sales]}ON COLUMNS, ORDER( EXCEPT( [Promotion Media].[Media Type].members, { [PromotionMedia].[Media Type].[No Media]}), [Measures].[Unit Sales], DESC) ON ROWSFROM Sales
To uniquely identify a measure value in the cube, you must specify a member from every dimension in the cube. This combination of members, one from each dimension, is a tuple. This sample query references two of 13 dimensions in the Sales cube. This query references the [Unit Sales] member of the Measures dimension and also references the [Promotion Media] dimension.
If we reference only two dimensions, and we must specify a member from each dimension, how does OLAP Services fill the 11 missing members to complete the tuple? Each dimension has a default member. So if you specify {[Measures].[Unit Sales]} to be on a column and {[Promotion Media].[All Media].[Daily Paper]} to be on a row, these two members and 11 default members define the value of the cell in the cross section. Usually the default member is the All Member. The All Member is the single member at the root of the dimension hierarchy. The value of this member is the fully aggregated dimension. Some dimensions, such as the measures dimension, don't have an All Member, so the cube builder must specify the default member.
This query returns a two-dimensional result in rows and a column. It returns one column, Unit Sales, and several rows. These rows are all the members at the Media Type level in the Promotion Media dimension except No Media. The query returns these rows in descending order, sorted by Unit Sales. The Sales cube has a time dimension with no All Member. Two years, 1997 and 1998, are at the top of the hierarchy. The query uses 1997, because it is the default member of the time dimension.
In future columns, we'll discuss many OLAP and MDX functions. For now, we've given you background information about MDX and OLAP.
About the Authors
You May Also Like