Rock-Solid MDX
A powerful 2-step approach to developing and testing complex MDX queries
March 16, 2003
The MDX language is powerful but not easy to use. On the surface, MDX looks like SQL, but it can quickly become more complex because of the multidimensional nature of the underlying cube data. After more than 3 years of using MDX, I've found I'm more productive when I apply design and debugging techniques that help me better understand MDX and create more accurate MDX statements. The techniques I use for developing MDX are similar to those I use for developing software in other languages: for complex problems, I use pseudo coding and debug the code by displaying intermediate results. Let's look at an example of how I use these techniques, and along the way, I'll show you how to use a .NET language to develop MDX user-defined functions (UDFs).
If you have any formal software-development education, you know that to solve a complex problem, you first break the problem into parts and solve each part independently. Then, it's always a good idea to step through each line of your code, using a debugger to verify that the code works as intended. Most software developers know that these practices are good habits, but not enough programmers apply them. These good programming habits can help you effectively deal with MDX's complexity.
For example, say you need to answer a typical business question such as, "Based on unit sales, what are the top three brand names for each product department?" The MDX query that Listing 1 shows answers the question; Figure 1 shows the results. I used the familiar FoodMart 2000 Sales cube that comes with Analysis Services as the basis for my example. I have enough experience with MDX that when I wrote this query, it ran the first time (thus I skipped the good habit of breaking the code into parts). But the query is complicated because it performs ranking (TOPCOUNT) inside an iterative loop (GENERATE), and I wasn't sure I was getting the answer I really wanted. Let's see how I work through the problem in a way that emphasizes modularity (i.e., addressing each part of the problem separately) and accuracy. First, I use a design methodology called pseudo coding. Pseudo coding is a process of writing in plain language the steps for how you plan to implement your solution. For this problem, I want my code to follow the process that the pseudo code below describes.
For each product department,
find the set of all brand names for this product department
return the product department name
return the three brand names that have the most unit sales
When I start to translate this pseudo code into MDX, I get the following:
<> = GENERATE( [ProductDepartment].MEMBERS, <> )
Here, the GENERATE() function steps through a set of items and evaluates an MDX expression for each item in the set. This statement shows that to get the answer, I need to determine the product department name and the top brand names within it for each product department. Next, I expand the <> item in the previous statement to call out the current product department. The following expression shows that I need another expression to determine the top brands within this department:
<> = { [Product].CURRENTMEMBER, <> }
To determine the top brands within the product department, I use the TOPCOUNT() function and specify that I want the top three brands based on unit sales:
<> = TOPCOUNT( <>, 3, [Unit Sales] )
Finally, I determine the brands within the product department by using the DESCENDANTS() function with the selected product department:
<> = DESCENDANTS( [Product] .CURRENTMEMBER, [Brand Name] )
Remember, the GENERATE() function steps through the product departments and sets the product dimension's CURRENTMEMBER to the name of the current product department while evaluating the inner MDX expression.
If I take the MDX code fragments I created above and use the WITH statement to turn the code into a modular MDX statement, I get the MDX statement that Listing 2 shows. In Listing 2, I've used WITH statements to separate two of the three pseudo code steps from the main body of the query (SELECT ...FROM) to improve readability and make the overall query use a more modular approach to solve the problem. If I execute this new MDX statement in the MDX Sample Application, I get the answer that Figure 2 shows. Notice that Figure 2's results aren't the same as Figure 1's even though I used the same MDX functions to develop the queries. Which answer is correct?
Close examination reveals that Figure 2 definitely doesn't show the right answer. For one thing, Hermanos isn't a brand in the Alcoholic Beverages department. But even if you didn't know that Hermanos belongs in the Produce department, you'd likely notice that the Unit Sales values of the three brands listed as the top brands in the Alcoholic Beverages department (Hermanos, Tell Tale, and Ebony) total more than the amount for the whole Alcoholic Beverages department ($6838.00). These two incongruities prove that Figure 2 shows the wrong answer, but how can I find out whether Figure 1 shows the correct answer?
To answer this question and to understand how MDX executes this query and other complex queries, I developed a simple MDX debugging tool. This tool is an MDX UDF that uses the Windows MessageBox() function to display any string. The UDF lets you display on the screen intermediate results inside an MDX query while the query is executing. Listing 3 shows the UDF's source code, which I wrote in C#. You can download my complete C# code by entering InstantDoc ID 38005 at http://www.sqlmag.com.
It took me a while to figure out the steps for developing a UDF with C#. So if you haven't already developed an MDX UDF with a .NET language, here are the steps you need to follow:
Create a .NET-project type of class library.
Edit the line in the AssemblyInfo.cs file that contains the AssemblyVersion information so that it contains a hard-coded version number rather than an auto-generated version number. In my UDF, I used the following line:
[assembly: AssemblyVersion("1.0.0.0")]
.NET is picky about assembly version numbers, and without a constant version number, I couldn't get MDX to recognize my UDFs.
Open the Project-Properties dialog box and change the Register for COM Interop flag in the Build properties to TRUE. This change registers your .NET class library as a COM DLL, which is required for MDX UDFs.
Place a ClassInterface statement just before the start of the class definition, as Listing 3 shows. This statement tells Visual Studio how to expose the class to the COM interoperability layer.
Add a using System.Runtime.Interop Services statement at the start of your C# source file, as Listing 3 shows. The ClassInterface statement in Step 4 requires InteropServices.
When these steps are complete, you're ready to add methods to your class definition, compile them, and use them from MDX. For my UDF, I created a method called MsgBox() that displays on the screen a box containing a message and caption that I specified as the method's parameters. The method returns the message that it displays so that you can embed the method in the middle of an MDX query without altering the query results.
Compiling a C# project creates a DLL and a TLB file in the project's bin/Debug subdirectory. The TLB file is the COM type library that you need to register with Analysis Services to make your C# methods available for use. I used the following statement in the MDX Sample Application to register my type library. Note that dotNETUDFs is the name I chose for my C# project.
USE LIBRARY "C:Documents and SettingsrwhitneyMy DocumentsVisual Studio ProjectsdotNETUDFsbinDebugdotNETUDFs.tlb"
After the library is registered, you can immediately start using the C# methods. The query in Listing 4 shows the code I used to embed the C# MsgBox() method inside Listing 3's MDX query. MsgBox() requires and returns only string items, but the TOPCOUNT() function returns a set of members. To make the two functions compatible, I sandwiched the MsgBox() method between the MDX functions STRTOSET() and SETTOSTR() to convert the TOPCOUNT() set into a string and back to a set. Figure 3, page 40, shows the first message that the screen displays when you execute Listing 4's query.
In the C# MsgBox() method, notice that I use a counter variable to limit the number of times a message is displayed on the screen. This limit is helpful when the MsgBox() method is called hundreds or thousands of times in a query. I could also achieve the same result by using a Cancel button on the message box rather than a counter. When the counter in my example reaches its limit, I must call the Reset method to restore the counter to a nonzero value so that it once again displays messages. I used the following separate MDX query to call the Reset method:
WITH MEMBER Measures.Temp AS 'Reset(5)'SELECT { Temp } ON COLUMNSFROM Sales
Now I could use the MsgBox() method to figure out why the query in Listing 2 returned the wrong result. I altered Listing 2's query as Listing 5 shows. I used the MsgBox() method to display what the CURRENTMEMBER of the product dimension was when the [Brands Within Dept] set was evaluated. I learned that the [Brands Within Dept] set was evaluated only twice during the query execution instead of each time GENERATE() discovered a product department. Also, the CURRENTMEMBER was the All member (i.e., the topmost member) of the product dimension, not a product department. This means that Analysis Services evaluates and caches a WITH SET clause for the rest of the query execution. That's why Listing 2's query results were wrong.
By designing your MDX queries one part at a time, as I demonstrated in this example with pseudo code, you can tackle complex problems. Then, you can make sure the queries are operating correctly by displaying the results one part at a time. I hope you find this powerful two-part process useful for creating your own MDX.
About the Author
You May Also Like