Improving Analysis Services Query Performance
Proper cube partitioning speeds query response
January 19, 2004
Analysis Services is a high-performance, multidimensional query engine for processing analytical and statistical queries, which a relational SQL engine doesn't handle well. When such queries are simple or have pre-aggregations, Analysis Services can make your job easier. But when queries become complex, Analysis Services can bog down. For example, an SQL SELECT statement that includes a GROUP BY clause and aggregate functions can take as long as a few minutes—or more. You can retrieve the same result set in just a few seconds if you execute an MDX statement against an Analysis Services Multidimensional OLAP (MOLAP) cube. You perform this workaround by passing an MDX query from SQL Server to a linked Analysis Server by using the OPENQUERY function in an SQL SELECT statement, as SQL Server Books Online (BOL) describes. Analysis Services then precalculates the necessary aggregations during the processing and creation of the MOLAP cube so that the results are completely or partially available before a user asks for them.
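As a rough sketch of that pass-through technique (not one of this article's listings; the linked server name OlapSrv, the cube name Accident, and the member paths in the MDX string are assumptions):

  -- Assumes a linked Analysis Server named OlapSrv already exists, created,
  -- for example, with sp_addlinkedserver and the MSOLAP OLE DB provider.
  SELECT *
  FROM OPENQUERY(OlapSrv,
    'SELECT {[Measures].[Incident_Count]} ON COLUMNS,
            [Time].[Year].Members ON ROWS
     FROM Accident')

SQL Server sends the quoted MDX statement to the linked Analysis Server, which answers it from the cube's stored aggregations and returns the result as a rowset.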
However, precalculating every imaginable aggregation is impossible; even a completely processed MOLAP cube can't precalculate aggregations such as those in calculated cells, calculated members, custom rollup formulas, custom member formulas, FILTER statements, and ORDER statements. If you're used to the performance you get when you retrieve only precalculated aggregations, the performance you get from an MDX query that involves these kinds of runtime calculations might seem unbearably slow. The problem might occur not because Analysis Services can't handle runtime calculations efficiently but because your MOLAP cube's design isn't optimal.
In my work building and maintaining a data warehouse for the city of Portland, Oregon, I optimize Analysis Services so that traffic engineers can quickly access a variety of statistics about traffic accidents in the city. Through many experiments, I've discovered that an important key to MOLAP optimization is cube partitioning. In this article, I explore and compare various MOLAP cube-partitioning strategies and their effects on query performance. Then, I suggest some simple guidelines for partition design.
Traffic-Accident Data Warehouse
My study of query performance is based on my work with a real dimensional data warehouse that maintains traffic-accident history. When I conducted this study, the traffic-accident data warehouse contained 17 years of data (1985 through 2001) and documented about 250,000 unique incidents. The complex part of this data warehouse is not its relatively small fact table but its many dimensions, which the snowflake schema in Figure 1 shows. Portland's traffic engineers look for the street intersections that have the highest number of incidents. Then, they search for clues about which factors might cause a high number of crashes and what makes some accidents more severe than others. They look at a total of 14 factors (which are the data warehouse's dimensions), including time, light, weather, traffic control, vehicle, and severity of occupant injuries. Among the dimensions, the Streets dimension (STREET_DIM) is the largest; it records roughly 30,000 street intersections in the Portland area. The total number of source records to build aggregations on is the result of a multi-way join of 14 one-to-many (1:M) or many-to-many (M:N) relationships from the fact table to the dimension tables. The Accident data warehouse contains only one measure: the distinct accident count (Incident_Count). A distinct count prevents the possibility of counting the same accident multiple times in an M:N relationship.
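To see why the distinct count matters, consider a hypothetical relational query against the source tables (all table and column names here are assumptions): an accident that involves two vehicles appears twice in the join, so a plain row count counts it twice, whereas a distinct count of the accident key counts it once.

  -- Hypothetical source tables for the accident-to-vehicle M:N relationship.
  SELECT COUNT(*)                      AS joined_rows,    -- one row per accident/vehicle pair
         COUNT(DISTINCT f.accident_id) AS incident_count  -- each accident counted once
  FROM ACCIDENT_FACT AS f
  JOIN ACCIDENT_VEHICLE AS av ON av.accident_id = f.accident_id
  JOIN VEHICLE_DIM AS v ON v.vehicle_key = av.vehicle_key;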
Fortunately, the Streets dimension isn't too large to use MOLAP cube storage, which provides the best query performance. Analysis Services defines a huge dimension as one that contains more than approximately 10 million members. Analysis Services supports huge dimensions only with Hybrid OLAP (HOLAP) or Relational OLAP (ROLAP) cubes.
Queries and Bottlenecks
Analysis Services responds to queries with varying performance, depending on the complexity of the query. For example, a MOLAP cube that you create by accepting the default partition in Analysis Manager would respond to a simple query like the one that Listing 1 shows by returning roughly 2000 records in only 5 seconds.
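A simple, pre-aggregated query of that kind might look roughly like the following sketch (not the actual Listing 1; the cube name and level names are assumptions). It asks only for stored aggregations and defines no calculated members, so Analysis Services can answer it almost entirely from precalculated data.

  SELECT {[Measures].[Incident_Count]} ON COLUMNS,
         CROSSJOIN([Time].[Year].Members,
                   [Occupant_Severity].Members) ON ROWS
  FROM Accident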
If your queries basically ask only for pre-aggregates in a few records or columns, any MOLAP cube with any percentage of aggregation—even as little as 5 percent—will perform well. However, for a query like the one that Listing 2 shows, which involves six calculated members, a 30 percent-aggregated, single-partition MOLAP cube would take 52 seconds to return just 331 street intersections. These disparate results suggest that performance bottlenecks don't depend on the size of the result set or on the percentage of aggregation in the cube. In fact, in my experience, any aggregations beyond 30 percent are a waste—you get no better performance for your effort. For simple queries, you don't need high aggregation. For complex queries, high aggregation won't help. Performance bottlenecks in Analysis Services typically come from calculated members that must scan multiple tuples and aggregate them on the fly.
The City of Portland traffic engineers I work with typically ask ad hoc questions that are nonhierarchical along the Time dimension. For example, an engineer might ask me to calculate the total number of accidents during the past 3 years, the past 5 years, or any combination of years between 1985 and 2001. I can't simply aggregate years by creating a new level above the Year level in the Time dimension; the new level would satisfy only one combination of years. This limitation means all queries that involve a combination of years have to use calculated members to perform aggregations for the specified years.
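In MDX, each such combination of years becomes a calculated member along roughly these lines (a sketch only; the cube name Accident and the exact member paths are assumptions):

  WITH MEMBER [Time].[Years 1998-2000] AS
    'SUM({[Time].[1998], [Time].[1999], [Time].[2000]})'
  SELECT {[Measures].[Incident_Count]} ON COLUMNS,
         {[Time].[Years 1998-2000]} ON ROWS
  FROM Accident

Because the member is computed at query time rather than stored in the Time hierarchy, Analysis Services must aggregate the three years on the fly for every cell that references it.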
Listing 2's query returns accident counts along the Time, Occupant_Severity, and Streets dimension members. Figure 2 shows the members of the Time and Occupant_Severity dimensions. Listing 2's query uses six calculated members—Accident_Count, Fatal, Injury_A, Injury_B, Injury_C, and PDO (Property Damage Only)—to sum the accidents in the years 1998, 1999, and 2000 for each of the five members of the Occupant_Severity dimension. The query asks for a sorted and filtered result set of accident counts for each street intersection ([Street].[Street_List]) in each of these six calculated members. To contrast with the performance of such on-the-fly aggregation, I've included Listing 3, which accesses only pre-aggregations and doesn't include calculated members. I used Listing 2 and Listing 3 as the benchmarks for my cube partitioning tests, which I discuss in a moment.
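The general shape of such a query, with year sums calculated on the fly and a filtered, sorted set of street intersections, is roughly as follows (a sketch only, not the actual Listing 2; only two of the six calculated members are shown, and any names beyond those mentioned above are assumptions):

  WITH MEMBER [Measures].[Accident_Count] AS
    'SUM({[Time].[1998], [Time].[1999], [Time].[2000]}, [Measures].[Incident_Count])'
  MEMBER [Measures].[Fatal] AS
    'SUM(CROSSJOIN({[Time].[1998], [Time].[1999], [Time].[2000]},
                   {[Occupant_Severity].[Fatal]}), [Measures].[Incident_Count])'
  SELECT {[Measures].[Accident_Count], [Measures].[Fatal]} ON COLUMNS,
         ORDER(FILTER([Street].[Street_List].Members,
                      [Measures].[Accident_Count] > 0),
               [Measures].[Accident_Count], BDESC) ON ROWS
  FROM Accident

The remaining severity members (Injury_A, Injury_B, Injury_C, and PDO) would follow the same pattern as Fatal. Every cell in the result requires a runtime sum over multiple tuples, which is exactly the work that good partitioning can speed up.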
When you need to improve the performance of queries that involve calculated members, cube design is important. In my experience, the most important aspect of cube design isn't how much memory you have, the number of disks or CPU threads you have, whether you use unique integers for member keys, or even whether you use the Usage-Based Optimization Wizard, but how you partition the cube.
Partitioning is slicing a cube along a tuple such as ([Occupant_Severity].[Fatal], [Time].[2000]), which specifies a member from one or more dimensions. For any dimension that you don't specify in this tuple, the partition includes the entire dimension. In the cube structure, Analysis Services keeps a direct pointer or index to the partition for that tuple. Whenever a query references that tuple or a subset of it, Analysis Services can get to the corresponding partition without scanning the entire cube. You can partition a cube in a nearly infinite number of ways, and Analysis Services supports as many partitions as you practically need for a cube. But without a clear rule for creating partitions, you could create too many cuts or wrong cuts on a cube and end up with worse performance than you'd get with one default partition.
Usage-Based Partitioning
You can partition a cube along any tuple of members at any level from any dimension. Analysis Services' Partition Wizard calls such a tuple a data slice. Although Analysis Services can scan a small partition faster than a large one, a small partition contains fewer members. Analysis Services might have to perform more scans of multiple small partitions to cover the same number of members that one larger partition could contain. So the overhead of performing calculations on the results of multiple partitions might negate the advantage of the faster scan in each smaller partition.
How you partition a cube depends on the queries you need to process. Logically, you might decide to partition along every tuple that a query specifies. For example, to improve Listing 2's performance, you might be tempted to partition along each tuple of the cross join of {Time.[1998], Time.[1999], Time.[2000]}, [Occupant_Severity].Members (6 members), and [Street].[Street_List].Members (roughly 30,000 members). You'd create partitions along a total of 540,000 tuples (3 x 6 x 30,000 = 540,000). This seemingly simple plan creates two problems. First, scanning 540,000 partitions and summing the 3 years for each tuple of severity and street (a total of 180,000 tuples) would create significant performance overhead. Second, the amount of work and time to create and process 540,000 partitions, manually or programmatically by using Decision Support Objects (DSO), is astronomical.
The excessive performance overhead you create when you partition along every tuple in a query is a serious concern for a couple of reasons. First, the query in Listing 2 isn't supposed to return each year individually. Instead, the query should return only the sum of incidents in 3 years. An efficient partition would include the three specified years so that Analysis Services could calculate the sum solely within the partition. Second, the query doesn't need to access just one street intersection; it has to scan all the street intersections regardless of the partitions you create. Being able to get to a particular street partition directly doesn't improve performance because you still have to walk through every partition. You'd be better off keeping all the street intersections in the same partition. The bottom line is that you should partition along a tuple only when the partition can save your query from doing a sequential scan for that tuple.
Partition Testing
To see what kinds of partitions avoid a sequential scan, I devised tests that use Listing 2 and Listing 3 as benchmarks. For complete details about my test environment, process, and results, see the Web-exclusive sidebar "Cube-Partitioning Tests" at http://www.sqlmag.com, InstantDoc ID 41231. In the rest of this article, I summarize the tests and some important results.
I created six cubes of identical structure with 30 percent aggregation and varying partition designs. I wanted to partition along the Time and Occupant_Severity dimension members (which Figure 2 shows) that the test queries in Listing 2 and Listing 3 look for, so that the queries could reach those members with no scan or with a faster scan. Table 1 describes the partitioning data slices of these six test cubes. I gave the cubes names that reflect their partitioning dimensions and total number of partitions.
To study the effect of the number and speed of CPUs, disk I/O, and physical memory on partitioned cubes, I repeated the same tests on six different Dell servers. Table 2 shows the specifications for these servers, ranging from the highest end to the lowest end in hardware resources. High1, High2, and High3 are high-end production-scale servers; Low1 and Low2 are desktops; and Low3 is a laptop (which I used as a server for the sake of testing). Each test executes Listing 2 and Listing 3 from the client machine Low2 against every test cube on all six servers.
All the tests measured the response times of Listing 2 and Listing 3. (To read the results of my tests for Listing 3, see "Cube-Partitioning Tests.") Figure 3 shows Listing 2's performance on all the servers. I drew the following conclusions for Listing 2:
High-end servers (with multiple low-speed CPUs) performed worse than low-end servers (with one high-speed CPU) regardless of cube partitioning. CPU speed—rather than the number of CPUs, disk speed, or amount of memory—drives performance.
Effective partitioning makes the query perform 5 to 10 times faster than on the default partition, especially on slower CPUs.
Queries that have calculated members, such as the one in Listing 2, are CPU-bound.
Partitioning along queried data slices, as I did in the Year_Severity31 and PartitionYear_Severity7 test cubes, gives the best performance.
Slicing along queried members (e.g., slicing along the six members of the Severity dimension and the three members of the Year level of the Time dimension) prevented sequential scans.
Minimizing partition sizes by excluding members that you don't query frequently (e.g., [Partition Year].[1], which includes the years 1985 through 1996) doesn't prevent a sequential scan but does speed up the scan.
Test results (which you can read about in "Cube-Partitioning Tests") show that an aggregation level higher than 5 percent has no effect on performance, which proves my hypothesis that high aggregation levels are a waste of effort.
Guidelines for Partitioning
Based on the results of my tests and the conclusions I've drawn, I offer these partition-design guidelines. For all queries:
Never overlap partitions.
Never specify the [All] member as the partition data slice because doing so creates overlapping partitions.
For queries like the one in Listing 3, which accesses only pre-aggregations:
No partitioning is necessary because its effect is negligible or negative.
Apply Analysis Services' Usage-Based Optimization.
For queries like the one in Listing 2, which calculates many aggregations on the fly:
Partition along queried data slices—for example, ([Partition Year].[2].[1997], [Fatal]); see the query sketch after this list.
No Usage-Based Optimization is necessary because it has no effect.
Five percent aggregation is the maximum aggregation level that provides performance improvements.
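For instance, a query that benefits from the ([Partition Year].[2].[1997], [Fatal]) data slice is one whose restriction falls entirely inside that slice, roughly like this sketch (member paths are assumptions):

  SELECT {[Measures].[Incident_Count]} ON COLUMNS,
         [Street].[Street_List].Members ON ROWS
  FROM Accident
  WHERE ([Time].[1997], [Occupant_Severity].[Fatal])

Because every cell this query touches lies inside that one partition, Analysis Services can answer it without scanning the cube's other partitions.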
If you have multiple slow queries that have different partitioning needs, consider creating a separate cube for each query. For desktop ad hoc users who can retrieve just one screen of results at a time, using multiple cubes might be inconvenient. However, for custom applications (such as Web and reporting applications) that need complete results, you have full control over accessing multiple cubes or even multiple databases behind the scenes for the best performance.
The term "tuning" implies that you'll have to experiment to achieve the optimal performance for your system. The techniques and guidelines that this article offers won't necessarily create optimal performance right away, but if you take the time to examine your query usage and identify the slow queries, estimate which partitions might prevent sequential scans, and test those partitions, you'll get closer to the performance you want.