Avoid Common SQL Server Analysis Services ProblemsAvoid Common SQL Server Analysis Services Problems
Microsoft SQL Server Analysis Services (SSAS) 2008 and SSAS 2005 offer tools to help you develop multidimensional data structures for business intelligence (BI). With help from Business Intelligence Development Studio (BIDS), you can avoid the most common problems of working with SSAS cubes, such as dimension-design headaches, cube-design and processing gotchas, aggregation problems, and more.
November 20, 2008
SQL Server Analysis Services (SSAS) is the Microsoft cube-building engine that DBAs and SQL Server developers can use to create multidimensional data structures. Although SSAS produces powerful, flexible cubes, the complexity inherent in using SSAS to design dimensions, hierarchies, measure groups, and calculations can cause problems. We’ll explore some of the common mistakes SQL Server pros make in the design and implementation of their SSAS cubes and learn how to resolve them.
Related: SSAS for DBAs
Dimension Headaches
SSAS’s most troublesome area is dimension design. Part of this difficulty stems from the importance dimensions play in letting users slice and dice the data, coupled with the many properties on attributes, hierarchies, hierarchy levels, and attribute relationships.
Attributes and hierarchies. SSAS 2008 and SSAS 2005 let you treat each dimension attribute as a separate hierarchy, which enhances analysis but also produces slow queries returning vast amounts of data. Although some companies simply leave the attributes as standalone hierarchies, cube developers should build multilevel hierarchies in the dimensions whenever possible.
Natural hierarchies are those in which each level logically leads to the next, much as years break down into quarters, which break down into months, which break down into days. When creating hierarchies, you should make as many of them natural as possible, although it’s acceptable to define hierarchies that aren’t natural (i.e., user or reporting hierarchies). Users expect to be able to drill down or up when working with data. A guided drill path makes cubes more usable by simplifying the user experience.
Additionally, you should hide attributes included in multilevel hierarchies, as the example in Figure 1 shows, to avoid having users query against the attributes independently while performing analysis. The problem with treating some attributes as independent hierarchies is that users may unwittingly perform analysis on a low-level attribute, such as individual customers. The accompanying analysis could return hundreds of thousands or even millions of members and require several minutes to stream the data from server to client. Because of this performance impact, you should hide the hierarchies for any attributes included in multilevel hierarchies and hide any other attributes that are treated only as properties of other attributes. The attributes are visible but only as part of the hierarchy. The simple Customer dimension in Figure 1 has two hierarchies, Customers and Demographics, and the City attribute has AttributeHierarchyEnabled set to False, per Microsoft’s recommendation to hide attribute hierarchies included in a multilevel hierarchy.
Attribute relationships. After you build hierarchies, you should create attribute relationships (see “Creating Dimensions in SSAS, Part 2,” April 2008, for more information). These relationships tell the SSAS engine how attributes relate to each other to perform queries and build aggregations. For example, you can sum the days of a month into a monthly total. You can sum the values for three months into a quarterly total, and so forth. Without attribute relationships, however, the engine doesn’t know that days are related to months, months to quarters, and so on. Without attribute relationships, if the user asks for the total for a particular quarter, SSAS has to read the individual day records and sum them all to get quarter totals. With attribute relationships, SSAS can simply add three-month totals to reach the quarter total.
Creating proper attribute relationships benefits both processing and querying. Unfortunately, database professionals find creating attribute relationships to be one of the biggest SSAS challenges, in part because of the somewhat confusing manner in which Business Intelligence Development Studio (BIDS) 2005 defines the relationships. Fortunately, BIDS 2008 includes a graphical browser that makes it easy to not only see the attribute relationships but also to spot problems such as circular references and redundant relationships.
Figure 2 shows attribute relationships in BIDS 2005 created by dragging the parent attribute under the child attribute, and Figure 3 reveals the same attribute relationships with the new graphical viewer in BIDS 2008 (note that Calendar Year is cut off in the diagram), which provides a graphical designer that significantly reduces errors. In both cases, two hierarchies exist: one for Year to Quarter to Month to Day and one for Year to Quarter to Week to Day
Attribute keys. Attribute keys relate to attribute relationships and define what’s unique about a particular characteristic. Sometimes the keys require more than one column from the dimension table, much like a primary key in an OLTP table. For example, the month column in a time dimension might contain just the month numbers 1, 2, 3, and so on. However, the month number 1, by itself, doesn’t say enough about that attribute to make it unique because every year has a month 1. Therefore, the attribute key must be both the year and the month. Neglecting to set the key correctly can lead to either a failure during processing or successful processing with unpredictable results, depending on whether the attribute relationship is set to be rigid or flexible.
Cube Headaches
Although common problems occur in dimensions, cubes can cause headaches, too, in their design and processing. If you avoid the following common mistakes, you can make usable cubes and help speed processing times.
Measure groups, for example, contain measures that have the same dimensionality and granularity. Multiple measure groups with different characteristics can exist in one cube. One of the most common areas of confusion with SSAS 2008 and SSAS 2005 is the fact that measure groups default to repeating a value for all members of an unrelated dimension. For example, the AdventureWorks database includes products sold over the Internet and through a reseller channel. For Internet sales, AdventureWorks knows the end customer and operates without a reseller. For reseller sales, AdventureWorks sells directly to the end customer and therefore doesn’t use a reseller. Therefore, if a user looks at reseller sales and tries to view the customer, the same value repeats for each customer, as you see in Figure 4. This image shows two columns: Sales Amount and Sales Amount - Fact Reseller Sales, both for 2001. Along the left side is a breakdown of the customer dimension by country. The default setting is to ignore unrelated dimensions, which can cause a value to be repeated for every member of an attribute from a dimension unrelated to the measure group.
Fortunately, an easy way exists to change this default behavior, and adjusting it often prevents users from believing that the data they see represents valid values for those customers. The solution is a property on the measure group called IgnoreUnrelatedDimensions. You can find it in BIDS by opening the cube and clicking a measure group in the Measures pane. The default value for IgnoreUnrelatedDimensions is True, which means that the measure group ignores any dimension not tied to it by showing the current value for all members in the unrelated dimension. Changing IgnoreUnrelatedDimensions to False means that the measure group now sees that the query is asking for a value to be tied to a dimension to which the facts aren’t related, and the result is the blank value you see in Figure 5. This is a simple fix that can significantly reduce user confusion.
Aggregation Answers
A primary reason for developing cubes is to create pre-calculated aggregations. Aggregations are summarizations of data at any level higher than the grain, which is the lowest level of detail in the fact table. If you record sales per customer per product per day, the total sales per customer for a day or month or quarter would each represent different aggregations (e.g., total sales for Customer X by day or total sales for Customer X by quarter). In addition, the total sales for all customers for a day or month by product subcategory or product category also represent aggregations.
Considering the variety of attributes and levels of hierarchical information available, one of the most straightforward ways to enhance cube performance is to pre-calculate and store these aggregations. However, cubes built in BIDS don’t include any higher-level aggregations by default. Unfortunately, some businesses go through the entire deployment process without ever creating any aggregations. The Aggregation Wizard organizes aggregations based on either a percentage of total aggregation, storage size, or simply the point at which the designer tells the wizard to stop. The wizard’s approach is the easiest way to develop aggregations; however, the wizard assumes that all queries are equally likely.
One common approach to handling aggregations is to let the wizard create a low percentage of aggregations, such as 20 percent. Then, after the cube is running in production, you can log queries and later run through the Usage Based Optimization (UBO) wizard. The UBO wizard looks at the actual queries and designs aggregations that support them, meaning that aggregations become far more tuned to actual usage patterns. Note that for large cubes, the UBO in SSAS 2005 didn’t always provide the best aggregations; the SSAS 2008 UBO does a much better job.
Another problem you want to avoid is having too many aggregations. Queries can actually suffer as the engine searches through a host of aggregations to try to determine which is best at fulfilling a particular question. Some DBAs use the Aggregation Wizard and set it to 100 percent, which tends to slightly improve performance while dramatically increasing physical storage requirements and processing time. Additionally, some database pros fashion custom aggregations because the wizard, even at 100 percent, doesn’t create every possible aggregation. SSAS 2005 includes a sample application that provided for the tuning of aggregations, and a similar tool is built into BIDS 2008. Some database pros erroneously assume that if one aggregation is good, thousands must be better, and so they produce every possible aggregation. Not only does this require vast amounts of storage, but it can also stretch processing times into hours or even days.
Processing Options
Suppose you need to keep the cube available at all times, but it sometimes goes offline for processing. To minimize processing time, you can create attribute relationships and tune the number of aggregations to improve query times while still allowing for processing in the allotted time window.
Commonsense tips for solving the problem include reducing the number of attributes in the dimensions and partitioning the data. You can also tune specific processing settings. The Microsoft SQL Server 2005 Analysis Services Performance Guide covers many server settings you can tune, such as ThreadPoolProcessMaxThreads and OLAPProcessBufferRecordLimit. You can also examine a simple setting in BIDS and SQL Server Management Studio, which is available via the Process dialog box. A Change Settings button just above the Run button opens the Change Settings dialog box that Figure 6 shows. The default Processing Order is Parallel, and Maximum parallel tasks is set to Let the server decide. Changing this value can result in better (or worse) performance. Although a number of values populate the list, you can also manually enter a value, using a trial-and-error approach.
Data Source Login Hassles
Developers often create data sources that connect to the database with Windows authentication but set the impersonation information to a service account. Such an account might not have the appropriate permissions to access the data, especially when the project is moved from development to test to production. Worse, data connections are sometimes made with hard-coded usernames and passwords in them, which works well until the password changes. Normally this happens after midnight on a weekend, causing a developer to spend hours updating all the information.
The best practice for connecting to data sources is to use Windows authentication, then impersonate using a domain account created specifically for accessing the source data. This account should have read-only access to the data, except when using data-mining models or writeback.
Data-Delivery Snafus
Perhaps the most common problem with the use of SSAS isn’t a specific design issue involving the cubes but rather the inability to provide the proper tool mix needed to deliver data from the cubes to the organization. No single tool is right for everyone; typical solutions include at least a reporting tool (such as SQL Server Reporting Services) and an analytics tool (such as Microsoft Excel or Microsoft ProClarity Desktop Professional). Many solutions also include a scorecard tool (such as Microsoft Office PerformancePoint Server 2007). Organizations pursuing business intelligence projects should determine how to deliver data and to whom it should go before designing the relational data warehouse and cubes. Knowing how you deliver data may drive some cube design decisions involving perspectives, key performance indicators, and more.
Cubing for Keeps
I’ve seen the problems detailed in this article regularly in a variety of companies with varying levels of experience in creating cubes with SSAS. It’s clear that dimensions cause the greatest number of issues—failing to create attribute relationships correctly, neglecting to hide attributes used in hierarchies, and improperly setting attribute keys. Cubes also lead to hassles when you relate measure groups and dimensions, deal with unrelated dimensions, and create aggregations. Finally, many IT organizations don’t optimize settings for processing and query performance and don’t provide end users with adequate tools to let their enterprises take full advantage of cube data. If you follow the recommendations in this article, you have a better chance of creating a usable and high-performance set of cubes.
Aggregation Answers
A primary reason for developing cubes is to create pre-calculated aggregations. Aggregations are summarizations of data at any level higher than the grain, which is the lowest level of detail in the fact table. If you record sales per customer per product per day, the total sales per customer for a day or month or quarter would each represent different aggregations (e.g., total sales for Customer X by day or total sales for Customer X by quarter). In addition, the total sales for all customers for a day or month by product subcategory or product category also represent aggregations.
Considering the variety of attributes and levels of hierarchical information available, one of the most straightforward ways to enhance cube performance is to pre-calculate and store these aggregations. However, cubes built in BIDS don’t include any higher-level aggregations by default. Unfortunately, some businesses go through the entire deployment process without ever creating any aggregations. The Aggregation Wizard organizes aggregations based on either a percentage of total aggregation, storage size, or simply the point at which the designer tells the wizard to stop. The wizard’s approach is the easiest way to develop aggregations; however, the wizard assumes that all queries are equally likely.
One common approach to handling aggregations is to let the wizard create a low percentage of aggregations, such as 20 percent. Then, after the cube is running in production, you can log queries and later run through the Usage Based Optimization (UBO) wizard. The UBO wizard looks at the actual queries and designs aggregations that support them, meaning that aggregations become far more tuned to actual usage patterns. Note that for large cubes, the UBO in SSAS 2005 didn’t always provide the best aggregations; the SSAS 2008 UBO does a much better job.
Another problem you want to avoid is having too many aggregations. Queries can actually suffer as the engine searches through a host of aggregations to try to determine which is best at fulfilling a particular question. Some DBAs use the Aggregation Wizard and set it to 100 percent, which tends to slightly improve performance while dramatically increasing physical storage requirements and processing time. Additionally, some database pros fashion custom aggregations because the wizard, even at 100 percent, doesn’t create every possible aggregation. SSAS 2005 includes a sample application that provided for the tuning of aggregations, and a similar tool is built into BIDS 2008. Some database pros erroneously assume that if one aggregation is good, thousands must be better, and so they produce every possible aggregation. Not only does this require vast amounts of storage, but it can also stretch processing times into hours or even days.
Processing Options
Suppose you need to keep the cube available at all times, but it sometimes goes offline for processing. To minimize processing time, you can create attribute relationships and tune the number of aggregations to improve query times while still allowing for processing in the allotted time window.
Commonsense tips for solving the problem include reducing the number of attributes in the dimensions and partitioning the data. You can also tune specific processing settings. The Microsoft SQL Server 2005 Analysis Services Performance Guide covers many server settings you can tune, such as ThreadPoolProcessMaxThreads and OLAPProcessBufferRecordLimit. You can also examine a simple setting in BIDS and SQL Server Management Studio, which is available via the Process dialog box. A Change Settings button just above the Run button opens the Change Settings dialog box that Figure 6 shows. The default Processing Order is Parallel, and Maximum parallel tasks is set to Let the server decide. Changing this value can result in better (or worse) performance. Although a number of values populate the list, you can also manually enter a value, using a trial-and-error approach.
Data Source Login Hassles
Developers often create data sources that connect to the database with Windows authentication but set the impersonation information to a service account. Such an account might not have the appropriate permissions to access the data, especially when the project is moved from development to test to production. Worse, data connections are sometimes made with hard-coded usernames and passwords in them, which works well until the password changes. Normally this happens after midnight on a weekend, causing a developer to spend hours updating all the information.
The best practice for connecting to data sources is to use Windows authentication, then impersonate using a domain account created specifically for accessing the source data. This account should have read-only access to the data, except when using data-mining models or writeback.
Data-Delivery Snafus
Perhaps the most common problem with the use of SSAS isn’t a specific design issue involving the cubes but rather the inability to provide the proper tool mix needed to deliver data from the cubes to the organization. No single tool is right for everyone; typical solutions include at least a reporting tool (such as SQL Server Reporting Services) and an analytics tool (such as Microsoft Excel or Microsoft ProClarity Desktop Professional). Many solutions also include a scorecard tool (such as Microsoft Office PerformancePoint Server 2007). Organizations pursuing business intelligence projects should determine how to deliver data and to whom it should go before designing the relational data warehouse and cubes. Knowing how you deliver data may drive some cube design decisions involving perspectives, key performance indicators, and more.
Cubing for Keeps
I’ve seen the problems detailed in this article regularly in a variety of companies with varying levels of experience in creating cubes with SSAS. It’s clear that dimensions cause the greatest number of issues—failing to create attribute relationships correctly, neglecting to hide attributes used in hierarchies, and improperly setting attribute keys. Cubes also lead to hassles when you relate measure groups and dimensions, deal with unrelated dimensions, and create aggregations. Finally, many IT organizations don’t optimize settings for processing and query performance and don’t provide end users with adequate tools to let their enterprises take full advantage of cube data. If you follow the recommendations in this article, you have a better chance of creating a usable and high-performance set of cubes.
Learn more: Performance Secrets for SQL Server Developers
About the Author
You May Also Like