Building Your First Cube
Get at your BI data more easily by entering the world of data cubes
November 17, 2009
If you've worked in any technology-related field, you've probably heard the term cube thrown around, but most traditional DBAs and database developers haven't worked with them. Cubes are powerful data constructs for rapidly aggregating multi-dimensional data. If your organization wants to perform data analysis on large volumes of data, a cube is the ideal solution.
What Is a Cube?
Relational databases were designed to support thousands of concurrent transactions while maintaining performance and data integrity. By their very design, relational databases fall short in large volume data aggregation and retrieval. To aggregate and return large volumes of data, a relational database must receive a set-based query that asks for a set of data to be aggregated on the fly. These relational queries are very costly due to their reliance on multiple joins and aggregations, so relational aggregation queries perform poorly when operating on large data sets.
Related: Build a Cube to Perform Scenario-Based Risk Analysis
Cubes are multidimensional entities that address this weakness in relational databases. With a cube, you can provide users with a data structure that facilitates rapid responsiveness for large-volume aggregation queries. Cubes perform this aggregation magic by pre-aggregating data (measures) across multiple dimensions. The cube's pre-aggregation (normally) takes place when a cube is being processed. When you process a cube, you're creating pre-calculated aggregations of data that are stored in binary form on disk.
A cube is the central data construct of an OLAP system such as SQL Server Analysis Services (SSAS). Cubes are (usually) constructed from an underlying relational database called a dimensional model, but they're separate technical entities. Logically, a cube is a data repository that is composed of dimensions and measures. Dimensions contain descriptive attributes and hierarchies while measures are the facts you are describing with dimensions. Measures are combined into logical groupings called measure groups. You tie dimensions to measure groups based on a granularity attribute.
In the file system, a cube is implemented as a series of related binary files. The binary architecture of a cube facilitates its fast retrieval of large volumes of multi-dimensional data.
I mentioned that cubes are constructed from an underlying relational database called a dimensional model. A dimensional model contains relational tables (fact and dimension) that correlate nicely to a cube's entities. Fact tables contain measurements such as the quantity of a product sold. Dimension tables store descriptive attributes such as product names, dates, and employee names. Generally, fact tables are related to dimension tables through primary-foreign key constraints (this relational join correlates to the cube's granularity attribute I spoke of earlier), with the foreign keys residing in the fact table. When dimension tables are directly related to a fact table, a star schema is formed. When dimension tables aren't directly related to a fact table, a snowflake schema is produced.
Note that dimensional models are categorized according to their scope. A data mart is a dimensional model designed for a single business process, such as sales or inventory. A data warehouse is a dimensional model designed to encompass multiple business processes, and thus facilitates cross-business process analytics.
Be Prepared: Software Requirements
Now that you have a basic understanding of what cubes are and why they're important, I'll switch gears and take you on a step-by-step tour of building your first cube using SSAS. There are some basic software components you'll need in place before building your first cube, so make sure your system meets these requirements before proceeding.
My sample Internet Sales cube will be built from the AdventureWorksDW 2005 sample database. I'll be building the sample cube from a subset of the tables found in the sample database that are useful for analyzing Internet sales data. Figure 1 shows these tables in a basic database diagram. Because I'm using the 2005, you can follow along with my directions using either SQL Server 2005 or SQL Server 2008. The AdventureWorksDW 2005 sample database can be found on the CodePlex website at msftdbprodsamples.codeplex.com.
As mentioned above, you need access to an instance of SQL Server 2008 or 2005, including the SSAS and Business Intelligence Development Studio (BIDS) components. I'll be using SQL Server 2008, so you might see a few subtle differences if you're using SQL Server 2005.
Creating the Analysis Services Project
The first thing you need to do is create an SSAS project using BIDS. You'll find BIDS on the Start menu under Microsoft SQL Server 2008, SQL Server Business Intelligence Development Studio. Clicking this link will launch BIDS with the default splash screen. You create a new SSAS project by selecting File, New, Project. You'll see the New Project dialog box, shown in Figure 2. Next, click the Analysis Services Project icon and name the project SQLMAG_MyFirstCube. Click OK.
Once the project has been created, right-click the project in Solution Explorer and select the Properties context menu option. Select the Deployment option in the left side of the Property Pages dialog box and review the Target Server and Database settings, as shown in Figure 3. If you're working in a distributed SQL Server deployment, you'll need to update the Target Server property with the name of the server to which you intend to deploy. Click OK when you're satisfied with the new Analysis Services project's deployment settings.
Defining the Data Source
The first object you need to create is a data source. The data source object provides schema and data used when building the downstream cube-related objects. To create a data source object in BIDS, use the Data Source Wizard. Launch the Data Source Wizard by right-clicking the Data Sources folder in Solution Explorer and selecting the New Data Source option. (You'll find that creating SSAS objects in BIDS has a consistent development pattern. First, a wizard guides you through the object creation process and common settings. After the wizard finishes, you open the resulting SSAS object in a designer and fine tune as needed.)
Once you're past the welcome screen, define a new data connection by clicking the New button. Create a new Native OLEDBSQL Server Native Client 10 connection pointing to your designated SQL Server (which hosts the sample database). You can use either Windows or SQL Server authentication, depending on your SQL Server environment. Click the Test Connection button to make sure you've defined a valid database connection, then OK.
Next is the impersonation information configuration, which, like the data connection, depends on how your SQL Server environment is configured. Impersonation is the security context SSAS relies on when processing its objects. If you're running a basic, single-server (or laptop) deployment, as I assume most readers are, you can simply select the Use the service account option, as shown in Figure 4. Click Next to complete the Data Source Wizard and use AWDW2005 for the data source's name. It's fine to use this method for testing, but in production environments it's not a best practice to use the service account for impersonation. A better practice is to designate domain accounts for the SSAS impersonation account.
Data Source View
With your data source defined, the next step in the process of building an SSAS cube is to create a Data Source View (DSV). A DSV is very helpful because it provides the capability to separate the schema your cube is expecting from that of the underlying database's schema. As a result, DSVs can be used to enhance the underlying relational schema for the purposes of building a cube. Some of the DSV's key capabilities for enhancing source schemas include named queries, logical table relationships, and named column calculations.
Go ahead and right-click the DSV folder and select the New Data Source View option to bring up the New Data Source View Wizard. In the Select a Data Source step, select the relational database connection you defined earlier and click Next. Select the FactInternetSales, DimProduct, DimTime, and DimCustomer tables and click the single arrow pointing to the right to move the tables over to the included column. Finally, click Next and finish the wizard, using the default name.
At this point you should have a DSV, which is located under the Data Source Views folder in Solution Explorer. Double-click the new DSV to display the DSV Designer. You should see all four tables in the DSV, as shown in Figure 5.
Creating the Database Dimensions
As I explained in the introduction, dimensions provide the descriptive attributes of measures and hierarchies that are used to provide non-leaf level aggregations. You should understand the difference between a database dimension and a cube dimension: database dimensions provide a base dimension object for multiple cube dimensions to be built upon.
Database and cube dimensions provide an elegant solution to a concept known as role-playing dimensions. Role-playing dimensions are when you need to use a single dimension multiple times in a cube. Date is a great example—in the sample cube, you'll be building a single date dimension and referencing it once for each date for which you want to analyze Internet sales.
Date will be the first dimension you create. Right-click the Dimensions folder in Solution Explorer and select the New Dimension option to launch the Dimension Wizard. Select the Use an existing table option and click Next in the Select Creation Method step. In the Specify Source Information step, select DimTime in the Main table drop-down and click the Next button. Now you need to create the Time dimension's attributes in the Select Dimension Attributes step. Select every column, as shown in Figure 6.
Click Next. In the Completing the Wizard step, type Dim Date in the Name textbox, and click the Finish button to complete the Dimension Wizard. You should now see a new Dim Date dimension located under the Dimensions folder in Solution Explorer.
Next you'll use the Dimension Wizard to create the Product and Customer dimensions. Use the same steps to create a basic dimension that you used before. When you're running through the Dimension Wizard, make sure you select all potential attributes in the Select Dimension Attributes step for both dimensions. The default values for all other settings will suffice for the sample cube.
Bringing It All Together: Building the Internet Sales Cube
With your database dimensions built, you're now ready to build the cube. In Solution Explorer, right-click the Cubes folder and select the New Cube option to launch the Cube Wizard. Select the Use existing tables option in the Select Creation Method screen. Select the FactInternetSales table for the Measure Group in the Select Measure Group Tables step. Remove the check next to the Promotion Key, Currency Key, Sales Territory Key, and Revision Number measures in the Select Measures step and click Next.
On the Select Existing Dimensions screen, make sure all existing database dimensions are checked to reuse them in the cube as cube dimensions. Because I want to keep this cube as simple as possible, uncheck the FactInternetSales dimension in the Select New Dimensions step. (By leaving the FactInternetSales dimension checked here, you'd be creating what's called a Fact dimension or degenerate dimension. Fact dimensions are dimensions that are created by using an underlying fact table as opposed to a traditional dimension table.)
Click the Next button to advance the wizard to the Completing the Wizard step and type "My First Cube" in the cube name textbox. Click Finish to complete the cube wizard process.
Deploying and Processing the Cube
You're now ready to deploy and process your first cube. Right-click the new cube in Solution Explorer and select the Process option. You'll now see a dialog box informing you that the server content appears to be out of date. Click Yes to deploy your new cube to the target SSAs server. When you deploy a cube, you're technically sending XML for Analysis (XMLA) to the target SSAS server, which creates the cube on the server. As mentioned earlier, processing a cube populates its binary files on disk with data from the underlying data source, including the additional dimensional metadata you've added (dimension, measure, and cube settings).
Once the deployment process is complete, a new Process Cube dialog box is displayed. Click the Run button to process the cube, and a Process Progress dialog box will be displayed. Once cube processing has completed, click the Close button (twice to close out both dialog boxes) to complete the cube's deployment and processing.
You've now built, deployed, and processed your first cube. You can browse your new cube by right-clicking the cube in Solution Explorer and clicking Browse. Drag and drop the measures in the center of the pivot table and the dimension attributes on the rows and columns to explore your new cube. Observe how fast the cube returns your various aggregation queries and think back to my earlier discussion about the weaknesses of relational database aggregation queries. You should now comprehend the raw power, and thus the business value, of an OLAP cube.
About the Author
You May Also Like