XML for Analysis: Marrying OLAP and Web Services
Move your analytical applications to a flexible, Web-based architecture
October 18, 2004
XML for Analysis (XMLA)—a Web-service standard proposed and supported by Microsoft and leading OLAP companies—brings together Web services and OLAP technologies by providing an XML schema for OLAP and data-mining applications. Essentially, XMLA lets you explore and query multidimensional data through Web services, which means analytical applications can move away from their expensive and difficult-to-maintain client/ server roots toward a more flexible, Web-based architecture.
XML Web services architectures connect applications and components by using standard Internet protocols such as HTTP, XML, and Simple Object Access Protocol (SOAP). These architectures offer the promise of interoperable distributed applications that can be shared between and within enterprises. Amazon.com, for example, uses Web services to support associate programs that let third parties sell from its catalog, and Microsoft's MapPoint Web service integrates location-based services into a variety of applications. Web services are becoming crucial pieces of enterprise application architecture by letting you loosely couple services from disparate applications in a way that's easy to maintain as business processes change.
The XMLA specification, available at http://www.xmla.org, describes the following design goals:
Provide to remote data-access providers a standard data-access API that application developers can use universally across the Internet or a corporate intranet to access multidimensional data.
Optimize a stateless architecture that requires no client components for the Web and minimal round-trips between client and server.
Support technologically independent implementations of XMLA providers that work with any tool, programming language, technology, hardware platform, or device.
Build on open Internet standards such as SOAP, XML, and HTTP.
Leverage and reuse successful OLE DB design concepts so that application developers can easily enable OLE DB for OLAP applications and OLE DB providers for XMLA.
Work efficiently with standard data sources such as relational OLAP databases and data-mining applications.
By fulfilling these design goals, XMLA provides an open, industry-standard way to access multidimensional data from many different sources through Web services—with support from multiple vendors.
XMLA is based on SOAP, and you can use it from any application-programming language that can call SOAP methods, such as Visual Basic .NET, Perl, or Java. SOAP is a lightweight, XML-based protocol for exchanging structured and type information over the Web. Structured information contains content and an indication of what that content means. For example, a SOAP message might have an XML tag in it called CustomerName that contains customer name information. A SOAP message is an XML document that consists of a SOAP envelope (the root XML element that provides a container for the message), an optional SOAP header containing application-specific information (e.g., custom-authentication information), and a SOAP body, which contains the message you're sending. Calling SOAP methods is simply a matter of wrapping the arguments for the SOAP method in XML and sending the request to the server. Because SOAP's overall goal is simplicity, the protocol is modular and easy to extend to new types of applications that can benefit from Web services. You can use Internet standards to integrate SOAP with your existing systems. Most mainstream development platforms offer some support for calling SOAP-based Web services. Both Java 2 Enterprise Edition (J2EE) and the Microsoft .NET Framework have strong support for Web services, making the invocation of remote services almost transparent to the developer.
Besides working with XMLA directly, you can use the Microsoft .NET-based ADO MD.NET library to build .NET applications that use XMLA. ADO MD.NET is the successor to the OLE DB for OLAP—based ADO MD. However, I don't cover ADO MD.NET in this article. Instead, I show you how to use the underlying XMLA protocol to build an analytic application on any device or platform or in any language that supports XML. I assume you have some knowledge of OLAP fundamentals, at least a passing familiarity with MDX, and some exposure to XML. For an introduction to XML Web services, see Roger Wolter's Microsoft article "XML Web Services Basics' at http://msdn.microsoft.com/library/en-us/dnwebsrv/html/webservbasics.asp. You'll find an even more basic and technology-neutral introduction in Venu Vasudevan's Web services article "A Web Service Primer" at http://webservices.xml.com/pub/a/ws/2001/04/04/webservices/index.html.
Installing XMLA
To use XMLA with SQL Server 2000, download the XML for Analysis Software Development Kit (SDK), available at http://www.microsoft.com/downloads/details.aspx?familyid=7564a3fd-4729-4b09-9ee7-5e71140186ee&displaylang=en, and install it on a Web server that can access your Analysis Services data source through OLE DB for OLAP. (You can simply use the server that has Analysis Services installed on it.) SQL Server 2005 Analysis Services will support XMLA as a native protocol, so you won't have to separately install XMLA. But for now, this step is necessary.
Installing the SDK is straightforward, but to run the installer, you must be logged on as an Administrator to the machine on which you're performing the installation. When you double-click the XMLADSK.msi installation package, the installer walks you through the process. Unless you have a Secure Sockets Layer (SSL) certificate configured on your Web server, you need to select Enable HTTP and HTTPS during the Connection Encryption Settings step to allow your SQL Server unsecured communication with the XMLA Provider through HTTP. Note that using the XMLA Provider in unsecured mode isn't a good idea for a production system because the provider will pass your data across the network in plain text for anyone to intercept. But for just learning about XMLA in a non-production environment, you're probably OK using unsecured communication.
After installing the SDK, you need to set up the data sources that you're going to connect to through XMLA and make the server available to clients by creating a virtual directory for the XMLA Provider. To set up the data sources, you edit the datasources.xml file in the Config subfolder of the installation folder you selected when installing the provider. The default path for installation is C:Program FilesMicrosoft XML for Analysis SDK. The datasources.xml file contains a preconfigured example connection for the Local Analysis Server that you can copy to set up your own data sources. Figure 1 shows part of the datasources.xml file. The most important parts of this file are the required elements that facilitate the connection to the OLAP data source: DataSourceName for naming the data source; DataSourceDescription for adding a text description of the data source; URL, which provides the URL for the XMLA Provider; DataSourceInfo, which describes the OLE DB for OLAP connection to the Analysis Servers; ProviderType, which enumerates the type or types of provider being referenced—tabular data provider (TDP), multidimensional data provider (MDP), data-mining provider (DMP); and AuthenticationMode (Unauthenticated, Authenticated, or Integrated), which describes how the Web service will authenticate connections to the provider. The XML for Analysis Help file (which you installed with the SDK at Microsoft XML for Analysis SDKHelp1033smla11.chm) contains complete information about all these configuration options.
Once you've set up the data sources, you need to create in Microsoft IIS a virtual directory for the XMLA Provider. The virtual directory lets IIS access a specific folder on the server through HTTP, which is how we'll connect to the XMLA Provider for this example. The easiest way to set up a virtual directory is to open the IIS Manager, select the server on which you want to create the virtual directory, right-click the Web site you want to use for the XMLA Provider, and select New, Virtual Directory. The Virtual Directory Creation Wizard then guides you through the rest of the process. The first step is to name the virtual directory; XMLA is usually a good choice. Next, you select the content directory, which lets IIS map files in that directory to HTTP requests. For the XMLA Provider, the content directory is the path to the Msxisapi.dll file installed in the C:Program FilesMicrosoft XML For Analysis SDKIsapi folder (the default location) during setup. Then, set the access permissions for this folder by selecting the Read, Run Scripts, and Execute check boxes, and finish the wizard.
After you configure the virtual directory, you set access permissions on it. In IIS Manager, right-click the virtual directory you just created and select Properties. In the Properties window, select the Directory Security tab and configure the security permissions. For learning about how XMLA works, the default permissions setting (anonymous access) is sufficient.
If you're configuring the XMLA Provider on Windows Server 2003, you must take some additional steps to enable the protocol on the server. The XMLA Help topic "Enable the XML for Analysis Web Service Extension on Windows Server 2003" tells you how to get the XMLA Provider to work on Windows Server 2003.
Using XMLA: Discover and Execute
One of XMLA's greatest strengths is that it simplifies data retrieval compared to working directly with OLE DB for OLAP. The XMLA Provider has only two methods: Discover and Execute. You use the Discover method to retrieve metadata that describes the services a specific XMLA Provider supports. You use the Execute method to run queries against the Analysis Services database and return data from those queries.
Discover. Discover is a flexible method that a client can use repeatedly to build a picture of the configuration and capabilities of the data provider. So, for example, a client might first request the list of data sources that are available on a particular server, then inquire about the properties and schemas those data sources support so that a developer can properly write queries against the data source. Let's look at the arguments you send to Discover, then walk through some examples that show how to use the method.
Listing 1's XML code shows a SOAP call to retrieve a list of data sources from the server. The first parameter, RequestType, determines the type of information that Discover will return about the provider. The available types let you get a list of the data sources available on the server (DISCOVER_DATASOURCES), a list of properties about a specific data source on the server (DISCOVER_PROPERTIES), a list of supported request types (DISCOVER_SCHEMA_ROWSETS), a list of the keywords the provider supports (DISCOVER_KEYWORDS), and a schema rowset constant to retrieve the schema of a provider-defined data type. Table 1 lists the RequestType parameters.
The second parameter, Restrictions, lets you put conditions on the data that Discover returns. The RequestType in the call to the Discover method determines the fields that the Restrictions parameter can filter on. Table 2 describes the fields that the various schema types in XMLA can use to restrict returned information. If you want to return all the data available for a given RequestType, leave the Restrictions parameter empty.
The Properties parameter provides additional information about the request that the other parameters don't contain. For example, Timeout specifies the number of seconds the provider will wait for the Discover request to succeed before returning a timeout message. Table 3 lists some common XMLA Provider for Analysis Services properties you're likely to use. You can specify properties in any order. If you don't specify a Properties value, Discover uses the appropriate default value.
The Discover method call in Listing 1 returns results in XML. The settings you give the parameters RequestType, Restrictions, and Properties determine the contents of Result, which is an output parameter. In Listing 1, note that I set RequestType to DISCOVER_DATASOURCES and Restrictions and Properties to null so that Discover returns the entire list of data sources in the default format (tabular format in this case). To call a SOAP method, you have to send the SOAP envelope to the Web service through HTTP. I've provided a sample Web application, which you can download at InstantDoc ID 44006. The sample application shows exactly how you might send a SOAP envelope in JScript by using the Microsoft.XMLHTTP object in the SubmitForm() method. The sample also shows you more examples of how to use the Discover method and how to use the data-source information retrieved from the first call to Discover to populate the next call to Discover.
Execute. After you use Discover to determine the metadata for the data source, you can use that metadata to retrieve data. For data retrieval, XMLA provides the Execute method. The method call for Execute looks like this:
Execute (Command,Properties,Results)
As Listing 2's SOAP call to Execute shows, the Command parameter contains in a tag the MDX statement you want to run against your OLAP server. Similar to the Properties parameter in the Discover method, the Properties parameter in Execute provides additional information that controls the data the method returns or the connection to the data source. You must include the Properties tag in your Execute method call, but the tag can be empty if you want to use the defaults for your request. The Results parameter represents the SOAP document the server returns. Results' contents are determined by the other two parameters.
Listing 2's code shows an example of a call to Execute that contains an MDX SELECT statement. You call the Execute method the same way you call the Discover method, by sending the SOAP envelope to the Web service through HTTP. As with any SOAP request, the entire message is contained in a SOAP envelope. Within the SOAP envelope, the SOAP body contains the guts of the Execute method call, starting with the Command parameter. The Command parameter contains the MDX query that will run on the server. The Properties parameter comes next, containing the PropertyList parameter that holds each of the properties the XML code will use for the Execute request. In this case, the Execute call specifies in the PropertyList parameter DataSourceInfo, Catalog, Format, and AxisFormat. You can retrieve all this information in a call to Discover like the one that Listing 1 shows. Finally, you close the body and envelope, and the request is ready to send via HTTP to the XMLA Provider.
Getting Results
When the XMLA Provider receives a request, it passes the request to the MDX query engine, which parses and executes it. After obtaining the MDX results, the XMLA Provider packages them into a SOAP reply and sends them back to the requesting client. An Execute response can be quite long depending on the amount of data returned and the format used. To see the results of an Execute query, load the sample application and run an MDX query. To load the sample application, simply open it in Internet Explorer (IE). You can either copy the file to a virtual directory and open it in HTTP or double-click the file to open it in the browser. You'll see all the XML that the query returned in the sample Web application; Figure 2 shows part of the results.
The SOAP response from a call to an Execute method looks similar to the results from a call to Discover. As Listing 2 shows, the calling code includes the usual SOAP Envelope and Body tags as the top-level wrappers, then shows the MDX query packaged for transmission in XML. You have two options for the format of an Execute request's results: Rowset and MDDataSet (which appears as Multidimensional in the listing). The Rowset format is a flattened tabular structure that contains rows and columns along with the data elements. MDDataSet is a multidimensional format that contains three sections: OLAPInfo, Axes, and CellData. You'll see these three sections if you scroll through the results of the sample application. The multidimensional format represents the multidimensional data in a hierarchical format that's more representative of the structure of the data than the flattened tabular format. OLAPInfo defines the structure of the results. The first section of OLAPInfo, CubeInfo, lists the cubes where the data originated. Next, AxesInfo has an AxisInfo element for each axis in the data. Every AxisInfo element contains the hierarchies, members, and properties for that axis. AxisInfo always contains the standard properties Uname (Unique Name), Caption, Lname (Level Name), and Lnum (Level Number). In addition, AxisInfo might contain a default value specified for cell properties. If the query results include many repeating values, these default values can dramatically reduce the size of the returned data by returning only the data elements that are different from the default. Last, the CellData section of a multidimensional format contains CellInfo standard and custom properties for each cell the MDX query returns. The standard properties are Value, FmtValue (Format Value), ForeColor, and BackColor. Optional properties depend on the MDX query you use to retrieve the results.
Describing XMLA results in abstract terms is difficult because the exact data returned varies depending on the query you use. The easiest way to understand OLAPInfo is to walk through an example of the results from a specific query. Consider the following MDX query:
select{[Product].children} on rows,{[Store].children} on columnsfrom Sales
Running this query through the XMLA Provider by using the Execute method results in the AxesInfo section that Figure 3 shows. The query returns columns (Axis0) and rows (Axis1). Each axis contains only one hierarchy: The columns axis contains the Store hierarchy, and the rows axis contains the Product hierarchy. After defining the dimensional axes, Figure 3 shows the slicer dimension, which is an MDX dimension for filtering multidimensional data. Slicer dimensions appear in the WHERE clause of an MDX query and display every hierarchy in the cube that doesn't appear in the dimensional axes. The repetition of this information is useful in XMLA because you can use the information to show which other hierarchies are available in a given cube and write further queries against those hierarchies.
As I noted earlier, the last part of the OLAPInfo section of a multidimensional format, CellInfo, describes the properties the query will return for each cell in the result set. Because the query I use in this example doesn't specify any additional properties, the CellInfo section displays only the basic Value and FmtValue information:
— the AxesInfo goes here — >
The next section of the results in MDDataSet format is Axes, which contains the data the query returns organized in either TupleFormat, as Figure 4 shows, or Cluster-Format. Let's look at an example to see the differences between these two formats. Say you have three country categories (Canada, Mexico, and USA) and three product categories (Drink, Food, and Non-Consumable), which produce nine combinations of countries and products. Logically, you have several options for representing this set in a written notation. First, you can simply list the combinations:
{(Canada, Drink), (Canada, Food), (Canada, Non-Consumable),(Mexico, Drink), (Mexico, Food), (Mexico, Non-Consumable),(USA, Drink), (USA, Food), (USA, Non-Consumable)}
This is the kind of set representation that the TupleFormat uses. Each pair is a tuple, and each tuple contains a member from each dimension you included in the results. So if you had three dimensions in the query, the resulting tuple would have three members.
Alternatively, you can use a mathematical representation of the combinations of the two sets. Using the concept of a Cartesian product, you can represent the set of data as:
{Canada, Mexico, USA} x {Drink, Food, Non-Consumable}
The Cartesian product operator (x) between the two sets represents the set of all possible combinations of the two sets. The ClusterFormat uses this representation. And although this is a much more compact representation, it requires more interpretation to understand and navigate.
The last section in MDDataSet is CellData, which contains values for each cell the MDX query returns. An ordinal number in a zero-based array refers to the cells. (To learn how to calculate ordinal numbers, see the Web sidebar "Mapping the Tuple Ordinals" at InstantDoc ID 44007.) If a cell isn't present in the array, the default value from AxisInfo serves as the value for the cell. If no default value is specified, the value is null.
A Convenient Marriage
This article has introduced XMLA as a Web services layer that uses SOAP to tap into OLAP data. XMLA provides the basis for standards-based, Internet-ready analytic applications, which can be easily deployed and shared across and among enterprises. By using the XML for Analysis SDK, you can use XMLA today in SQL Server 2000 Analysis Services (or in other vendors' platforms), and XMLA will be a core part of the SQL Server 2005 Analysis Services platform. With its flexibility and broad support, XMLA is an excellent tool for current or future analytic application projects.
Related Reading |
---|
You can obtain the following articles from Windows 2000 Magazine's Web site at http://www.windowsitpro.com/articles.TERRY GIVENS"Two Web Services Solutions," .NET Developer Perspectives, November 28, 2000, InstantDoc ID 16202ROB HOWARD"Web Services, Part 1," .NET Developer Toolkit, May 15, 2001, InstantDoc ID 21082"Web Services, Part 2," .NET Developer Toolkit, June 12, 2001, InstantDoc ID 21412BRIAN MORAN"Use the SQL Server 2000 Web Services Toolkit to Get Started with .NET," SQL Server Perspectives, February 21, 2002, InstantDoc ID 24206RICH ROLLMAN"Web Services in Action," July 2002, InstantDoc ID 24910RUSS WHITNEY"XML for Analysis," April 2001, InstantDoc ID 19846 |
About the Author
You May Also Like