Querying Open Content

How to retrieve arbitrary XML fragments from SQL Server

Rich Rollman

April 23, 2003

10 Min Read
ITPro Today logo in a gray background | ITPro Today


Open content is the set of elements and attributes in an XML document that aren't specifically defined in the document schema. In simpler terms, you can think of open content as the elements and attributes in the XML document that your program doesn't specifically access. In April ("'Open' XML Content," InstantDoc ID 37840), I explained how you can build extensibility into your XML-enabled application by using OpenXML, XML Bulk Load, and updategrams to store the open content in SQL Server 2000. I also showed a concrete example of how to use OpenXML to store open content. The scenario required you to store an XML-formatted price list, including open content, in your SQL Server database. Let's continue with this scenario by examining how to extract the open content from your SQL Server database and send it to your fulfillment provider by including the open content with each line item of an order your customer placed. If you don't have the example database from last month, you can download the code to create it by entering InstantDoc ID 37840 at http://www.sqlmag.com.

You can choose from several different techniques to extract open content from your database. Using a simple approach, you can retrieve the data through a normal SQL query (as opposed to an SQL query that returns XML), then process the result into XML (which Figure 1 shows) by using the programming API of your choice. You might alternatively use a query that returns XML directly, in which case you have two options. You can use FOR XML EXPLICIT to write an SQL query that includes the open content in the query and returns XML. Or, you can use an XML View that includes the open content in a virtual XML document that you query by using XML Path Language (XPath). In both of these alternatives, SQL Server merges the open content stored in your database with the XML result you specify in other parts of the query. The method you choose depends on your programming environment. You can achieve the simple approach in many ways, ranging from simple text manipulation to processing the XML by using an object model. The other two techniques use SQL Server's XML support, so let's look at them in more detail.

FOR XML EXPLICIT Query


The FOR XML clause lets you obtain XML results directly from SQL Server 2000. FOR XML supports three modes of operation: Raw, Auto, and Explicit. Each option alters the format of the XML your query returns. Explicit mode is the only mode that supports open content. Explicit mode gives you complete control over the shape—sometimes called the XML grammar or schema—of the XML query result. Using an Explicit-mode query, you can specify the hierarchical structure of the XML document as well as the names for elements and attributes within the document.

You specify the names by using special structured column aliases in your SQL query. A structured column alias contains four logical fields, with an exclamation point separating each field. The first field specifies the name of the parent element, the second field specifies how the element is nested in the resulting XML document, and the third field, if not empty, specifies the name of the element or attribute that contains the column data. The fourth field lets you specify a formatting directive. A formatting directive causes column data to be formatted in a special way. For example, the cdata directive includes the column data within a CDATA section. (A CDATA section is a special way to avoid substituting < and > for < and > characters in the column data. For more details, see the XML Language specification at http://www.w3.org/TR/REC-xml#sec-cdata-sect.) Another formatting directive, xmltext, specifies that the contents of the column should be merged with the XML result from the query. Let's look at an example of how the xmltext directive works.

In the example scenario, once you've approved a customer's order for processing, you need to send the order to your fulfillment provider, who will then ship the items to your customer. In a real-world scenario, the order would contain a lot of data to be exchanged between you and your fulfillment provider. To keep things simple, let's just look at how you can include open content for the line items in the order. You stored open content for individual products when you processed the price list from your supplier. Now you need to include that open content with the order you send to your fulfillment provider.

For brevity, I won't show the code to join the example Orders table with the Products table; this is a straightforward relational query. Instead, let's focus on how to extract line items from the Products table. Listing 1 shows the FOR XML EXPLICIT query that extracts the product that has ProductID 22. Notice that the last column in the SELECT list uses a column alias that specifies the xmltext directive. This column retrieves the open content. Now look at the value in the Overflow column in Figure 1. The Product element has a weight attribute and a QuantityPerUnit subelement. Because of the xmltext directive in your query, SQL Server adds the weight attribute and the QuantityPerUnit element to the query result that Figure 2 shows. The xmltext directive makes including open content in a query result easy.

To execute the code in Listing 1, open Query Analyzer and select the April2003 example database. Then, copy the code into Query Analyzer and execute the code by pressing F5. You should now see the result that Figure 2 shows (press Ctrl-T before executing the query to retrieve results as text). Notice that SQL Server included the weight attribute and the QuantityPerUnit element in the LineItem element the query constructed. To see the effects of the xmltext directive on the output, remove the Overflow column from the SELECT list and rerun the query or simply remove the directive. Also try the xml directive, which includes the complete text of the Overflow column directly within the LineItem element.

XML View


Now that you've seen how FOR XML EXPLICIT queries include open content in their results, let's look at another way to query open content: using XML Views. An XML View exposes data in your SQL Server database as a virtual XML document. This virtual document's structure is defined by an XML Schema Definition (XSD) schema that includes annotations to map the elements and attributes you defined in the schema to the rows and columns of tables in your database. The schema and the annotations are collectively called a mapping schema. You can use an XPath query to query the XML View that the mapping schema defines. For details about XML Views, see my December 2002 through December 2003 XML Explorer columns or the excellent documentation included with the SQLXML 3.0 Web Release (http://msdn.microsoft.com/library/default.asp?url=/downloads/list/sqlserver.asp). Let's focus on including open content in the virtual XML document exposed through an XML View.

XML Views support a special annotation, overflow-field, that functions much like the xmltext directive we used in the FOR XML EXPLICIT query. To include open content within an element in your XML View, include the overflow-field annotation on the definition of the element in your mapping schema. You set the value of the overflow-field annotation to the name of the column in your database that contains the open content. When you execute an XPath query against the XML View that includes the element with the overflow-field annotation, SQL Server merges the open content with the element's other attributes and subelements. Let's look at a quick example using the same data as in the FOR XML EXPLICIT section.

The mapping schema in Listing 2 defines an XML View for the same document that Listing 1's FOR XML EXPLICIT query returns. The code uses the relation annotation to map the LineItem element to the Products table (you namespace-qualify all annotations by using the SQL prefix). Each row in the Products table produces a LineItem element in the XML View. The code also includes the overflow-field annotation on the definition of the LineItem element to signal that SQL Server should include the open content from the Overflow column when constructing the LineItem element. As you can see, including the open content is fairly simple.

Now, let's run the example code. Start by saving the mapping schema from Listing 2 to a file on your computer named products_map.xml. Also save the VBScript code from Listing 3 to a file called opencontent.vbs in the same directory where you saved the mapping schema. You might need to edit the script to adjust the connection settings for your environment. Now, open a command prompt in that directory and execute opencontent.vbs. In a pop-up window, the script code will display the XML it retrieved from your database. You might need to install Windows Script to execute opencontent.vbs. You can download Windows Script from http://msdn.microsoft.com/scripting.

Limitations


You need to be conscious of two limitations when including open content in XML query results: element order and naming conflicts. When SQL Server merges the attributes and elements in the open content with the elements and attributes created by the query, it doesn't preserve the relative ordering of the elements or attributes. All elements from the open content precede query-generated elements regardless of the columns' order in the query.

SQL Server appends attributes from the open content to the query-generated attribute list. But because attributes are, by definition, unordered, their relative order is insignificant. You can see the results of these limitations in Figure 2. Notice that the QuantityPerUnit element from the open content precedes the ProductName and Price elements, which the query generated directly, and the weight attribute appears after the id attribute. If the order of the elements in your XML query result holds some significance—as it might if the schema you're following dictates a specific order—you should choose an alternative storage format for storing the data that preserves order. For example, you could directly store the XML representation in a text column.

The second limitation involves the element and attribute names in the open content. You might encounter naming conflicts between the open content and the attributes and elements that the query generates directly. When a conflict occurs, SQL Server uses a simple set of rules to generate the query result. When an attribute name is in conflict, SQL Server discards the conflicting attribute from the open content. Only the attribute the query generated directly will appear in the result.

SQL Server treats elements differently. It doesn't detect conflicting element names; instead, it simply adds elements from the open content to the query-generated elements. And because of the ordering behavior I described earlier, duplicate elements from the open content will precede the query-generated elements. This can cause parsing-validation errors resulting from the duplicate elements or unexpected runtime errors from code that has dependencies on the element order. The best way to avoid such errors is to ensure that you don't store duplicate data in columns of your database and in open content and that the query doesn't explicitly include duplicate data. These limitations apply to both FOR XML EXPLICIT queries and XML Views because XML Views retrieve data from SQL Server by executing a FOR XML EXPLICIT query.

Extensibility Through Open Content


By using the techniques I described in this and my previous column, you can store open content in your database and include that open content in XML query results. With these techniques and technologies, you can build resilient, extensible applications today that adapt well to the presence of open content in the data the application receives—and sends—tomorrow.

Sign up for the ITPro Today newsletter
Stay on top of the IT universe with commentary, news analysis, how-to's, and tips delivered to your inbox daily.

You May Also Like