Yukon’s XML Data Type

XML becomes a first-class citizen

Rich Rollman

November 19, 2003

6 Min Read
ITPro Today logo in a gray background | ITPro Today


Microsoft introduced XML to its relational database system in SQL Server 2000. With SQL Server 2000, you can obtain XML query results from your database and extract data from an XML document for storage in your database. However, although OpenXML's mp:xmltext metaproperty lets you store XML fragments or entire XML documents in a text column, in SQL Server 2000, you have to write a lot of T-SQL or client-side code to query and update stored XML. But that's about to change.

Microsoft recently took the next step toward making XML an integral part of SQL Server in its release of Yukon Beta 1, a private beta that provides a first look at the new capabilities in the next version of SQL Server. One of the most exciting features in this release is the introduction of a native XML data type whose capabilities go well beyond those of SQL Server 2000's XML technologies. And unlike those technologies, the XML data type is integrated seamlessly into SQL Server's storage and programming models. So, you can query and update XML documents and even write joins between XML data and relational data in your database.

With the XML data type, you can create variables and columns that natively store XML data. Plus, you can specify an XML Schema Definition (XSD) schema that restricts the XML stored in the column or variable to the vocabulary described in the schema; Microsoft calls this typed XML. And you can query XML by using a new XML-centric query language called XQuery, which is a Microsoft implementation of the pending World Wide Web Consortium (W3C) XML Query standard. Microsoft has even extended XQuery with the ability to insert, delete, and update XML documents—functions that are scheduled for a later release of the W3C standard—so you can modify XML documents stored in your database. Let's see how each capability works.

Create XML Variables, Columns, and Parameters


The XML data type—xml—is a built-in data type just like varchar, int, and others. You use it the way you'd use any other SQL Server data type. Columns, variables, parameters, and values that functions return can all use the xml data type. For example, you can declare an XML variable named myDoc:

DECLARE @myDoc xml

You can declare a stored procedure that takes an XML document as a parameter:

CREATE PROCEDURE returnData    @xmldoc xml AS...GO

And you can create a table that has an XML column by using the following code:

CREATE TABLE myDocs (  docID int NOT NULL,  doc xml NOT NULL)GO

Although the XML data type is a built-in data type, it also functions like a user-defined data type (UDT) by providing several methods that let you query and update data stored in an XML variable or column. Table 1, page 41, describes the supported methods. You can use these methods to query, obtain scalar values from, and modify an XML document that's stored in a variable, column, or parameter. The query, value, and exist methods take XQuery expressions as parameters. (I describe XQuery in a moment.) The modify method takes XML DML—an XQuery extension for updating an XML document—as a parameter.

To illustrate how the query, value, and exist methods work (I'll cover the modify method in a later column), consider the XML document that Figure 1 shows. I extracted this document from the Demographics column of a row in the Individuals table of Yukon's new AdventureWorks sample database and reduced its size to fit the article. The T-SQL commands in Listing 1 show how you can assign XML to a variable and use the xml data type methods to query the XML variable. After initializing the @myDoc variable with the IndividualSurvey document, the query method retrieves the NoOfCarsOwned element for male individuals, the value method retrieves an integer representing the number of cars the individual owns, and the exist function returns a boolean value telling whether the NoOfCarsOwned element is present. Figure 2 shows the result of running Listing 1's T-SQL code.

Typed XML Variables and Columns


Programmers frequently use type safety in languages such as C# to reduce bugs by ensuring that you can't inadvertently mix different data types when assigning variables. You can get similar results with Yukon's xml data type. When you specify an XSD schema for a variable, column, or parameter, Yukon guarantees that the assigned or stored XML document adheres to the structure and content that the schema describes. Using typed XML columns has the added benefit of reducing the storage size of an XML document because you store native data types instead of the raw string values in the XML document.

To create a typed XML column, you first need to register the schema. Registering a schema in Yukon associates the schema with a namespace—a unique universal resource identifier (URI). To register a schema, you use the new CREATE XMLSCHEMA statement. For example, you could use

CREATE XMLSCHEMA schema_string

where schema_string is Figure 3's schema.

After registering the schema, you associate variables, columns, or parameters with the schema by specifying the URI for the namespace on the variable, column, or parameter declaration. For example, you can use the code in Listing 2 to specify types for the declarations from the previous section of this article.

Query XML


XML documents store data hierarchically, nesting related elements inside one another. For example, an order element likely contains nested line-item elements for each product in the order. But you can't easily use an SQL query against the hierarchical structure that you create by nesting elements because SQL is designed for navigating relations, not hierarchies. So to complement the XML data type, Yukon introduces XQuery, a hierarchical query language for XML. XQuery is similar to XPath, the hierarchical language you use in SQL Server 2000's OpenXML statement. XPath lets you navigate the hierarchy of XML documents much like you navigate the file system on your personal computer. Although powerful, XPath is limited in its ability to create new XML vocabularies as the result of an expression; it can only return existing parts of an XML document. XQuery extends XPath to let you create a new XML document that contains selected parts of another document and new parts you specified in the query.

Yukon uses XQuery to select parts of an XML document that's been stored in a T-SQL variable, XML column, or XML parameter. You saw an example of an XQuery in Listing 1. In the SELECT statements, the parameter to the query, value, and exist methods is a simple XQuery statement that selects the number of cars owned by male individuals. This XQuery doesn't create a new XML document. But the following XQuery, which uses the @myDoc variable from Listing 1, constructs MaleHobby elements from the Hobby elements in the original document:

SELECT @myDoc::query('  FOR $hobby IN    /IndividualSurvey    [Gender="M"]/HobbyRETURN  {data($hobby)}')

The FOR statement iterates through each Hobby element you selected with /IndividualSurvey[Gender="M"]/Hobby and uses the data from the Hobby element to build new MaleHobby elements. Stated simply, it changes Hobby elements to MaleHobby elements.

These XQuery examples only scratch the surface of XQuery's capabilities. I'll cover XQuery in-depth in a future article.

XML Made Easier


Yukon's new xml data type provides a convenient way to store, query, and retrieve XML documents. Besides reducing the amount of code required to perform these tasks, the xml data type enables a rich interaction between XML and relational data. In future columns, I'll introduce you to more of Yukon's XML features and share tips about how to build more functional XML applications with less code.

Sign up for the ITPro Today newsletter
Stay on top of the IT universe with commentary, news analysis, how-to's, and tips delivered to your inbox daily.

You May Also Like