Get There with XPathNavigator
Exploiting the .NET XPath Query Engine to Navigate Hierarchical Data
October 30, 2009
In "XPath Basics" I gave a quick introduction to the syntax of XPath expressions to help the uninitiated get comfortable with XPath, which is a very important technology to understand for working with XML data of many forms.
The thing about XPath is that it can't do anything on its own; it needs a processing engine to perform work based on the expressions. That processing engine could come in many forms. In .NET 1.x, XPath comes into play both for querying and navigating XML data in documents, and also for transforming XML documents using XSLT. In this article I'm going to give a quick introduction to working with the XPath processing engine that you use to query and navigate XML data in .NET - specifically the XPathNavigator class and how to use it.
Getting to the Root of Things
One reader correctly pointed out that in "XPath Basics" I did not cover an important concept of XPath related to absolute and relative paths. That was partially intentional, so now let me set things straight on that account. In "XPath Basics" I emphasized that the evaluation of an XPath statement is always relative to the current context node. This is true whether you are talking about an individual location step within an XPath statement, or about an entire expression. The context node can be set by a previous location step, or it can be set based on the context of the processing engine that's evaluating the expression.
So if XPath expressions are always relative to the current context node, how can you have an absolute path? The answer is that you can still think of an absolute path in XPath as being relative to the current context node. The way to specify an absolute path in XPath is to use the "/" character at the beginning of the expression. This basically says "start at the root of the document". So from that perspective, an expression like /Music/Album is an absolute path that is evaluated starting at the root of the document, looking for a root element named Music, containing a child element named Album. The way you can view this as still being relative to the current context node is that you can legally evaluate this expression against a reference to a node anywhere within the document: the leading "/" resolves to the root of the document that contains the current node, so the query is still executed relative to that node.
The reason I waited to mention this is that in order to make a statement like the previous sentence, you're really starting to talk about the use of XPath with a particular processing engine. Because I was going to wait until this article to talk about the XPath processing engine in .NET, I thought I'd wait to clarify the relative vs. absolute path issue. That being done, let's get on with some processing!
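To make the absolute vs. relative distinction concrete, here is a minimal sketch using the XPathNavigator class that the next section introduces. The inline XML, the class name, and the AlbumNavigator helper are hypothetical stand-ins for illustration; they are not the article's Music.xml or download code.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class AbsoluteVsRelative
{
    // Hypothetical sample data shaped like the Music schema described in this article.
    const string Xml =
        "<Music><Artist name='Artist A'><Album name='Album 1'>" +
        "<Track>Track One</Track><Track>Track Two</Track>" +
        "</Album></Artist></Music>";

    // Returns a navigator whose context node is the first Album element.
    public static XPathNavigator AlbumNavigator()
    {
        XPathNavigator root = new XPathDocument(new StringReader(Xml)).CreateNavigator();
        XPathNodeIterator albums = root.Select("/Music/Artist/Album");
        albums.MoveNext();
        // Clone so the returned cursor is independent of the iterator.
        return albums.Current.Clone();
    }

    public static void Main()
    {
        XPathNavigator album = AlbumNavigator();

        // Relative path: child Track elements of the Album context node.
        Console.WriteLine(album.Select("Track").Count);          // 2

        // Absolute path: the leading "/" starts at the document root,
        // even though the context node is an Album deep in the tree.
        Console.WriteLine(album.Select("/Music/Artist").Count);  // 1

        // The same relative path from the root finds nothing, because
        // the root node's only child element is Music.
        XPathNavigator root = album.Clone();
        root.MoveToRoot();
        Console.WriteLine(root.Select("Track").Count);           // 0
    }
}
```

The same expression string gives different results depending on where the navigator is positioned, unless it begins with "/".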
XPathNavigator Knows the Way
The primary object for querying and navigating XML in .NET is XPathNavigator. If you've been using the Document Object Model (DOM) for dealing with XML for a long time, you may feel more comfortable dealing with the XmlDocument class and using the SelectNodes method to perform queries. The truth is that, under the covers, SelectNodes is using XPathNavigator for you. And if you start using XPathNavigator directly, you can adopt a consistent programming approach that will work with XmlDocument, XmlDataDocument, or XPathDocument objects. This will become even more important in .NET 2.0 when XPathDocument gets a serious overhaul to its implementation, allowing it to track changes made to the document in a similar way that DataSets do today.
The XPathNavigator class basically encapsulates a cursor into an XML node set, and allows you to navigate or perform queries relative to that node. The class exposes a set of methods to move to sibling, parent, or child nodes, as well as a set of methods focused on executing a query using an XPath expression. Using an XPathNavigator object you can pre-compile an expression and use that compiled version to perform repeated queries with the same expression much more efficiently.
You can get an XPathNavigator from any of the .NET XML document types by calling the CreateNavigator method. What you get is an instance of an XPathNavigator with its underlying cursor initialized to the root of the document. From there you can perform queries to obtain sets of other XPathNavigator objects that point to the results of the query, or you can move the current cursor through the document using the navigation methods of the class. You can also access a number of properties on a navigator to extract the data contained in the node to which it is currently pointing so you can perform processing on that data.
To perform a query with an XPathNavigator instance, you can call its Select method, passing in an XPath expression. What you get back is an XPathNodeIterator that allows you to step through the results. This is another lightweight object that allows you to obtain an XPathNavigator reference to each of the nodes that matched the query. Using these references, you can then either extract data from the nodes, or you can use the navigator to perform subsequent queries or navigation that will be done relative to the matching nodes.
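As a rough illustration of the "consistent programming approach" mentioned above, the sketch below runs the same query code against both an XmlDocument and an XPathDocument through the IXPathNavigable interface that both implement. The class name, the CountMatches helper, and the inline XML are assumptions made for this example.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class NavigatorSources
{
    // Hypothetical sample data; not the article's Music.xml.
    const string Xml =
        "<Music><Artist name='Artist A'><Album name='Album 1'>" +
        "<Track>Track One</Track><Track>Track Two</Track>" +
        "</Album></Artist></Music>";

    // Works for any document type that can hand out a navigator.
    public static int CountMatches(IXPathNavigable source, string xpath)
    {
        XPathNavigator nav = source.CreateNavigator();
        XPathNodeIterator iter = nav.Select(xpath);
        int count = 0;
        while (iter.MoveNext())
        {
            // Current is a navigator positioned on the matching node.
            Console.WriteLine(iter.Current.Name + ": " + iter.Current.Value);
            count++;
        }
        return count;
    }

    public static void Main()
    {
        // Editable DOM document.
        XmlDocument dom = new XmlDocument();
        dom.LoadXml(Xml);

        // Read-only document optimized for XPath.
        XPathDocument readOnly = new XPathDocument(new StringReader(Xml));

        CountMatches(dom, "//Track");
        CountMatches(readOnly, "//Track");
    }
}
```

Because the helper only depends on CreateNavigator and Select, switching document types does not change the query code.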
Query for Music
Let's look at an example. First we need some XML to work against. Say you have some XML that contains information about music. If you had a schema as shown in Figure 1, you would have a Music root element, Artist elements under that, Album elements under Artist, and Track elements under Album. Each of those elements has certain attributes, as shown in Figure 1, that you might be interested in extracting for processing. The resulting XML looks like Figure 2.
Figure 1: The Music XML data schema.
[XML listing not recoverable from the source; only the Track text survives: Going Under, Bring Me To Life, Everybody's Fool]
Figure 2: A Music XML file.
Given that schema, let's say we first wanted to use an XPathNavigator to query for all the Album elements within a document. The code for doing so would look like that shown in Figure 3.
public void ProcessAlbums()
{
    // Load a document.
    XPathDocument doc = new XPathDocument("Music.xml");
    // Get a navigator initialized to the root.
    XPathNavigator nav = doc.CreateNavigator();
    // Perform a query.
    XPathNodeIterator iter = nav.Select("//Album");
    // Iterate through the results.
    while (iter.MoveNext())
    {
        XPathNavigator navCurrent = iter.Current;
        ProcessAlbum(navCurrent);
    }
}
Figure 3: Querying the XML document for Album nodes.
In the code in Figure 3, I first load the XML into an instance of XPathDocument. The XPathDocument class is the best to use in .NET if you don't need to modify the contents of the document while processing it. I obtain an XPathNavigator from the document by calling CreateNavigator. Using that navigator, I execute a simple XPath query for all descendant elements named Album (using the XPath abbreviation //, which is shorthand for /descendant-or-self::node()/). That query returns an XPathNodeIterator that can be used to iterate through the results.
To use the iterator, you call MoveNext, which returns true if there were any more nodes to process in the iterator. If so, then the Current property on the iterator will return a reference to an XPathNavigator positioned on the current node represented by the iterator. I take that navigator reference and pass it off to another method to process the results (which you can see in Figure 4).
public void ProcessAlbum(XPathNavigator navAlbum)
{
    // Clone navigator to move off axis.
    XPathNavigator navArtist = navAlbum.Clone();
    // Move to the parent (Artist) node.
    navArtist.MoveToParent();
    // Move to its name attribute.
    navArtist.MoveToFirstAttribute();
    // Output the artist name.
    Console.WriteLine(navArtist.Value);
    // Move to the album name attribute.
    navAlbum.MoveToFirstAttribute();
    Console.WriteLine("\t" + navAlbum.Value);
    // Move back up to the parent element.
    navAlbum.MoveToParent();
    // Move down to first track element and output its text.
    navAlbum.MoveToFirstChild();
    Console.WriteLine("\t\t" + navAlbum.Value);
    // Loop through the rest of the track elements.
    while (navAlbum.MoveToNext())
    {
        Console.WriteLine("\t\t" + navAlbum.Value);
    }
}
Figure 4: Navigating results with the XPathNavigator.
In the ProcessAlbum method, I switch from using a navigator as a query tool to using it to navigate a known schema of nodes. The code embeds the knowledge of the schema in the form of some explicit navigation steps from node to node using the navigator that was passed into the method representing an Album.
The first thing the code in Figure 4 does is to clone the navigator. If you are going to move "off axis" to move up to a parent or down into a collection of child nodes, and you want to resume processing where you started, you'll need to clone the navigator before you start calling navigation methods. Remember that the navigator maintains a single reference (or cursor) into the nodes saying what the current context node is as far as it's concerned.
As soon as you call a MoveXXX method, that cursor has changed, and you'll have no easy way to get the context back to where you started - short of reversing all the navigation steps you have taken. So if you clone a navigator, you can hold onto either the original or the cloned navigator and use the other to move away from the current node. When you're done with that processing path, you can simply resume using the navigator you held onto, which is still where it was when you cloned it, and throw away the other one.
Once the code in Figure 4 has a cloned copy of the Album node navigator, it uses the cloned copy to move up to the parent node, which, based on the schema, should be an Artist node with a name attribute. So it uses a couple of MoveXXX methods to move to that attribute, and then simply spits out to the console the name of the Artist for the album.
After that, it resumes using the original Album navigator and moves down to its first attribute, which should be the Album name. After spitting that out to the console, the code backs the navigator up to the parent, which is the original Album element when you have moved to an attribute. That's one thing to get used to when moving to attributes. They are not treated as child nodes of an element, but the element itself is treated as a parent to the attribute node. Once the cursor is back on the Album element, the code moves it down to the first child element, which should be a Track element based on the schema.
From there it extracts the Value property of the current node, which is simply the contained text node when the element contains text like the Track element. After processing the first child, it processes the remaining Tracks by calling MoveToNext on the navigator, which will keep moving the cursor to the next sibling node until there are no more, at which point it will return false and exit the loop.
The code is very fast when you use the MoveXXX methods to step through the nodes in the schema. I could have used the Select method repeatedly to get to each node of interest, issuing a different XPath expression each time to ensure I got back the desired results. Performing a query, however, is much less efficient than simply bumping the node reference using a Move method.
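The attribute behavior described above is easy to check in isolation. The following is a minimal sketch, assuming a small inline Album fragment (hypothetical data) and a made-up Walk helper, that records where the cursor is after each move. It also shows GetAttribute, which reads an attribute by name without moving the cursor.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class AttributeNavigation
{
    // Hypothetical fragment; the attribute name "name" is an assumption.
    const string Xml = "<Album name='Album 1'><Track>Track One</Track></Album>";

    static string Describe(XPathNavigator nav)
    {
        return nav.NodeType + ":" + nav.Name;
    }

    // Returns the cursor position after each navigation step.
    public static string[] Walk()
    {
        XPathNavigator nav = new XPathDocument(new StringReader(Xml)).CreateNavigator();
        string[] steps = new string[4];

        nav.MoveToFirstChild();          // root node -> Album element
        steps[0] = Describe(nav);        // Element:Album

        nav.MoveToFirstAttribute();      // Album -> its first attribute
        steps[1] = Describe(nav);        // Attribute:name

        nav.MoveToParent();              // attribute -> owning element
        steps[2] = Describe(nav);        // Element:Album

        nav.MoveToFirstChild();          // skips attributes, lands on Track
        steps[3] = Describe(nav);        // Element:Track
        return steps;
    }

    public static void Main()
    {
        foreach (string step in Walk())
        {
            Console.WriteLine(step);
        }

        // Reading an attribute by name leaves the cursor on the element.
        XPathNavigator album = new XPathDocument(new StringReader(Xml)).CreateNavigator();
        album.MoveToFirstChild();
        Console.WriteLine(album.GetAttribute("name", ""));  // Album 1
        Console.WriteLine(album.Name);                      // Album
    }
}
```

When only an attribute value is needed, GetAttribute avoids the move-and-return-to-parent sequence entirely.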
Pre-compile for Speed
There are many other things you can do with XPathNavigator to process the contents of an XML document. The first to be aware of is that if you're going to perform the same query a number of times, perhaps on a collection of documents, then the query will execute significantly faster if you pre-compile the expression.
You do this by calling the Compile method on the navigator, passing in an XPath expression as a string and getting back an instance of an XPathExpression object. You can pass that XPathExpression object to the Select method, and the execution of the Select method will be much quicker than if you passed in the XPath as a string every time. Figure 5 shows a variation on the ProcessAlbums method that uses this approach.
public void ProcessAlbumsCompiled()
{
    // Load a document.
    XPathDocument doc = new XPathDocument("MusicBase.xml");
    // Get a navigator initialized to the root.
    XPathNavigator nav = doc.CreateNavigator();
    // Compile the query first.
    XPathExpression exp = nav.Compile("//Album");
    // Perform a query using the compiled expression.
    XPathNodeIterator iter = nav.Select(exp);
    // Iterate through the results.
    while (iter.MoveNext())
    {
        XPathNavigator navCurrent = iter.Current;
        ProcessAlbum(navCurrent);
    }
}
Figure 5: Executing a compiled expression.
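Figure 5 compiles and runs the expression only once, so the benefit is not visible there. The sketch below, which assumes a couple of hypothetical inline documents and a made-up CountAlbumsInEach helper, shows the intended usage pattern: compile once, then reuse the same XPathExpression for each document in a collection. It makes no claim about how large the speedup is.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class CompiledReuse
{
    // Compiles "//Album" once and reuses it against every document.
    public static int[] CountAlbumsInEach(string[] documents)
    {
        int[] counts = new int[documents.Length];
        XPathExpression albums = null;

        for (int i = 0; i < documents.Length; i++)
        {
            XPathNavigator nav =
                new XPathDocument(new StringReader(documents[i])).CreateNavigator();

            // Pay the parsing/compilation cost on the first document only.
            if (albums == null)
            {
                albums = nav.Compile("//Album");
            }

            counts[i] = nav.Select(albums).Count;
        }
        return counts;
    }

    public static void Main()
    {
        // Hypothetical sample documents.
        string[] documents =
        {
            "<Music><Artist name='A'><Album name='1'/><Album name='2'/></Artist></Music>",
            "<Music><Artist name='B'><Album name='3'/></Artist></Music>"
        };

        foreach (int count in CountAlbumsInEach(documents))
        {
            Console.WriteLine(count);
        }
    }
}
```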
The last thing to mention about XPathNavigator is that if you're evaluating an XPathExpression that will result in a value instead of a set of nodes, you can use the Evaluate method instead of Select. Evaluate will return a value corresponding to the value that results from the evaluation of the XPath expression. Remember from last time that I said that XPath expressions can result in a numeric, string, or Boolean value. The Evaluate method simply returns an object reference, so you'll have to cast the result to the appropriate type. For numeric values, the return result comes into the .NET code as a double, so you'll have to cast appropriately there (see Figure 6).
int GetAlbumCount()
{
    // Load a document.
    XPathDocument doc = new XPathDocument("MusicBase.xml");
    // Get a navigator initialized to the root.
    XPathNavigator nav = doc.CreateNavigator();
    // Compute the count of Album elements.
    double d = (double)nav.Evaluate("count(//Album)");
    return (int)d;
}
Figure 6: Returning a value from an XPath expression with Evaluate.
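Figure 6 covers the numeric case. For completeness, here is a small sketch of the string and Boolean cases as well, using hypothetical inline XML and made-up helper names. Each helper casts the object returned by Evaluate to the .NET type that matches the XPath result type.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class EvaluateResults
{
    // Hypothetical sample data; not the article's MusicBase.xml.
    const string Xml =
        "<Music><Artist name='Artist A'><Album name='Album 1'>" +
        "<Track>Track One</Track><Track>Track Two</Track>" +
        "</Album></Artist></Music>";

    static XPathNavigator Navigator()
    {
        return new XPathDocument(new StringReader(Xml)).CreateNavigator();
    }

    // XPath number -> System.Double.
    public static int TrackCount()
    {
        return (int)(double)Navigator().Evaluate("count(//Track)");
    }

    // XPath string -> System.String.
    public static string FirstAlbumName()
    {
        return (string)Navigator().Evaluate("string(//Album/@name)");
    }

    // XPath boolean -> System.Boolean.
    public static bool HasMultipleTracks()
    {
        return (bool)Navigator().Evaluate("count(//Track) > 1");
    }

    public static void Main()
    {
        Console.WriteLine(TrackCount());         // 2
        Console.WriteLine(FirstAlbumName());     // Album 1
        Console.WriteLine(HasMultipleTracks());  // True
    }
}
```

Casting to the wrong type throws an InvalidCastException at run time, so the cast has to agree with what the expression actually returns.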
That's a quick tour of using the XPath processing engine with the XPathNavigator class to query and navigate a document. This should be your preferred mode of dealing with XML (over using SelectNodes in the XmlNode class) because it's portable across all the XML document types in .NET and will be the way of the future when XPathDocument in .NET 2.0 introduces change tracking. I'll write more on that topic when we get a little closer to the .NET 2.0 release.
The files referenced in this article are available for download.
Brian Noyes is a software architect with IDesign, Inc. (http://www.idesign.net), a .NET-focused architecture and design consulting firm. Brian is a Microsoft MVP in ASP.NET who specializes in designing and building data-driven distributed Windows and Web applications. Brian writes for a variety of publications and is working on a book for Addison-Wesley on building Windows Forms Data Applications with .NET 2.0. Contact him at mailto:[email protected].