Traveling the XPath

Finding and Filtering XML Data with XPath

Dan Wahlin

October 30, 2009


XtremeData

LANGUAGES: C#

ASP.NET VERSIONS: ALL

 


 

XML (eXtensible Markup Language) has grown from a limited-use data storage format to one that's increasingly used in a variety of applications on a variety of development platforms. XML provides a great deal of flexibility and can be used in many different ways, such as data exchange, Web services, configuration, content management, and Web integration. Whether XML is used to tie distributed systems together or to generate graphics based on Scalable Vector Graphics (SVG) technology, the data must often be queried, filtered, or sorted.

 

In Find and Filter Relational Data, I presented different ways that relational data can be searched, filtered, and sorted using ADO.NET classes in the .NET Framework. This article focuses on performing the same types of operations on XML data using XML-specific .NET classes. Performing these types of operations typically involves a language called XPath, so the next section provides a quick introduction to XPath language fundamentals.

 

XPath Fundamentals

XPath is a language that can be used to search XML document hierarchies. Several .NET classes provide support for XPath, including XmlDocument, XPathExpression, and XPathNavigator (to name a few). To use the XPath language with one of these classes, you must create one or more XPath statements. Fortunately, XPath statements look somewhat similar to DOS path statements, and are fairly easy to learn with a little study and practice.

 

An XPath statement consists of one or more location steps that identify how to locate a node or set of nodes in an XML document. Steps are separated by a forward slash character, "/", and each step can have three main parts, referred to as the axis, node-test, and predicate:

 

axis::node-test[predicate]

 

The axis determines the direction of the search in the XML document. For example, will the search look through all the children (the child axis) of a given node, look for an ancestor (the ancestor axis), look for a preceding sibling (the preceding-sibling axis), or look along another axis such as the attribute or namespace axes? The node-test identifies the name of the node to look for on a given axis. If a node-test succeeds because a node is found, the next step in the XPath statement (if another step exists) is executed. Finally, the predicate is surrounded by brackets, "[" and "]", and allows filter expressions (similar to SQL WHERE clauses) to be included to filter out undesirable nodes. The predicate is optional.

 

The following XPath statement shows how to search along the child and attribute axes of the XML document shown in Figure 1:

 

/child::Customers/child::Customer[attribute::id='ALFKI']

 

It also shows how to filter out unwanted Customer nodes using a predicate.

 

  

    Alfreds Futterkiste

    Maria Anders

  

  

    Du monde entier

    Janine Labrune

  

Figure 1: An XML document containing customer data.

 

This statement may be intimidating if you're new to XPath. Fortunately, because the child axis is the default axis and the attribute axis can be abbreviated using the @ character, the previous XPath statement can be simplified to the following:

 

/Customers/Customer[@id='ALFKI']

 

This statement contains two location steps. It starts from the root of the XML document and searches the child axis for a node named Customers. It then searches the child axis of the Customers node and selects a node named Customer that has an id attribute with a value of ALFKI.
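To see that the long and abbreviated forms are equivalent, here is a minimal sketch that runs both against an in-memory copy of the customer data. The XML string is reconstructed from the description in this article; the CompanyName element name and the AxisDemo and ContactFor names are assumptions made for illustration.

```csharp
using System;
using System.Xml;

public static class AxisDemo
{
    // Sample data modeled on Figure 1. The CompanyName element name is an
    // assumption; the article's text only names Customers, Customer, the id
    // attribute, and ContactName.
    public const string Xml =
        "<Customers>" +
        "<Customer id='ALFKI'><CompanyName>Alfreds Futterkiste</CompanyName>" +
        "<ContactName>Maria Anders</ContactName></Customer>" +
        "<Customer id='DUMON'><CompanyName>Du monde entier</CompanyName>" +
        "<ContactName>Janine Labrune</ContactName></Customer>" +
        "</Customers>";

    // Selects a Customer node with the given XPath statement and returns
    // the text of its ContactName child, or null if nothing matched.
    public static string ContactFor(string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        XmlNode customer = doc.SelectSingleNode(xpath);
        if (customer == null) return null;
        XmlNode contact = customer.SelectSingleNode("ContactName");
        return contact == null ? null : contact.InnerText;
    }

    public static void Main()
    {
        // Unabbreviated form: explicit child and attribute axes.
        Console.WriteLine(ContactFor(
            "/child::Customers/child::Customer[attribute::id='ALFKI']"));
        // Abbreviated form: child axis implied, @ for the attribute axis.
        Console.WriteLine(ContactFor("/Customers/Customer[@id='ALFKI']"));
    }
}
```

Both calls select the same Customer node, so both lines of output show the same contact name.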

 

The following XPath statement would return all ContactName nodes found in the XML document shown in Figure 1. Notice that no predicate is included in the statement, which results in two nodes being returned:

 

/Customers/Customer/ContactName

 

To grab the ContactName for the Customer node with an id attribute equal to DUMON, the following XPath statement can be used:

 

/Customers/Customer[@id='DUMON']/ContactName

 

This statement moves to the child axis and finds the Customers node. It then moves to the child Customer node where the id attribute equals DUMON. If the Customer node exists, it moves to the child node named ContactName.

 

Although there is much more to the XPath language than can be covered in this abbreviated introduction, you've now seen the different parts of an XPath statement. The following sections demonstrate how to execute XPath statements using .NET classes.
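The examples so far use only the child and attribute axes. As a rough sketch of some of the other axes mentioned earlier (preceding-sibling, ancestor, and parent), the following code evaluates three statements against the same customer data. The CompanyName element name and the OtherAxesDemo and Evaluate names are assumptions made for illustration.

```csharp
using System;
using System.Xml;

public static class OtherAxesDemo
{
    // Sample data modeled on Figure 1 (CompanyName is an assumed name).
    public const string Xml =
        "<Customers>" +
        "<Customer id='ALFKI'><CompanyName>Alfreds Futterkiste</CompanyName>" +
        "<ContactName>Maria Anders</ContactName></Customer>" +
        "<Customer id='DUMON'><CompanyName>Du monde entier</CompanyName>" +
        "<ContactName>Janine Labrune</ContactName></Customer>" +
        "</Customers>";

    // Returns the text of the first node the statement selects, or null.
    public static string Evaluate(string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        XmlNode node = doc.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText;
    }

    public static void Main()
    {
        // preceding-sibling: the Customer that comes before DUMON.
        Console.WriteLine(Evaluate(
            "/Customers/Customer[@id='DUMON']" +
            "/preceding-sibling::Customer/ContactName"));

        // ancestor: walk up from a ContactName to its Customer's id.
        Console.WriteLine(Evaluate(
            "//ContactName[.='Janine Labrune']/ancestor::Customer/@id"));

        // parent plus the unabbreviated attribute axis.
        Console.WriteLine(Evaluate(
            "//ContactName[.='Maria Anders']/parent::Customer/attribute::id"));
    }
}
```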

 

Finding and Filtering XML Data with XPath

There are several classes that can be used to locate data in an XML document using the XPath language. The two you'll use most frequently are XmlDocument and XPathNavigator. The XmlDocument class (located in the System.Xml namespace) can be used to read and edit data. XmlDocument works by loading an XML document into a memory-based structure referred to as the Document Object Model (DOM). Another class, named XPathNavigator (located in the System.Xml.XPath namespace), can also be used to execute XPath queries, although it cannot be used to edit data in version 1.1 of the .NET platform. Although XPathNavigator also works with an in-memory structure, the structure is optimized for executing XPath statements.

 

XmlDocument contains two XPath-aware methods named SelectSingleNode and SelectNodes. Figure 2 shows an example of using these methods to query the document shown in Figure 1.

 

XmlDocument doc = new XmlDocument();
doc.Load(Server.MapPath("../Xml/Customers.xml"));

// Locate specific ContactName node using XPath predicate.
XmlNode node = doc.SelectSingleNode("Customers/" +
                 "Customer[@id='ALFKI']/ContactName");
if (node != null) {
  this.txtOutput.Text = "Found Customer ALFKI: " +
                          node.InnerText;
}

// Locate all ContactName nodes.
XmlNodeList nodes = doc.SelectNodes(
  "Customers/" + "Customer/ContactName");

// The loop variable needs its own name because "node" is
// already declared in the enclosing scope.
foreach (XmlNode contactNode in nodes) {
  this.txtOutput.Text += "Found Customer " +
    contactNode.ParentNode.Attributes["id"].Value +
    ": " + contactNode.InnerText + "\r\n";
}

Figure 2: The SelectNodes and SelectSingleNode methods can be used to execute XPath queries against a DOM structure using the XmlDocument class.

 

This example first loads the XML data into the DOM by calling XmlDocument's Load method. It then uses the SelectSingleNode method along with an XPath expression to return a specific ContactName node. Once the ContactName is found, its child text node can be accessed (or edited) using the InnerText property.

 

The second part of the code uses the SelectNodes method to select all ContactName nodes within the XmlDocument. SelectNodes returns a collection of XmlNode objects (referred to as an XmlNodeList) that can easily be iterated through using a standard foreach loop.

 

In cases where XML data needs to be filtered with XPath, but not edited, the XPathNavigator class (located in the System.Xml.XPath namespace) is more efficient than the XmlDocument class. It provides an efficient memory store that is optimized for XPath statements. XPathNavigator is an abstract class that cannot be created directly using the new keyword. However, classes such as XmlDocument, XmlDataDocument, XmlNode, and XPathDocument have a CreateNavigator method that can be used to create an XPathNavigator instance.
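As a small illustration of CreateNavigator, the following sketch obtains a navigator from an XmlDocument and from an XPathDocument, then runs the same query through each. It loads the XML from a string rather than a file to stay self-contained; the NavigatorSources class, its method names, and the CompanyName element are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class NavigatorSources
{
    // Sample data modeled on Figure 1 (CompanyName is an assumed name).
    public const string Xml =
        "<Customers>" +
        "<Customer id='ALFKI'><CompanyName>Alfreds Futterkiste</CompanyName>" +
        "<ContactName>Maria Anders</ContactName></Customer>" +
        "<Customer id='DUMON'><CompanyName>Du monde entier</CompanyName>" +
        "<ContactName>Janine Labrune</ContactName></Customer>" +
        "</Customers>";

    // Navigator over an editable DOM.
    public static XPathNavigator FromXmlDocument()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        return doc.CreateNavigator();
    }

    // Navigator over XPathDocument's read-only store.
    public static XPathNavigator FromXPathDocument()
    {
        XPathDocument doc = new XPathDocument(new StringReader(Xml));
        return doc.CreateNavigator();
    }

    // The same XPath statement works with either navigator.
    public static int CountCustomers(XPathNavigator nav)
    {
        return nav.Select("/Customers/Customer").Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountCustomers(FromXmlDocument()));
        Console.WriteLine(CountCustomers(FromXPathDocument()));
    }
}
```

Because both sources expose the same XPathNavigator API, query code written against the navigator does not need to know which store sits underneath it.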

 

Figure 3 shows an example of using XPathNavigator's Select method with XPath to select a single ContactName node from the XML document shown in Figure 1.

 

XPathDocument doc = new XPathDocument(
  Server.MapPath("../Xml/Customers.xml"));
XPathNavigator nav = doc.CreateNavigator();
nav.MoveToRoot();  // Move to document.
XPathNodeIterator it = nav.Select(
  "Customers/Customer[@id='ALFKI']/ContactName");

if (it.Count > 0) {
  it.MoveNext();
  this.txtOutput.Text = "Found Customer ALFKI: " +
                          it.Current.Value;
}

Figure 3: The XPathNavigator class is designed to work with XPath. This example shows how its Select method can be used to locate a specific node in an XML document. Once the node is found, it can be moved to by calling the XPathNodeIterator's MoveNext method.

 

Figure 3 starts by creating a new instance of the XPathDocument class. XPathDocument provides an efficient way to perform XSLT transformations or create XPathNavigator objects. Once the XPathNavigator instance is created, its MoveToRoot method is called to move to the XML document. Next, the Select method is called to locate a single ContactName node. Select returns an XPathNodeIterator collection that can be iterated through using its MoveNext method. Because the code in Figure 3 only tries to retrieve one node from the XML document, the XPathNodeIterator's Count property is checked to see how many nodes are in the collection. If the count is greater than 0, the MoveNext method is called to move to the first node in the collection and access its value. Notice that XPathNodeIterator's Current property is used to access the node and its associated child text node value.
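For a single-node lookup, the Count check is not strictly required, because MoveNext returns false when nothing matched. The sketch below shows that variation, along with XPathNavigator's SelectSingleNode method, which assumes .NET Framework 2.0 or later and so is not available in version 1.1. The SingleNodeLookup class, its method names, and the CompanyName element are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class SingleNodeLookup
{
    // Sample data modeled on Figure 1 (CompanyName is an assumed name).
    public const string Xml =
        "<Customers>" +
        "<Customer id='ALFKI'><CompanyName>Alfreds Futterkiste</CompanyName>" +
        "<ContactName>Maria Anders</ContactName></Customer>" +
        "<Customer id='DUMON'><CompanyName>Du monde entier</CompanyName>" +
        "<ContactName>Janine Labrune</ContactName></Customer>" +
        "</Customers>";

    // Builds the statement; ids are assumed to contain no quote characters.
    private static string ContactPath(string id)
    {
        return "/Customers/Customer[@id='" + id + "']/ContactName";
    }

    private static XPathNavigator CreateNavigator()
    {
        return new XPathDocument(new StringReader(Xml)).CreateNavigator();
    }

    // Iterator-based lookup: MoveNext doubles as the "found" check.
    public static string ViaIterator(string id)
    {
        XPathNodeIterator it = CreateNavigator().Select(ContactPath(id));
        return it.MoveNext() ? it.Current.Value : null;
    }

    // SelectSingleNode (.NET 2.0+) returns a navigator positioned on the
    // first match, or null when nothing matched.
    public static string ViaSelectSingleNode(string id)
    {
        XPathNavigator match = CreateNavigator().SelectSingleNode(ContactPath(id));
        return match == null ? null : match.Value;
    }

    public static void Main()
    {
        Console.WriteLine(ViaIterator("ALFKI"));
        Console.WriteLine(ViaSelectSingleNode("DUMON"));
    }
}
```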

 

Figure 4 shows how multiple nodes can be selected using XPathNavigator's Select method. Once the nodes are retrieved, they're iterated through using XPathNodeIterator's MoveNext method.

 

XPathDocument doc = new XPathDocument(
  Server.MapPath("../Xml/Customers.xml"));
XPathNavigator nav = doc.CreateNavigator();
nav.MoveToRoot();  // Move to document.
XPathNodeIterator it =
  nav.Select("Customers/Customer/ContactName");

while (it.MoveNext()) {
  // Clone the navigator before moving up to the parent so that
  // it.Current stays positioned on the ContactName node.
  XPathNavigator parent = it.Current.Clone();
  parent.MoveToParent();
  this.txtOutput.Text += "Found Customer " +
    parent.GetAttribute("id", String.Empty);
  this.txtOutput.Text += ": " + it.Current.Value + "\r\n";
}

Figure 4: This sample demonstrates how to use XPathNavigator's Select method to locate multiple nodes within an XML document. Once the nodes are located, they can be accessed using XPathNodeIterator's MoveNext method.

 

Finding and Filtering XML News Nodes

Now that you've been introduced to the different ways XML data can be found and filtered using XPath and different .NET classes, let's put this knowledge to work to perform a more useful (and fun) task. MoreOver.com provides XML news feeds on a variety of news topics, including world news, sports, technology, and even XML. The XML for ASP.NET Developers Web site (http://www.xmlforasp.net) serves XML and Web service articles found at the MoreOver.com Web site by tying into the following XML feed:

 

http://p.moreover.com/cgi-local/

  page?c=XML%20and%20metadata%20news&o=xml

 

There are many ways to extract the XML data from the MoreOver.com feed and display it, including using the XmlTextReader, DataSet, and other classes. Figure 5 demonstrates how classes within the System.Net and System.Xml.XPath namespaces can be used to access the remote XML data and filter nodes based on specific keywords. These classes are encapsulated within an ASP.NET user control named NewsItems.ascx to facilitate code re-use and allow for caching of the news items.

 

Figure 5 shows a method named GetNewsXml in the user control. This method uses the WebRequest object (located in the System.Net namespace) to grab the XML news document. It uses the XPathNavigator class to filter the data and output news headlines to the browser.

 

private string GetNewsXml(string url, string filter) {
  StringBuilder newsHTML = new StringBuilder();
  WebRequest req = null;
  WebResponse resp = null;
  XmlTextReader reader = null;
  string xpath = "//article[contains(headline_text,'" +
                   filter + "')]";
  try {
    req = WebRequest.Create(url);

    // If you're behind a proxy server, uncomment the
    // following code, and update the domain, user, and
    // password.
    //----------------------------------------
    // WebProxy proxyServer =
    //    new WebProxy("proxyServer.com",true);
    // NetworkCredential cred =
    //    new NetworkCredential("user","pwd","domain");
    // proxyServer.Credentials = cred;
    // req.Proxy = proxyServer;

    resp = req.GetResponse();
    reader = new XmlTextReader(resp.GetResponseStream());
    XPathDocument doc = new XPathDocument(reader);
    XPathNavigator nav = doc.CreateNavigator();
    // Select all article nodes that meet filter condition.
    XPathNodeIterator it = nav.Select(xpath);
    int count = it.Count;
    int i = 0;
    while (it.MoveNext()) {
      // Access article url and headline_text child nodes.
      XPathNodeIterator itURL =
        it.Current.SelectChildren("url",String.Empty);
      itURL.MoveNext();  // Move to selected node.
      XPathNodeIterator itHeadline =
        it.Current.SelectChildren(
        "headline_text",String.Empty);
      itHeadline.MoveNext();  // Move to selected node.

      // Emit a quoted "url","headline" pair.
      newsHTML.Append("\"");
      newsHTML.Append(itURL.Current.Value);
      newsHTML.Append("\",\"");
      newsHTML.Append(itHeadline.Current.Value);
      newsHTML.Append("\"");
      if (i != count-1)
        newsHTML.Append(",");
      i++;
    }
  }
  catch {
    // On any failure, return an empty string.
    newsHTML.Length = 0;
  }
  finally {
    // Release the reader and the network response.
    if (reader != null) reader.Close();
    if (resp != null) resp.Close();
  }
  return newsHTML.ToString();
}

Figure 5: The NewsItems.ascx user control accesses XML data from a remote URL and filters out unwanted nodes using XPath. The resulting nodes are converted into a string array that is sent down to the browser and manipulated using JavaScript.
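The contains() filter in GetNewsXml can be tried without a network connection. The following sketch runs the same kind of predicate against a small in-memory feed. The sample feed is hypothetical: only the article, url, and headline_text element names come from the code above, while the moreovernews root element, the example.com URLs, and the headlines are invented for illustration. The ToXPathLiteral helper is also an assumption, added because a filter value containing an apostrophe would otherwise break a statement built by string concatenation.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class HeadlineFilter
{
    // Hypothetical feed sample; the root element name and all values
    // are made up for this sketch.
    public const string SampleFeed =
        "<moreovernews>" +
        "<article><url>http://example.com/1</url>" +
        "<headline_text>XML tools roundup</headline_text></article>" +
        "<article><url>http://example.com/2</url>" +
        "<headline_text>O'Reilly publishes new title</headline_text></article>" +
        "<article><url>http://example.com/3</url>" +
        "<headline_text>Sports scores</headline_text></article>" +
        "</moreovernews>";

    // XPath 1.0 has no escape character inside string literals, so pick
    // a quote style the value does not contain, or fall back to concat().
    public static string ToXPathLiteral(string value)
    {
        if (value.IndexOf('\'') < 0) return "'" + value + "'";
        if (value.IndexOf('"') < 0) return "\"" + value + "\"";
        return "concat('" + value.Replace("'", "',\"'\",'") + "')";
    }

    // Returns the headlines whose text contains the filter (case-sensitive).
    public static List<string> MatchingHeadlines(string feedXml, string filter)
    {
        string xpath = "//article[contains(headline_text," +
                       ToXPathLiteral(filter) + ")]/headline_text";
        XPathNavigator nav =
            new XPathDocument(new StringReader(feedXml)).CreateNavigator();
        XPathNodeIterator it = nav.Select(xpath);
        List<string> headlines = new List<string>();
        while (it.MoveNext())
        {
            headlines.Add(it.Current.Value);
        }
        return headlines;
    }

    public static void Main()
    {
        foreach (string headline in MatchingHeadlines(SampleFeed, "O'Reilly"))
        {
            Console.WriteLine(headline);
        }
    }
}
```

Note that contains() is case-sensitive, so a filter of "xml" would not match a headline containing "XML".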

 

Displaying XML news headlines that can be filtered and cached within an ASP.NET Web Form is as simple as adding the following user control syntax:

 

  URL="http://p.moreover.com/cgi-local/
       page?c=XML%20and%20metadata%20news&o=xml"
  NewsFilter="XML" CacheName="XMLNewsCache"
  CacheDuration="60" HeadlineDelay="8000" runat="server" />

Figure 6 shows the output generated when the ASP.NET Web Form is run (Note: The news headlines are displayed dynamically using DHTML and JavaScript).

Figure 6: The XML news user control sends the appropriate headlines to the browser based on the filter text specified in the control syntax. The user control relies on XPath and the XPathNavigator class to do the majority of the work.

Conclusion

XML continues to become more and more prevalent because of its ability to mark up data in a flexible and platform-neutral manner. In this article you've been introduced to the XPath language, as well as several ways that XPath can be used to access nodes within an XML document using native .NET classes, such as XmlDocument and XPathNavigator. Learning the different techniques to find and filter XML data will allow you to create more flexible ASP.NET Web applications that can leverage data retrieved from a variety of sources.

The sample code in this article is available for download.

Dan Wahlin (Microsoft MVP for ASP.NET and XML Web services) is the president of Wahlin Consulting and founded the XML for ASP.NET Developers Web site (http://www.XMLforASP.NET), which focuses on using ADO.NET, XML, and Web services in Microsoft's .NET platform. He's also a corporate trainer and speaker, and teaches XML and .NET training courses around the US. Dan coauthored ASP.NET Insider Solutions (SAMS 2004), Professional Windows DNA (Wrox, 2000), ASP.NET: Tips, Tutorials and Code (SAMS, 2001), and authored XML for ASP.NET Developers (SAMS, 2001).
