Compare and Patch XML Documents
The XML Diff and Patch tool can help keep XML data on different servers in sync by transferring only the differences.
October 30, 2009
Comparing XML documents and identifying differences can be a difficult task, particularly when writing this type of functionality from scratch. Fortunately, Microsoft has released version 1.0 of its XML Diff and Patch tool. This handy .NET assembly allows XML developers to compare XML documents, generate a differences document (referred to as an XDL DiffGram), and patch original documents to synchronize them with others (see Figure 1). Interestingly, this functionality could allow cached XML data or configuration files on different servers to be kept in sync. Instead of sending the entire modified document across the network, a DiffGram document, which is likely much smaller, could be sent so the appropriate data or files would be "patched."
In this article, I'll provide an overview of the functionality found in the XML Diff and Patch tool and demonstrate how you can use it to generate DiffGram documents.
Get to Know the XML Diff Language
Before examining how to create .NET applications that can compare XML documents, let's take a quick look at the XML Diff Language (XDL) that the XML Diff and Patch tool uses to write DiffGrams. This language defines several elements that specify when nodes should be added, removed, or changed. It also defines path descriptors that identify where modifications should occur in the original XML document. Figure 2 shows an example XDL DiffGram document.
Figure 2. The XML Diff and Patch tool relies on XDL DiffGrams to define patches that should be applied to XML documents. This example defines a change in an attribute named customerID and adds a new customer child node to the "patched" XML document.
Although I won't go into much detail about path descriptors because the tool generates them automatically for you, the help documentation contains this definition: "The XML Diff Language (XDL) uses path descriptors to identify the nodes in the source XML document. Path descriptors work on a DOM data model and use the node position as the core identifier for the nodes. XDL does not use XPath because the XPath data model differs from DOM." All path descriptors refer to the original source XML tree before changes are applied. For example, if the first node of the source tree has become the third node in the changed tree, its path descriptor is still "1", because the node is first in the source document (the source document functions as the base).
Aside from path descriptors, XDL DiffGrams also can contain add, remove, and change elements. The add element is used when nodes that do not exist in the source XML document appear in the modified XML document. For example, an add element can specify that a new customer node (in this article's sample, one for Jane Doe) was found after the first node and did not exist in the source XML document; the path descriptor is supplied in a match attribute. The remove element identifies any nodes removed from the source XML document; for example, it can specify that the second node should be removed from an XML document. Finally, the change element identifies any data value changes between the source and modified XML documents; for example, a change element can set the first node's customerID attribute to a value of 1.
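To make the path-descriptor idea concrete, the following C# sketch reads a DiffGram with plain System.Xml classes (no XML Diff and Patch types are needed) and lists each add, remove, and change operation together with the chain of match descriptors that leads to it. The sample DiffGram is a hand-written illustration modeled on the fragments described above, not output captured from the tool. The XDL namespace URI is an assumption to check against a real DiffGram, and DiffGramInspector, ListOperations, and the slash-joined path notation are hypothetical names and conventions chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class DiffGramInspector
{
    // Namespace URI used by XDL DiffGrams (assumed here; check a real DiffGram).
    public const string XdlNamespace = "http://schemas.microsoft.com/xmltools/2002/xmldiff";

    // Hand-written illustration only: it omits header attributes the real tool
    // writes, so it is meant for reading, not for passing to XmlPatch.
    public const string SampleDiffGram =
        "<xd:xmldiff version=\"1.0\" xmlns:xd=\"" + XdlNamespace + "\">" +
        "<xd:node match=\"1\">" +
        "<xd:node match=\"1\">" +
        "<xd:change match=\"@customerID\">1</xd:change>" +
        "</xd:node>" +
        "<xd:remove match=\"2\" />" +
        "<xd:add><customer>Jane Doe</customer></xd:add>" +
        "</xd:node>" +
        "</xd:xmldiff>";

    // Returns one "<operation> at <path>" entry per xd:add, xd:remove and
    // xd:change element, in document order. <path> joins the match values of
    // the enclosing xd:node elements with the operation's own match (if any).
    public static List<string> ListOperations(string diffGramXml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(diffGramXml);
        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("xd", XdlNamespace);

        List<string> result = new List<string>();
        foreach (XmlElement op in doc.SelectNodes("//xd:add | //xd:remove | //xd:change", ns))
        {
            List<string> steps = new List<string>();
            // Climb through enclosing xd:node elements; outermost descriptor ends up first.
            for (XmlNode p = op.ParentNode;
                 p != null && p.LocalName == "node" && p.NamespaceURI == XdlNamespace;
                 p = p.ParentNode)
            {
                steps.Insert(0, ((XmlElement)p).GetAttribute("match"));
            }
            if (op.HasAttribute("match"))
                steps.Add(op.GetAttribute("match"));
            result.Add(op.LocalName + " at " + string.Join("/", steps.ToArray()));
        }
        return result;
    }
}
```

For the embedded sample, ListOperations(DiffGramInspector.SampleDiffGram) returns "change at 1/1/@customerID", "remove at 1/2", and "add at 1". Every number is a position in the source document, so the list stays the same no matter where those nodes end up in the modified document.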
Although you do not need to know the XDL DiffGram format to use the XML Diff and Patch tool, a basic understanding of what the tool generates can help you understand what's going on behind the scenes, which is particularly helpful when problems arise. (The tool's help documentation contains many more details about the DiffGram format if you need more information.)
Now that you've seen what an XDL DiffGram looks like, let's examine how to create one using the classes that ship with the tool.
Find Differences Between Documents
To compare XML documents and find differences, you must first reference the XML Diff and Patch assembly and import the Microsoft.XmlDiffPatch namespace in your application to access the classes it contains. The XmlDiff class contains the properties and methods needed to compare XML documents. Its properties also let you control which parts of the XML documents are compared: different types of nodes, such as namespaces, white space, and processing instructions, can be ignored if desired.
After determining which nodes will and will not be compared, you can set the Algorithm property of the XmlDiff class to Auto, Fast, or Precise. Figure 3 replicates the table from the XML Diff and Patch tool documentation that describes these enumeration values.
Member Name | Description
Auto | Default. Chooses the comparison algorithm for you depending on the size and assumed number of changes in the compared documents.
Fast | Compares the two XML documents by traversing the XML tree and comparing it node-by-node. This algorithm is fast but might produce less precise results. For example, it might detect an add-and-remove operation on a node instead of a move operation.
Precise | Based on an algorithm for finding editing distance between trees, also known as the Zhang-Shasha algorithm. This algorithm gives very precise results, but it might be slow on large XML documents with many changes.
Figure 3. The XmlDiffAlgorithm enumeration contains the three members shown in this table. For larger XML documents, performance will be improved by using the Fast value; for smaller documents, you can find more precise differences by using the Precise value.
After the Algorithm property is set, you can call the Compare method, which has several overloaded versions:

    public Boolean Compare(String, String, Boolean)
    public Boolean Compare(String, String, Boolean, XmlWriter)
    public Boolean Compare(XmlNode, XmlNode)
    public Boolean Compare(XmlNode, XmlNode, XmlWriter)
    public Boolean Compare(XmlReader, XmlReader)
    public Boolean Compare(XmlReader, XmlReader, XmlWriter)

The first two overloads accept the paths to the XML documents being compared; their Boolean parameter indicates whether the files contain XML fragments rather than complete documents. You also can pass XML documents that have been loaded into an XmlNode object (an XmlDocument, for example) and even compare fragments against each other. The example shown later in Figure 6 uses the last overload in the preceding list. This version accepts two XmlReader objects (XmlTextReader instances in the example) that contain the original and modified XML documents, as well as the XmlWriter to which the difference document (DiffGram) will be written. To illustrate how the DiffGram document is generated from the differences between two web.config files, I have provided original and modified web.config documents, shown in Figures 4 and 5, respectively.
Figure 4. The original web.config document contains the standard elements and attributes found in a typical web.config file.
Figure 5. The modified web.config document contains an appSettings node as well as an associated child node (marked in bold) not found in the original web.config document shown in Figure 4.
Figure 6 shows a method named GenerateDifferences that acts as a wrapper around the XmlDiff class and its Compare method. GenerateDifferences accepts two input parameters representing the original and modified XML documents.
GenerateDifferences returns a Boolean value that tells the calling program whether the documents match. If a DiffGram document is generated, it is returned through the third parameter (named diffDoc), which the calling program passes by reference using the C# ref keyword.

    using System.IO;
    using System.Xml;
    using Microsoft.XmlDiffPatch;

    public bool GenerateDifferences(string doc1, string doc2, ref string diffDoc)
    {
        //Create XmlDiff object for document comparison
        XmlDiff diff = new XmlDiff();
        //Set comparisons that should be ignored
        diff.IgnoreComments = true;
        diff.IgnorePI = true;
        diff.IgnoreWhitespace = true;
        //Choose most precise algorithm
        //For large documents look at using
        //the "Fast" algorithm
        diff.Algorithm = XmlDiffAlgorithm.Precise;
        //Compare documents and generate XDL diff document
        StringWriter sw = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(sw);
        writer.Formatting = Formatting.Indented;
        XmlTextReader originalReader = new XmlTextReader(new StringReader(doc1));
        XmlTextReader modifiedReader = new XmlTextReader(new StringReader(doc2));
        bool status = diff.Compare(originalReader, modifiedReader, writer);
        //Close writer (flushing the DiffGram into sw) and readers
        writer.Close();
        originalReader.Close();
        modifiedReader.Close();
        //Output difference document (ref parameter)
        diffDoc = sw.ToString();
        //return status: true when the documents match
        return status;
    }

Figure 6. The XmlDiff class allows XML documents to be compared by calling its Compare method. The GenerateDifferences method shown in this example permits two strings containing XML data to be compared to one another.
The code begins by creating an XmlDiff object and setting some of its properties. In this example, white space, comments, and processing instructions are ignored as the XML documents are compared. Note that the Algorithm property is set to Precise. (For details on the Precise enumeration value, refer back to Figure 3.)
After creating the XmlDiff object, the code creates the XmlTextWriter object that will write out the DiffGram document. The Formatting property is set to Formatting.Indented so the DiffGram is indented and easy to read. Two XmlTextReader objects are then created and loaded with the XML string data passed into the GenerateDifferences method. Wrapping each string in a StringReader lets XML data held in a String be loaded directly into an XmlTextReader.
After the XmlTextReader objects are created, the XmlDiff object's Compare method is called with the two readers and the writer as arguments. Compare returns true when the documents under comparison match, and false when they differ. When the documents differ, the XmlTextWriter writes the DiffGram document to a StringWriter, whose contents are assigned to the ref parameter named diffDoc. The writer is closed before the StringWriter's contents are read so that no buffered output is lost. Although this example accepts strings as input, you easily can modify it to handle file paths if desired.
Patch XML Documents
Once an XDL DiffGram document is generated by calling the XmlDiff class's Compare method, you can use the DiffGram to "patch" original documents and sync them with modified documents. You do this by using another class in the Microsoft.XmlDiffPatch namespace named XmlPatch. XmlPatch contains a method named Patch that handles the modification of the source XML document. The method has several overloaded versions:

    public void Patch(XmlDocument, XmlReader)
    public void Patch(XmlNode, XmlReader)
    public void Patch(String, Stream, XmlReader)
    public void Patch(XmlReader, Stream, XmlReader)

When the original document must be modified directly, you can use one of the first two versions of the Patch method. In every version, the DiffGram is loaded into an XmlReader before being passed as an argument. When the patched document needs to be moved or saved in a different location, the final two overloaded versions let you write it to a Stream instead.
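Because the final two Patch overloads write to a Stream, they pair naturally with the file-path overloads of Compare. The sketch below diffs two XML files on disk, saves the DiffGram to a third file, and then writes a patched copy of the original to a fourth file. It assumes the Microsoft.XmlDiffPatch assembly is referenced and that the Boolean argument of Compare(String, String, Boolean, XmlWriter) flags fragment input (false here, for complete documents). FileDiffPatch and DiffAndPatchFiles are hypothetical names, and the option flags passed to the XmlDiff constructor mirror the properties set in Figure 6.

```csharp
using System;
using System.IO;
using System.Xml;
using Microsoft.XmlDiffPatch; // assumes the XML Diff and Patch assembly is referenced

public static class FileDiffPatch
{
    // Hypothetical helper: compares originalPath with modifiedPath, writes the
    // XDL DiffGram to diffPath, and writes the patched original to patchedPath.
    // Returns true when the two input documents were judged identical.
    public static bool DiffAndPatchFiles(string originalPath, string modifiedPath,
                                         string diffPath, string patchedPath)
    {
        // Same comparison settings as Figure 6, passed as constructor flags.
        XmlDiff diff = new XmlDiff(XmlDiffOptions.IgnoreComments |
                                   XmlDiffOptions.IgnorePI |
                                   XmlDiffOptions.IgnoreWhitespace);
        diff.Algorithm = XmlDiffAlgorithm.Auto;

        bool identical;
        using (XmlWriter diffWriter = XmlWriter.Create(diffPath))
        {
            // false: the files hold complete documents, not fragments (assumed meaning)
            identical = diff.Compare(originalPath, modifiedPath, false, diffWriter);
        }

        if (identical)
        {
            // Nothing to patch; just copy the original so patchedPath always exists.
            File.Copy(originalPath, patchedPath, true);
            return true;
        }

        // Apply the DiffGram to the original file, streaming the result to disk.
        using (XmlReader diffReader = XmlReader.Create(diffPath))
        using (FileStream output = new FileStream(patchedPath, FileMode.Create))
        {
            XmlPatch patch = new XmlPatch();
            patch.Patch(originalPath, output, diffReader);
        }
        return false;
    }
}
```

Copying the original when the documents already match avoids depending on whatever DiffGram the tool writes for identical inputs. Loading the patched file and comparing it with the modified document is a simple way to confirm the round trip.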
Figure 7 demonstrates how to use the first overloaded version to patch a document.

    using System.IO;
    using System.Xml;
    using Microsoft.XmlDiffPatch;

    public string PatchXML(string original, string patchXml)
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(sw);
        writer.Formatting = Formatting.Indented;
        XmlDocument originalDoc = new XmlDocument();
        originalDoc.LoadXml(original);
        //Create XmlPatch object to perform patch operation
        XmlPatch patch = new XmlPatch();
        XmlTextReader reader = new XmlTextReader(new StringReader(patchXml));
        //Perform patch operation
        patch.Patch(originalDoc, reader);
        reader.Close();
        //Write the patched document and flush it into sw
        originalDoc.Save(writer);
        writer.Close();
        return sw.ToString();
    }

Figure 7. The XmlPatch class can read an XDL DiffGram document and use it to patch an existing XML document. By shipping DiffGram documents around, documents can be kept in sync without being forced to ship a mass of data over the network, particularly in cases involving large XML documents.
The PatchXML method in Figure 7 wraps the functionality exposed by the XmlPatch class, accepting both the original XML document and the DiffGram document as strings. First, the original document is loaded into an XmlDocument while the DiffGram is loaded into an XmlTextReader. Both objects are then fed into the Patch method, which applies the modifications to the original document. Figure 8 shows a portion of the output generated by running this article's sample application.
Figure 8. The XML Diff and Patch sample application allows original and modified XML documents to be compared. This figure shows the output generated when differences between the documents are found as well as how you can use these differences to "patch" an XML document.
The process of comparing and modifying XML documents has been greatly simplified with Microsoft's XML Diff and Patch tool.
In this article you learned how to write wrapper methods around the comparison and patching functionality built into XML Diff and Patch, and how the tool uses DiffGrams to track changes. By leveraging the XML Diff and Patch tool, you easily can sync different XML documents containing cached data, configuration data, or data used for another purpose.
The code used in this article is available for download. To see a live example of the code shown in this article, visit the XML for ASP.NET Developers Web site: http://www.xmlforasp.net/codeSection.aspx?csID=84.