The DiffGram Format

Dino Esposito explains how the DiffGram format is different from other XML formats.

Dino Esposito

January 7, 2002



The DiffGram is one of the two XML formats that you can use to render DataSet object contents to XML. The other format is the standard XML layout, whose root node is named after the DataSet and whose child subtrees are named after the embedded tables. Each subtree represents one record in the given table. The DiffGram format carries all this information and much more.

The key difference between the two XML formats lies in the ultimate goal for which they were designed. The ordinary XML serialization is mostly for persistence and local data storage, whereas the DiffGram format serves a slightly more ambitious purpose. The DiffGram is a format intended for network data exchange and .NET remoting.

You can persist a DataSet object with or without schema information, and the resultant XML file is a simple snapshot of the data being stored in the DataSet. In fact, only the current value of each row and column is persisted. Each row's state (added, modified, deleted) is lost, along with any error information associated with it.

The DiffGram format, however, also includes row state and row error information, and it records both the current and the original values of each updated row. When you transmit a DiffGram over a network, or more simply across the boundaries of two distinct .NET processes, you serialize a representation of the DataSet object's living instance. From a DiffGram, you can rebuild a high-fidelity copy of the original DataSet with both error and row state information. By contrast, from the ordinary serialization XML format, you can build only a brand-new DataSet with fresh, unchanged, and error-free row information. Incidentally, the .NET Framework always serializes the DataSet object as a DiffGram when it has to travel across process or network boundaries. For example, the clients of a .NET Web Service always receive returned DataSets as DiffGram XML strings.
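To make the contrast concrete, here is a minimal C# sketch (not part of the original article) that writes the same DataSet in both formats. The class name DiffGramVsPlainXml, the Customers table, and the sample values are hypothetical and chosen only for illustration; WriteXml and XmlWriteMode are the standard ADO.NET members.

```csharp
using System;
using System.Data;
using System.IO;

// Hypothetical demo class; table, column, and value names are made up for illustration.
static class DiffGramVsPlainXml
{
    // Builds a DataSet holding one modified row that also carries an error message.
    public static DataSet BuildSample()
    {
        DataSet ds = new DataSet("MyDataSet");
        DataTable customers = ds.Tables.Add("Customers");
        customers.Columns.Add("Id", typeof(int));
        customers.Columns.Add("Name", typeof(string));
        customers.Rows.Add(1, "OriginalName");
        ds.AcceptChanges();                        // every row is now Unchanged

        DataRow row = customers.Rows[0];
        row["Name"] = "CurrentName";               // the row becomes Modified
        row.RowError = "Name failed validation";   // pending error on the row
        return ds;
    }

    // Serializes the DataSet to a string in the requested write mode.
    public static string Serialize(DataSet ds, XmlWriteMode mode)
    {
        using (StringWriter sw = new StringWriter())
        {
            ds.WriteXml(sw, mode);
            return sw.ToString();
        }
    }

    static void Main()
    {
        DataSet ds = BuildSample();
        Console.WriteLine(Serialize(ds, XmlWriteMode.IgnoreSchema)); // ordinary XML, data only
        Console.WriteLine(Serialize(ds, XmlWriteMode.DiffGram));     // DiffGram
    }
}
```

The first string should carry only the current value of the row, whereas the DiffGram should also carry the original value and the error message.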

A DiffGram is simply an XML string written according to a particular schema. It is in no way a .NET type. You create a DiffGram, or populate a DataSet from a DiffGram, by using the same methods you would use for the ordinary XML serialization of ADO.NET objects. These methods are WriteXml and ReadXml. The following code snippet shows how to serialize a DataSet object to a DiffGram:

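// Requires the System.Data and System.IO namespaces.
using (StreamWriter sw = new StreamWriter(fileName))
{
    // Write current values, original values, and row errors as a DiffGram.
    ds.WriteXml(sw, XmlWriteMode.DiffGram);
}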

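The resultant XML code is rooted in the <diffgr:diffgram> node and contains up to three distinct data sections, as follows:

<diffgr:diffgram
    xmlns:msdata="urn:schemas-microsoft-com:xml-msdata"
    xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">
  <MyDataSet>
    :
  </MyDataSet>
  <diffgr:before>
    :
  </diffgr:before>
  <diffgr:errors>
    :
  </diffgr:errors>
</diffgr:diffgram>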

The first section (MyDataSet, in the example above) is mandatory and represents the current instance of the data. The MyDataSet block is nearly identical to the ordinary XML serialization. There are only a few differences, the greatest of which is that the DiffGram format never includes schema information. Another difference is that the DiffGram format doesn't honor the Hidden column mapping type. When writing a table to XML, you can decide on a per-column basis, through the column's ColumnMapping property, how the value of that column is rendered. Values are normally rendered as child elements, but you can render them as attributes or, under special circumstances, as simple text. A further option is to hide the column completely from the XML representation. The XML serializer skips such columns in an ordinary XML serialization but includes them when it prepares a DiffGram, marking each with a special annotation (that is, a node attribute) that states that the column was originally marked as hidden.
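The column mapping options can be sketched as follows. This is an illustrative example rather than code from the original article: the class name HiddenColumnDemo, the Customers table, and the Secret column are hypothetical, while ColumnMapping and the MappingType values are the standard ADO.NET members.

```csharp
using System;
using System.Data;
using System.IO;

// Hypothetical demo class; table, column, and value names are made up for illustration.
static class HiddenColumnDemo
{
    // Builds a one-row table whose columns use three different XML mappings.
    public static DataSet BuildSample()
    {
        DataSet ds = new DataSet("MyDataSet");
        DataTable customers = ds.Tables.Add("Customers");
        customers.Columns.Add("Id", typeof(int)).ColumnMapping = MappingType.Attribute;
        customers.Columns.Add("Name", typeof(string));   // default mapping: child element
        customers.Columns.Add("Secret", typeof(string)).ColumnMapping = MappingType.Hidden;
        customers.Rows.Add(1, "Alice", "s3cret");
        ds.AcceptChanges();
        return ds;
    }

    // Serializes the DataSet to a string in the requested write mode.
    public static string Serialize(DataSet ds, XmlWriteMode mode)
    {
        using (StringWriter sw = new StringWriter())
        {
            ds.WriteXml(sw, mode);
            return sw.ToString();
        }
    }

    static void Main()
    {
        DataSet ds = BuildSample();
        // Ordinary serialization: the hidden column is expected to be left out.
        Console.WriteLine(Serialize(ds, XmlWriteMode.IgnoreSchema));
        // DiffGram: the hidden column is expected to appear as an annotated attribute.
        Console.WriteLine(Serialize(ds, XmlWriteMode.DiffGram));
    }
}
```

Comparing the two outputs side by side is a quick way to check which columns of your own tables would leak into a DiffGram even though they are hidden from the ordinary XML.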

The <diffgr:before> section includes the original values of the rows that have been modified or deleted since the DataSet was created or its changes were last accepted. The DataSet's current state is stored in the previously mentioned mandatory block. Together with that data instance, the <diffgr:before> section therefore captures the DataSet's changes, that is, the difference between the rows' original and current values. Only modified or deleted records are listed there. Newly added records are listed only in the data instance because they have no original version to link to. The DiffGram format uses a unique ID (the diffgr:id attribute) to match the records in the two sections. Finally, the messages related to pending errors on rows are listed in the <diffgr:errors> section.

You use the ReadXml method to load a DiffGram into a DataSet:

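// Requires the System.Data and System.IO namespaces.
using (StreamReader sr = new StreamReader(fileName))
{
    // ds must already contain the tables and columns the DiffGram refers to.
    ds.ReadXml(sr, XmlReadMode.DiffGram);
}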

Note that for the loading to succeed, the DataSet must already have a compliant schema. All the necessary tables and columns have to exist already, even if they're empty. This requirement is a direct consequence of the lack of schema information in the DiffGram format.
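The following sketch, which is not from the original article, shows one way to satisfy the schema requirement and to check what survives a round trip. It assumes that the receiving side can obtain the schema by cloning the source DataSet; in a real exchange the schema would have to be created in code or loaded separately. The class name DiffGramRoundTrip and the sample table and values are hypothetical.

```csharp
using System;
using System.Data;
using System.IO;

// Hypothetical demo class; table, column, and value names are made up for illustration.
static class DiffGramRoundTrip
{
    // Builds a DataSet holding one modified row that also carries an error message.
    public static DataSet BuildSample()
    {
        DataSet ds = new DataSet("MyDataSet");
        DataTable customers = ds.Tables.Add("Customers");
        customers.Columns.Add("Id", typeof(int));
        customers.Columns.Add("Name", typeof(string));
        customers.Rows.Add(1, "OriginalName");
        ds.AcceptChanges();

        customers.Rows[0]["Name"] = "CurrentName";             // row becomes Modified
        customers.Rows[0].RowError = "Name failed validation"; // pending row error
        return ds;
    }

    // Writes the source as a DiffGram and loads it into a DataSet with a matching schema.
    public static DataSet RoundTrip(DataSet source)
    {
        string xml;
        using (StringWriter sw = new StringWriter())
        {
            source.WriteXml(sw, XmlWriteMode.DiffGram);
            xml = sw.ToString();
        }

        DataSet target = source.Clone();   // same tables and columns, but no rows
        using (StringReader sr = new StringReader(xml))
        {
            target.ReadXml(sr, XmlReadMode.DiffGram);
        }
        return target;
    }

    static void Main()
    {
        DataRow row = RoundTrip(BuildSample()).Tables["Customers"].Rows[0];
        Console.WriteLine(row.RowState);                          // row state after loading
        Console.WriteLine(row["Name", DataRowVersion.Original]);  // original value
        Console.WriteLine(row["Name", DataRowVersion.Current]);   // current value
        Console.WriteLine(row.RowError);                          // error message
    }
}
```

If the DiffGram behaves as described in this article, the reloaded row should still be marked as modified and should expose both versions of the Name value along with its error message.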

Other platforms can also use the DiffGram format to exchange information with a .NET Framework application. The DiffGram format is at the foundation of the XML representation of DataSets in .NET.
