Merge Disparate Source Data

Learn to pull data from multiple sources into a single DataSet.

Brian Noyes

October 30, 2009

9 Min Read
ITPro Today logo in a gray background | ITPro Today

DataStream

LANGUAGES: C#

ASP.NETVERSIONS: 1.0 | 1.1

 

MergeDisparate Source Data

Learnto pull data from multiple sources into a single DataSet.

 

By BrianNoyes

 

TheDataSet is a powerful object. You can add multiple tables, set relationsbetween the tables, and add constraints, giving you a powerful in-memorycontainer for data. It can keep track of modifications to rows, and can use theoriginal and the modified rows to perform updates using optimistic concurrency.

 

How youget data from multiple sources into one DataSet depends a lot on where the datais coming from. If you are executing the queries yourself against one or moredatabases to construct the DataSet, then there are lots of examples of how toexecute a query against a database using the ADO.NET Command object and aDataAdapter to fill the DataSet with one or multiple tables.

 

But whatdo you do if you are aggregating data from multiple sources, and that datacomes in the form of pre-constructed DataSets or XML that you want to turn intoa DataSet? Perhaps you are getting a DataSet as the return value from a Webservice that you do not control, or are getting separate XML files representingtables of data from other data sources. You want to get that data into a singleDataSet for data binding, updates, or manipulation purposes.

 

Adding Dataand Schema with Merge

Theanswer to bringing together data from multiple sources is actually verystraightforward, but seems to elude many people. The DataSet class implements amethod named Merge that allows you to pass in a source DataSet, DataTable, orDataRow array to a target DataSet and have the data from the source merged intothe target DataSet. There are a number of overloads of the Merge method,allowing you to approach things in different ways depending on your needs. Youcan simply add multiple tables to a DataSet by merging them in from multiplesources, or you can add data from multiple sources into a single table. Both ofthese approaches include many options in terms of the way merge is performedwith respect to the schema of the target DataSet and whether changes arepreserved when data is imported into an existing table in the target DataSet.

 

Todemonstrate some of the options and the basic approach, the sample codeaccompanying this article includes a simple application that pulls in data fromthree XML files (see end of article for download details). Two of the XML filescontain data from the Northwind Orders and Order Details tables. The thirdcontains some modified and added records for the Orders table that the codemerges into the existing Orders table data to effect changes to the existingdata. Imagine that these three XML files could have come from completelydifferent data sources over the wire. However, the data has implicit relationshipsthat you want to recapture and use in your application.

 

Yourfirst thought might be that you can just read multiple XML files into theDataSet in a similar way to loading multiple queries into a DataSet using aDataAdapter. The ReadXml method of the DataSet makes it easy to read in XMLinto a DataSet, but it only supports repopulating the DataSet from scratch, notadding information to the DataSet incrementally. Unfortunately, with thecurrent version of .NET, there is no XmlAdapter. I will be covering some of thenew capabilities just announced for .NET 2.0 in future columns, but there is infact an XmlAdapter in .NET 2.0. If you need to do this kind of loading ofmultiple XML files into a DataSet a lot today, consider writing your own classto encapsulate the process until the .NET 2.0 functionality is released.

 

So thesolution in this case is to use the Merge method on the DataSet class. Let'sstart with the case where you have data coming from two sources that representdifferent tables, but that there is an implicit parent-child relationshipbetween the tables. To get the data into a single DataSet with thatrelationship, the first step is to read the data into two different DataSetsfirst (assuming XML as the source data format - you may also have been passed afully constructed DataSet from a Web service method or class library call):

 

DataSetdsOrders = new DataSet();

DataSetdsOrderDetails = new DataSet();

dsOrders.ReadXml("Orders.xml");

dsOrders.Tables[0].TableName= "Orders";

dsOrderDetails.ReadXml("OrderDetails.xml");

dsOrderDetails.Tables[0].TableName= "OrderDetails";

 

Notethat the code is renaming the tables after reading them in. This is because theXML files for the sample were purposely saved without schema information tomore closely represent data from disparate data sources, where there may not beany agreed upon common schema. To bring the two tables together into oneDataSet, you then call the Merge method on the target DataSet into which youwant to aggregate the data from the source DataSet. Then you can create therelationship between the tables:

 

// Perform themerge

dsOrders.Merge(dsOrderDetails.Tables["OrderDetails"]);

// Now that thechild table is there, create the

// relationshipand constraints

dsOrders.Relations.Add("FK_Orders_OrderDetails",

  dsOrders.Tables["Orders"].Columns["OrderID"],

  dsOrders.Tables["OrderDetails"].Columns["OrderID"],true);

 

In thiscase, I am using the version of the Merge method that simply takes a DataTablereference. It will also transfer the schema information from that table intothe DataSet and add it as an additional table (see Figure 1). The result isthat a new DataTable is added to the target DataSet (dsOrders) with the schemaof the Order Details table and its contained data.

 


Figure 1. Calling Merge on one DataSet andpassing in the table from another results in a copy of the table from thesource being created in the target.

 

Change Datawith Merge

Nowlet's take a look at another scenario where you might use the Merge method. Sayyou have a DataSet that contains your current data for a table. That currentdata may or may not have modifications already made to it since it wasretrieved from its data source. You receive a set of data from some othersource that contains new or modified data that you need to use to performupdates to the existing set of data. The data may be in the form of an XMLfile, a DataSet with a DataTable that matches the schema of the targetDataTable, or perhaps an array of DataRows resulting from a method call such asSelect or GetChanges on an existing DataSet.

 

In thissituation, when you call Merge on the target DataSet, passing in source datawhose schema matches an existing table in the target, the data from the sourcewill be merged into the target. What "merge" means in this case depends on anumber of factors.

 

Thesimplest to understand is the default case, where you just pass in the sourcedata to a target, and both source and target have matching table schema withprimary key information. In this case, if a row in the source data matches theprimary key of a row in the target data, the source data will replace thecurrent values of the fields of the corresponding row in the target data. If nomatching primary key is found for a row in the source data, it will be added tothe target data as a new row (see Figure 2). If there are any schema mismatchesor if a constraint is violated, an exception will be raised at the end of theupdating process with embedded information about what went wrong, and errorinformation will be added to the offending rows.

 


Figure 2. Rows from the source table replacethe matching rows in the target table. Rows from the source that do not matchtarget rows are simply added to the target as new rows.

 

If youuse one of the overloads of the Merge method that take a BooleanpreserveChanges parameter, and you pass true for that parameter instead of thedefault value of false, the behavior in the scenario is quite different. Whenyou tell the Merge method to preserve changes, if a row in the source data hasthe same primary key as a row in the target data, the target data row currentvalues will remain unchanged, but the original values for the row in the targetwill take on the original values of the source row.

 

I'm notsure I see a lot of value in using this approach, because if the originalvalues change, you will get concurrency exceptions if you try to use the targetDataSet to perform updates to its data source. But that is the designedbehavior of the Merge method when told to preserve changes. Figure 3 shows thecode for the method that reads in the source and target data and merges them.Note that it is important to have primary key information available in both ofthe tables. Without this, Merge will treat all rows coming from the source asnew rows.

 

private voidMergeRows(object sender, System.EventArgs e)

{

   // Create the source and target DataSets

   DataSet dsOrders = new DataSet();

   DataSet dsModOrders = new DataSet();

   // Load the target data and name the table

   dsOrders.ReadXml("Orders.xml");

   dsOrders.Tables[0].TableName ="Orders";

   // Setting primary key is important becausethat is the

   // only way Merge can identify matchingrows

   dsOrders.Tables[0].PrimaryKey = newDataColumn[]

      {dsOrders.Tables[0].Columns["OrderID"]};

   // Make all the rows in the target theoriginal values

   dsOrders.AcceptChanges();

 

   // Load the source data, set table name andPK

   dsModOrders.ReadXml("ModifiedOrders.xml");

   dsModOrders.Tables[0].TableName ="Orders";

   dsModOrders.Tables[0].PrimaryKey = newDataColumn[]

      {dsModOrders.Tables[0].Columns["OrderID"]};

   // When loaded, the rows are all treated asnew/mod rows

   // If AcceptChanges is called, it sets themall to orig

   // so that the merge behavior withpreserveChanges

   // can be observed.

   if (chkSourceOrig.Checked)

   {

      dsModOrders.AcceptChanges();

   }

   // Do the merge

   dsOrders.Merge(dsModOrders.Tables["Orders"],

      chkChanges.Checked,MissingSchemaAction.AddWithKey);

   // Data bind to see the results, varyingbased on the

   // selected RowState

   dataGrid1.DataSource = new DataView(

      dsOrders.Tables["Orders"],null,null,

       (DataViewRowState)cmbRowStateFilter.SelectedItem);

}

Figure3. Calling Mergewith the same table in both source and target allows you to perform updatesfrom other DataSets. The behavior of merge depends on a combination of theRowState of rows in the source and target and whether the preserveChangesBoolean argument is passed to Merge.

 

Thesample program for the article will let you play with these options by changingthe preserveChanges parameter on the fly and displaying the resulting changeswith a selectable RowState filter. You will want to look at the source andtarget data to see which values are coming from where to understand what youare seeing at run time, but this should give you a better idea of whatcombinations are possible using the Merge method. To read about all the variouscombinations of behavior of the Merge method, see the topic "Merging DataSetContents" in the MSDN library.

 

Thesample code accompanying this article is available for download.

 

Brian Noyes is a consultant, trainer, speaker,and writer with IDesign, Inc. (http://www.idesign.net),a .NET focused architecture and design consulting firm. Brian specializes indesigning and building data-driven distributed applications. He has over 12years' experience in programming, engineering, and project management, and is acontributing editor and writer for C#PRO, asp.netPRO, and other publications. Contacthim at mailto:[email protected].

 

 

 

 

Sign up for the ITPro Today newsletter
Stay on top of the IT universe with commentary, news analysis, how-to's, and tips delivered to your inbox daily.

You May Also Like