ADO.NET: A Bridge to the Future
Understand the newest incarnation of ADO
May 22, 2001
Before the Web entered its programming age, data access was a relative issue for most IT managers and consultants; you had to take what you got. The primary concern was choosing the most cost-effective database server, and all the modules of an involved system had to comply with the server. Client/server applications were the typical expression of this two-tier model.
As the Web became more interactive and widely used, the need for a third, middle tier became obvious. The middle tier is a logical layer in which the data-access components usually reside. The data-access component is the only piece of code that needs to know the details of the database, and the data-access component is the first piece of code you need to modify if you have to port or upgrade your server.
Then, the three-tier model of systems grew into Windows DNA—now called Microsoft .NET Server—systems. Microsoft approached data access by using a unifying model that focused on content rather than data format and storage media. This model found its concrete expression in the Universal Data Access (UDA) strategy, the inspiring principle behind the OLE DB architecture. Microsoft designed ADO to provide easy, seamless access to OLE DB functionality through Visual Basic (VB) and Active Server Pages (ASP) COM components. ADO 2.0 was the first version of the .NET framework to support OLE DB. ADO evolved to version 2.6 within a couple of years by adding and enhancing XML support until ADO could extend its object model to match the functionality of any OLE DB enhancement (e.g., ADO 2.5 matched OLE DB's introduction of Row and Stream objects with similar ADO functionality).
ADO 2.0's key features exceed those of OLE DB. The advent of middle-tier components raised the question of how to provide fresh data to the presentation layer of multi-tiered systems. How does this presentation layer access that data? How do you open connections? Or should you manipulate disconnected record arrays (i.e., some presentation-layer records for which you dropped the connection)? ADO 2.0 and later provided both options through server cursors and disconnected recordsets (i.e., COM objects serialized over the network that you can download to the client for further offline use).
Although server cursors represent a tightly coupled and connected environment, where you always maintain open channels between tiers and drop connections only when you finish, a disconnected recordset is a stateful object that you can manipulate as a whole without needing to maintain a connection. A disconnected recordset uses a static, client-side cursor and can offer a data-source snapshot. A disconnected recordset is great for all those applications that have read-only purposes and need to move data between various system tiers. Most of today's Web applications require the transmission of data between tiers. To pull this data out, you often use fast, forward-only recordsets, encode the data in XML, then send it over the network to avoid having to maintain the session state on the Web server. (For more information about forward-only recordsets, see Michael Otey, SQL Seven, "ADO Performance Tips.") Because connections are crucial resources in a distributed environment, working disconnected ensures a high degree of scalability.
The Web, however, is a double-edged sword. The Web lets you connect to and interact with any kind of online service. However, the Web also requires a certain degree of interoperability because the involved services might run on different software and hardware platforms. You can work across heterogeneous platforms, using open standards and disregarding proprietary technology—even such a broadly used proprietary technology as COM+.
Today, Windows Web data-access applications take advantage of an ADO object's rich, convenient programming interface. However, ADO objects inherently target the Windows platform. Their COM-based nature makes ADO Recordsets difficult to use in a distributed, heterogeneous environment. In addition, whereas the target platform might let you use ADO Recordsets, they don't have the most efficient architecture. ADO.NET DataSet and DataReader are more efficient; however, if you don't have ADO.NET, sometimes you can be more efficient with plain XML or text.
Microsoft has optimized ADO Recordsets for data transmission in a Web environment, but COM type conversion remains a necessary step because COM types don't always match the ADO Recordset's data types (e.g., a String type must become a BSTR type). Thus, many people use XML as a universal glue between tiers—no matter which platforms are involved. Usually, you retrieve a recordset, save it to XML, transmit the resulting data stream, and let the receiver rebuild a recordset for further use. As interoperability and scalability increase, ADO doesn't give the best possible answer because it's not XML-based—ADO.NET is.
ADO Limitations in the Programmable Web
The .NET Framework promotes a new component model that replaces COM and COM+. The introduction of .NET is the next step in Microsoft's maturing component technology. Although several key COM features don't appear in .NET, .NET is similar in some ways to COM programming. Therefore, COM programmers should have an easy adjustment to .NET development. Microsoft specifically designed ADO.NET to be the data-access layer of the .NET Framework, and ADO.NET largely takes advantage of .NET's features.
Why does .NET need a new data-access layer to replace an existing and widely used one such as ADO? Microsoft designed .NET to meet the challenge of designing modern Web systems that retain the interactive behavior of client/server and desktop applications. In addition, .NET exploits the broad connectivity and interoperability of Web protocols.
ADO Recordsets aren't usable in native mode on non-Windows platforms, resulting in limited interoperability. To work around that limitation, you convert recordsets to XML and transmit the resulting XML recordsets. In ADO.NET the process of converting data to XML and transferring it over the network has been simplified and optimized. In addition, a database-centric data vision pervades the ADO object model. ADO sees data as a set of records from a data source, instead of as unique pieces of information. Data can't exist in ADO without the structure that the data provider uses to store or render the data.
ADO.NET DataSets and ADO Recordsets
ADO.NET is ADO revisited and improved from a Web point of view. Microsoft designed ADO.NET to be exactly what its name suggests: ADO plus .NET. ADO.NET is automatically connected to the network and is devoted to making Web data access easier and more effective. Two features enable these enhancements: disconnected recordsets and native support for XML. The absence of server cursors in ADO.NET is a natural consequence of the disconnected recordset approach. In native mode, ADO.NET stores a record table as an XML document and treats schema and data as distinct and replaceable elements.
If you think that these features aren't innovative because ADO already offers them, note that ADO.NET gives you many other options as well. ADO.NET can use connected or disconnected recordsets, depending on which cursor type and location you choose for a recordset. The ADO native storage format for a recordset is the Advanced Data TableGram (ADTG) file format, which is a proprietary, binary schema that represents a recordset's in-memory image. XML is an alternative, fixed, verbose output format. In ADO.NET, you can disconnect and natively render a recordset collection by using a default, but modifiable, XML schema.
The ADO.NET object model's principal object is the DataSet. Generally, a DataSet object is a recordset collection. The ADO.NET Framework provides all the recordset's database characteristics: sorting, paging, filtered views, relations, indexing, and primary keys.
A DataSet object represents an in-memory, feature-rich data cache. A DataSet object also organizes data in tables, each of which has no connection to the original data source. You can add tables that you obtained from a query by reading a local or remote XML file or loading the table from any accessible system resource, including memory and attached devices. You can sort, index, filter, or navigate a data table as an ADO Recordset.
You can use commands to fill a DataSet object with data collections. The same DataSet object can service multiple requests from multiple connections if you supply it with data tables in a .NET collection form. (The .NET data type that features a collection is ICollection.) A DataSet object is more generic than an ADO Recordset and, in contrast, abstracts the data source. However, a DataSet object remains a data store that works in memory; it doesn't fully replace a recordset. If you need to scroll a set of records only once and produce some sort of output, you should use the DataReader object, the .NET counterpart of forward-only, read-only recordsets. The DataReader object is a highly specialized object and, thus, lighter and smaller than a recordset. A recordset, in fact, is rather bloated because it can perform a number of different tasks. Compared to ADO Recordsets, a DataReader has no housekeeping code and no code other than the one needed to outfit its functions.
The possibility of managing multiple tables as a whole and relating them to each other is new in ADO.NET. You can persist and transmit any DataSet object as XML without any extra cost because a DataSet object is natively constructed in XML. So, you don't need to translate any part of a DataSet object to obtain an XML stream unless you want to change the underlying schema.
Highlights of ADO.NET Objects
ADO and ADO.NET have two distinct object models: ADO is for server platforms based on Windows 2000 and Windows NT; ADO.NET is for .NET-enabled platforms. Microsoft expects to ship the first .NET OS, Windows XP (formerly code-named Whistler), by the end of 2001. However, its successor (code-named Blackcomb) is more likely to provide a full-fledged .NET OS.
For migration, note that you can import your existing ADO code in .NET applications, thus saving your code investments. However, the same ADO code can hardly be ported to ADO.NET without a significant redesign. The object models you find in ADO and ADO.NET are different and follow the guidelines of different design centers.
ADO.NET is only for building Web systems based on a .NET-enabled server. ADO.NET is the data-access API of .NET applications. So, you should consider ADO.NET only if you plan to upgrade your servers to .NET. Having ADO and ADO.NET cooperate in the same application doesn't make sense. Although you can use both, at least in terms of design, it's not a good idea.
ADO.NET has several primary objects: DataSet, DataTable, DataColumn, DataRow, and DataRelation. Their main features are as follows.
DataSet. This object is a collection object that can contain any number of data tables plus all the tables' constraints, indexes, and relations. All this information is in XML, and you can process, traverse, and search any or all of the data. Figure 1 illustrates a typical DataSet object's schema; in this case, the DataSet object contains two tables, one from SQL Server and one from an Oracle database. The two tables connect through a relation that associates a group of source-table rows with a group of target-table rows (e.g., a master-to-detail relationship). In addition, an XML table relates to the Oracle table in a one-to-one (1:1) model.
DataTable. This object represents all the tables you might find in a DataSet object, as Figure 2 shows. You use the Tables property to access a collection of DataTables. Likewise, the DataSet's Relations property accesses the collection of all the established dataset relations. The Xml property shows the object's native XML representation. A .NET application can load the Xml string to rebuild the dataset.
Within the ADO.NET hierarchy, the DataTable object most closely maps to the ADO Recordset object. You can create and use tables inside or outside a dataset, depending on your specific goals. You can run commands manually—you must define the table's schema first—or against managed data providers to create and fill the tables (for more information about managed data providers, see the sidebar "Managed Providers").
DataColumn. A table schema contains column-specific information, including name, type, and attributes. You just create a new DataColumn object, specify the data you need, and add the column to the table:
Dim dc As DataColumn dc = New DataColumn()dc.DataType = System.Type.GetType("System.String")dc.ColumnName = "NameOfTheColumn"
The column list is available at any time through the DataTable's Columns collection.
DataRow. To fill a table, you can use commands' automatic data-binding features or add rows manually by creating and inserting a DataRow object into the table. Then, you populate the rows' fields with data. You can navigate the DataTable's elements through the Rows collection, implement a sequential navigation model by using Rows, or jump to individual records through search or direct positioning.
DataRelation. This object represents a parent-child relationship between two tables. Columns, which must have the same data type, establish relationships. Columns don't have precise cardinality and can be 1:1, one-to-many (1:M), or many-to-many (M:N). Relationships can also easily cascade changes from parent rows to children, although this behavior isn't the default.
To enable the DataRelation object, you need to add a ForeignKeyConstraint to the ConstraintsCollection member of the data table you plan to change. The ConstraintsCollection of the DataTable object determines what happens when a value in a parent table is deleted or updated.
After you set up the relationship, ADO.NET disallows any dataset change that would break the relationship and creates a runtime exception. While walking through a table's rows, you can call the GetChildRows method to access all the related rows from a connected table. The GetChildRows method returns an array of DataRow objects, an alternative navigation type that follows a hierarchical model, instead of the traditional sequential or random-access model.
Relationships aren't transitive. Suppose that Table A relates to Table B and Table B relates to some rows in Table C. Further, suppose that you're scrolling through the records in Table A and for each record, you access the child rows in the existing relation. If you also want to access the child rows in Table C for a given row in Table B, you can't use the DataRow object on Table B that the A-to-B relation provides. Instead, you must open a new table view from Table B, locate the specific record you want, then call GetChildRows through Table B's relation with Table C.
The code in Listing 1 shows you how to create and fill a DataSet object by accessing the standard Northwind database in SQL Server. As you saw earlier, Figure 2 provides a quick view of the hierarchy below the DataSet object. After you create a DataSet object, you open a connection to the chosen database and execute a command that returns records. This command can be one SQL statement, as well as a stored procedure. The following code shows the FillDataSet method, which lets you associate with the specified dataset the table that the command returns:
Dim oDS As New DataSetoCMD.FillDataSet(oDS, "EmployeesList")
Each table in a dataset has a name that you use to retrieve the content. ADO.NET's real step forward comes with dataset creation. ADO.NET's DataSet object has no ADO counterpart. Although this example might fail to show the DataSet object's importance, architecturally speaking, the DataSet object is at a different level. The possibility of managing more tables at once under the same logical object's umbrella provides unprecedented power for implementing master-detail schemas and data-bound architectures. You can also reference a table by index.
Modifying Data
When working disconnected, you modify data by adding, modifying, or deleting rows from an in-memory cache. So, if you add a new record to a DataTable by using NewRow or edit the values of an existing record with a simple assignment, the changes occur in memory and don't affect the underlying data source. The update occurs when you call the Update method from a Command object to submit the changes. The Update method calls the respective insert, update, or delete commands for each inserted, updated, or deleted row in the dataset.
You specify data-source-specific commands to insert, update, or delete records by using the ADO.NET Command object's InsertCommand, UpdateCommand, and DeleteCommand properties. The content of these properties depends on the specific data provider you're targeting, but if the data provider is a database management system (DBMS), the content is likely to be SQL strings. During the Update method's execution, if any of these properties isn't set but the DataSet object contains primary key information, ADO.NET automatically generates the command text.
I recommend using this insert, update, and delete approach only when the application requires that you load a certain number of records and apply changes in an unpredictable order and number. If the UI merely requires you to populate some fields, then insert or update a record, executing a SQL statement or launching a stored procedure is preferable.
Other Considerations
Although ADO.NET isn't a particularly huge topic, this article only scratches its surface. You can't effectively use ADO.NET without first understanding the rationale behind it. ADO.NET isn't merely another API for accessing data; ADO.NET lets you access data across the Web while you leverage the power of the .NET platform. If you choose .NET for your server, you should adopt ADO.NET for data access.
When designing the .NET platform, Microsoft had to decide how to handle data access. The company could have ported ADO as is, but ADO is COM-based and mostly used as a script interface on top of OLE DB, a tightly coupled and connected environment. Microsoft modified the ADO model to make it less database-centric and more able to work in a loosely coupled environment. In ADO.NET, Microsoft made ADO consistent with the .NET platform. Whether you write Web forms, Windows forms, or Web services, if you have a data source in your system, ADO.NET always presents the same set of classes. ADO.NET revisited the Recordset object and split its functionality into two objects: DataSet and DataReader. If you decide to stick to ADO, you can still use .NET to import ADO classes. However, ADO provides nothing that ADO.NET can't do.
Another important consideration concerns the way that Microsoft designed ADO.NET. In both ASP.NET and ADO.NET, most changes reflect programming best practices. Microsoft simply inserted the changes into a consistent object model and an all-encompassing framework where the individual programming trick emerges as an integrated functionality. As the offspring of common best practices and common user requirements for data access, ADO.NET qualifies as an absolute must if you want to upgrade your Web systems to .NET.
About the Author
You May Also Like