The Entity Framework in Action

Model applications using entities and relationships

Dino Esposito

May 5, 2010

22 Min Read
DataEF Microsoft Visual Studio screenshot

Since its inception, the ADO.NET Entity Framework (EF) has aroused interest and discussion. At first, Microsoft's endeavor to provide framework support for entity-based architectures was received with enthusiasm. The enthusiasm fizzled, though, with the actual release of EF 1.0, when many prominent members of the developer community voiced their concern that it didn't adequately address what they needed for building entity-oriented projects. This half-hearted welcome triggered a significant review of the EF structure and implementation. As a result, a new version, labeled as EF 4 (to match the latest version of the.NET Framework), is shipping with Visual Studio 2010.

So, is there anything good in EF 4? I think so, especially for SQL Server developers. Knowing what EF is about and having an idea of how to use it can help you decide whether you want to pursue using EF.

Making Sense of EF

For years, developers built applications to manipulate data in a format that was really close to the native relational format—essentially tuples of data mimicking original records. The complexity of modern software projects raised the architecture bar to a level that questioned the practice of manipulating data in a relational format. The need for tools that enabled a more conceptual view of application data grew enormously, which captured the attention of Microsoft. EF was the result of Microsoft's effort to meet this need.

EF is a framework to model applications using entities and relationships. It lets developers create a model for their logical data, then map that data to a set of relational tables in a classic database management system (DBMS).

The main points of controversy concerning EF 1.0 were the structure and design philosophy behind the model and its constituent entities. In a pure entity-oriented architecture, an entity is a central component of the design. It holds both data and behavior. Furthermore, the entity is completely agnostic of persistence and storage. In EF 1.0, entities are perceived more as data objects, are seen primarily from the perspective of data storage, and have tight dependency on the EF infrastructure. In other words, entities are not persistence-ignorant and not plain-old CLR objects (POCOs).

In EF 4, a lot has changed. Entities can be optionally created as POCOs. More important, EF 4 lets you build an entity model using a bottom-up approach (build an entity model from the structure of a database) or a top-down approach (build an abstract entity model, then map it to existing or new databases). EF 4 also offers to generate a Data Definition Language (DDL) script to create the best relational database the model would fit in.

In my opinion, the bottom-up approach is optimal for developers with a strong SQL Server and database background. That way, they can select a database, choose the tables, views, and stored procedures they want to include, then ask EF to generate an entity model. This approach can progressively raise their vision from the physical database structure to the heights of object design. They can use EF for high-level data access tasks, letting the tool create any necessary SQL code. Their database skills, though, will continue to play a key role because the database still needs to be well designed and optimized.

Building an Entity Model

Now that you have an idea of what EF is, let's look at how to use the bottom-up approach to build an entity model in an application. Note that choosing the bottom-up or top-down approach is an architectural choice based on a project's specific needs. In no way is the bottom-up approach always better or worse to use than the top-down approach.

The first task is to create the Visual Studio project needed by the application. You can use EF with any type of project, no matter whether it's a Windows Form, Web, Windows Presentation Foundation (WPF), or service project. For the example in this article, I created a C# Windows Form project named DataEF in an application that's also named DataEF. You can download the DataEF application by clicking the Download the Code Here button near the top of the page.

When it comes to EF, a key guideline is to isolate the EF-related components in a separate class library. So, after you create the Visual Studio project, you need to add a new class library project to the application. The class library project needs include the ADO.NET Entity Data Model item, as Figure 1 shows.

 logo in a gray background |

The ADO.NET Entity Data Model is persisted to an .edmx file that contains the entity model itself and the elements used to graphically represent the model in Visual Studio's Entity Designer. You use the Entity Designer to select and change the entities you want to include in your model. For this example, I selected a few tables from the classic Northwind database. Figure 2 shows the diagram for the resulting Northwind entity model.

 logo in a gray background |

By default, the connection string to the Northwind database is stored in the configuration file (app.config) of the model project. This configuration file (or at least its connection-string-related content) must be available in the context of the main project when you compile it, so you need to move it there. So, in this example, you'd move the app.config file that's under Entities to the DataEF main project in Figure 2. If you already have a configuration file in the main project, you need to merge the two files' contents.

The code for the entities is automatically generated by Visual Studio and saved to the appropriate designer file (Northwind.Designer.cs in this example). This file shouldn't be edited. You use the classes defined in the designer file to code against the EF model and to execute Create, Read, Update, Delete (CRUD) operations.

Understanding Entity Model Assemblies

 logo in a gray background |

Understanding the structure of entity model assemblies is helpful when building entity models. You can use an explorer tool such as Red Gate's free .NET Reflector to see the structure of an entity model assembly. For example, Figure 3 shows the Entities assembly.

An assembly contains one class for each entity in the model, plus the gateway class that coordinates access to the entities. Thus, the Entities assembly contains four classes: one class for each Northwind table selected and the gateway class. The name of the gateway class usually consists of the entity model's name trailed by the word Entities (NorthwindEntities in this example). The gateway class exposes collection properties for each configured entity. In addition, it inherits methods and properties from the system-provided ObjectContext class. This class lets you track and replay changes to the model, create queries, and manage the underlying connection to the database.

The assembly also includes resources, which consist of three XML documents:

  • A conceptual schema definition language (CSDL) document, which describes the entities, relationships, and functions that make up an entity model. Typically, the developer owns the CSDL file.

  • A store schema definition language (SSDL) document, which describes the storage model in terms of the physical tables involved and their columns. Typically, the administrator owns the SSDL file.

  • A mapping specification language (MSL) document, which describes the mappings between entities and storage. Typically, the developer and administrator share ownership of the MSL file.

These XML files are usually embedded in the assembly. However, by changing the value for the Metadata Artifact Processing property in the Entity Designer's Properties pane, you can save them as individual files in the output directory. This way, editing storage schema and mappings can be done without using Visual Studio and in parallel.

By default, entity classes match columns in the mapped table and SQL Server types are adapted to .NET types. You can remove or add properties in the Entity Designer's Properties pane or in the XML file (if kept out of the assembly). No logic is found in entity classes. Instead, they feature a factory method, typically a CreateXXX method (where XXX is the name of the entity), and get and set properties. In addition, navigation properties are added to reflect relationships between tables at the database level. For example, the Customers class will have an Orders property that returns related orders. Listing 1 shows a code snippet from the Customers class.

public partial class Customers :System.Data.Objects.DataClasses.EntityObject {   public static Customers CreateCustomers(string customerID,string companyName) {     Customers customers = new Customers();     customers.CustomerID = customerID;     customers.CompanyName = companyName;     return customers;}  public string CustomerID   {     get { return this._CustomerID; }set { ... }   }:   public System.Data.Objects.DataClasses.EntityCollectionOrders  {        :     } }

Note that EF 1.0 doesn't support pluralization of names, and it uses the table name for a related entity. This explains why you end up with a Customers class to represent an individual customer. Pluralization is supported in EF 4 as an optional feature. This means that you can port existing projects to EF4 without the need for renaming entities.

Each entity class inherits from the system-provided EntityObject class, and each navigation property returns an EntityCollection<T> object (where T is the generic type of the elements in the collection) instead of a plain .NET list object. A tool like EF needs to carry some extra information (known as shadow information) about properties in order to correctly persist them to a database. EF 1.0 packs this extra information into a given base class that all entities derive from; the developer community argued that this didn't provide persistence ignorance. EF 4 still supports this approach but also adds other code generators. In particular, if you decide to create the entity model using the top-down approach, you just instruct EF 4 to create classes for the entities and their relationships after you've created the model. In doing so, you can choose the classes' characteristics.

As I mentioned previously, entity classes don't include any logic. You can add logic to an entity class by adding a new class file to the project and defining a partial class in it. The class's name and namespace must match those of the entity class you want to extend. For example, the code in Listing 2 adds a method to the Customers class. This class returns only the orders issued in the specified year.

public partial class Customers{  public EntityCollection GetOrdersByYear(int year)  {    var filteredOrders = new EntityCollection();    foreach(Orders o in this.Orders)    {  if (o.OrderDate.HasValue && o.OrderDate.Value.Year == year)  {    filteredOrders.Add(Entities.Orders.CreateOrders(o.OrderID));  }    }    return filteredOrders;  }}

When you add logic to an entity, it's important that you don't lose perspective on what logic belongs in the entity and what logic belongs in the application. You shouldn't be accessing the database, running queries, accessing the cache, and similar actions from the logic associated with an entity method. Entity methods should only express the behavior of the entity in relationship to and with other entities in the model. This is a key point to consider when architecting your EF application. The idea behind this recommended practice is that the entity model represents the entities operating in the business domain. Persistence is a separate concern that EF will take care of. Architecturally speaking, this leads to creating two distinct modules—entities with domain logic and services with application logic. Domain logic is the behavior you expect to find in an entity, such as the logic that calculates the estimated payment date for an invoice. Application logic is the behavior that the application needs to implement; this behavior is formalized in a use case.

Querying Operations

After your EF application is built and compiled, you can perform query (i.e., read) operations. Listing 3 contains the source code for a query that retrieves the customer (if any) that has an ID of ALFKI.

// Define the query to run.string command =  "SELECT VALUE customer FROM NorthwindEntities.Customers AS customer" +  "WHERE customer.CustomerID='ALFKI'"; // Get an instance of the EF gateway object.var context = new NorthwindEntities();// Create the query object.var query = new ObjectQuery(command, context);// Execute the query.ObjectResult result = query.Execute(MergeOption.NoTracking);Customers alfki = result.Cast().FirstOrDefault();

The first command defines the query string. Although it might look like plain SQL, it's actually a special SQL dialect specific to the EF platform called the Entity SQL language. In brief, the Entity SQL language offers a SQL-like syntax to query the abstract database represented by the entity model. Instead of table names, you use the names of entity collections; instead of column names, you use property names. (You can read about the Entity SQL Language at msdn.microsoft.com/en-us/library/bb399560.aspx.)

The second command sets up the logical connection to the entity model by getting an instance of the NorthwindEntities object. No physical database connection is really opened at this time. However, the command is logically equivalent to the command that opens a connection in traditional ADO.NET code. The main point of using EF is that the entity model is the database in the application's eyes. Getting an instance of the model's gateway represents a way of opening a connection to the model.

The next command creates a strongly typed query object. This object will execute the specified query against the pseudo-connection represented by the entity gateway object.

Finally, the query is executed. There are many ways to trigger the execution of a query. The most generic method is Execute. This method requires a MergeOption argument to specify how to deal with the values returned by the query. The result set is wrapped up in an ObjectResult object and can be cast to an IEnumerable type. If the expected result of the query is a single object, you use the FirstOrDefault method to extract the first object or default to null.

After the query runs, you have obtained an instance of a persistent entity object—that is, an entity object populated with data read from the database. If you're curious about the real SQL code that runs against the database, Figure 4 shows what SQL Server Profiler captured.

 logo in a gray background |

Using LINQ to Entities

A query can also be expressed using a specific flavor of the Language-Integrated Query (LINQ) language called LINQ to Entities. LINQ to Entities is a higher-level replacement for Entity SQL. (You can read about LINQ to Entities at msdn.microsoft.com/en-us/library/bb386964.aspx.)

Listing 4 shows how to write the previous query in LINQ to Entities.

var context = new NorthwindEntities();var query = from c in context.Customers  where c.CustomerID  "ALFKI"  select c;var alfki = query.FirstOrDefault();

As you can see, some steps are unchanged, including the extraction of actual data and instantiation of the gateway object. Note that for the LINQ to Entities query to compile in Visual Studio, you need to add the following namespace:

using system.linq;

The main benefit of LINQ to Entities is the simplified syntax you can use to write even complex queries. For example, here's a query that returns a list of customers who have placed more than 10 orders:

var query = from c in  context.Customers  where c.Orders.Count > 10  select c;

However, if you programmatically loop through the returned customer collection, you'll find that no order information is available because the query incorporates only customer records in the result set. In other words, you didn’t tell EF to fetch orders. When data from multiple entities needs to be incorporated into a single result set, you must explicitly configure a fetch plan for EF. Alternatively, in EF 4, you can turn on lazy loading.

If you want the result set to contain all customers and related orders, you need to rewrite the query as

var query = from c in  context.Customers.Include("Orders")  where c.Orders.Count > 10  select c;

You use the Include method to define a fetch plan, which basically specifies the dependencies to be fetched and added to the involved entities. For example, in the rewritten query, the Include method guarantees that every customer entity being fetched also includes its own list of orders. In EF jargon, using the Include method is also referred to as using eager-loading capabilities.

Eager loading is different than lazy loading. Lazy loading entails that the entities retrieve data for their related properties (e.g., orders for a customer) only when it's requested. The two features can be enabled simultaneously for a given context.

In EF 1.0, you don't have automatic lazy loading. Lazy loading in EF 4 is turned off by default and can be enabled via the following code:

context.ContextOptions.DeferredLoadingEnabled = true;

Note that lazy loading works only for navigation properties that represent a relationship, such as Orders for a Customer object and OrderDetails for an Order object.

Performing Other CRUD Operations

At its core, EF is an object relational mapper whose entry point is the aforementioned gateway object that Visual Studio creates (the NorthwindEntities class in the current example). This object plays a role in any CRUD operation. You've already seen how it works in the context of read (i.e., query) operations. The work it does in create, update, and delete operations is even more crucial.

The gateway object—often referred to as the data context object—implements the unit-of-work pattern, which means that the data context object keeps track of any operations occurring on all entities associated with a given instance. All insertions, updates, and deletions recorded by an instance of the data context object can then be played back to the real data source and permanently stored. A unit of work is a logical transaction that occurs against the entity model. The data context object takes care of mapping all operations within the logical transaction to a physical database transaction.

Let's look at how to insert, update, and delete objects. As you'll see, performing CRUD operations with EF isn't much different from using the batch update feature of ADO.NET data sets.

Inserting New Objects

There are two ways to insert a new object, depending on whether the object being added is a child entity or root entity. To insert a new child entity, you need to create an in-memory transient entity, attach it to its parent entity, then call SaveChanges on the data context object to persist all pending changes. For example, Listing 5 shows sample code that adds a new order.

// Add a new transient customer. var newCustomer = Entities.Customers.CreateCustomers("TEST", "Fictitious Co."); :context.AddToCustomers(newCustomer); // Create a new transient order for that customer. var newOrder = Entities.Orders.CreateOrders(1234); newOrder.OrderDate = DateTime.Now; : // Link the order to a parent customer entity. newCustomer.Orders.Add(newOrder); // Persist pending changes in both tables.context.SaveChanges(); 

This code adds a new transient customer object, then uses that customer object's Orders property (a navigation property) to create a new transient order object. Next, the code links the order object to its parent customer object. When SaveChanges is invoked, both the Orders and Customers tables are updated.

You use a slightly different approach if you're going to add a root entity that's not directly involved in a relationship. Instead of adding the new object directly to a parent object's navigation properties, you use the AddToXxx method (where Xxx is the name of the entity, such as Customers or Orders) to add the root object of an aggregate. The Entity Designer in Visual Studio automatically adds an AddToXxx method to the data context object for each supported entity. For this example, you have the AddToCustomers and AddToOrders methods.

Updating Objects

For changes to be tracked and transactionally applied via a call to SaveChanges, you must enter changes exclusively on entity objects bound to the data context object. An entity object is bound to a given data context object if it's been retrieved by a query run through that data context object. In addition, an entity object can be explicitly attached to a data context object. Note, though, that an entity object is never allowed to be attached to two data context objects at the same time.

The necessity for an entity object to be bound to the data context object that applies changes is a strong one. This means, for example, that you should first retrieve the object to update from the source, apply changes in memory, then submit the changes back to the data source. It requires two roundtrips instead of just one. Most of the time, this is unacceptable.

So, let's suppose that you have a customer object cached somewhere. Your operation consists of creating a new order object and adding it to this customer object. This is a pretty common scenario in a Web application, where you first load customer details in a postback, then you modify some properties in a subsequent postback.

In this case, you use two distinct data context objects to retrieve and update the entity. As the code in Listing 6 shows, you first need to retrieve the object you want to update from the cache, then attach it to a newly created data context object. Next, you proceed with the changes and save them.

// Retrieve the entity from the cache. // (The object was created through another context // and was placed into the cache after being detached// from its native context.) var mands = GetCustomerFromCache(...); // Attach the entity to a freshly created context. var anotherContext = new NorthwindEntities(); anotherContext.Attach(mands); // Update the City property of the customer object. mands.City = "New York"; // Add a new order to the customer object. Orders newOrder = Entities.Orders.Create(1234); : mands.Orders.Add(newOrder); // Save the changes. anotherContext.SaveChanges();  

It's essential, however, that you use the Detach method to detach the entity from its original data context before you place the entity into the cache. In an n-tier scenario, you'd use the code

originalContext.Detach(mands);

 To handle updates that are strictly related to optimistic concurrency, you can define a ConcurrencyMode property on each property of an entity. By default, the concurrency mode is None, which means that no properties participate in the optimistic concurrency. Properties for which you change the value to Fixed will be automatically added to the WHERE clause of any update statement generated by EF. It's advisable that you add a Version property and mark only that as concurrency sensitive. (For more information about optimistic concurrency, see msdn.microsoft.com/en-us/library/bb738618.aspx.)

Deleting Objects

In the SQL world, deletions are the simplest of all operations because all you need to know is the ID of the record to delete. In EF, however, you delete objects, not records. This makes a difference.

The canonical way to delete an entity in EF is to pull the entity into the context, then delete it. You use the DeleteObject method to delete an entity in code such as

var mands = ...;context.DeleteObject(mands);

The first line pulls the entity into the data context. The second line marks the object for deletion. Alternatively, if you know the ID of the entity to delete, you can use the code

Orders stub = new Orders \{ ID = 4 \};context.AttachTo("Orders", stub); context.DeleteObject(stub); context.SaveChanges();

Both code snippets will fail if you attempt to delete an entity with dependents. For example, you aren't allowed to delete a customer if the customer still has orders. You could solve this problem by adding a cascade rule to the relationship in the database. If you build the EF model from a database where this rule is set, EF will require you delete only the parent object and ensure that any child objects are deleted as well.

You could also solve this problem by explicitly deleting the parent object and its child objects. This step is trivial if you can pull entities into the data context. It's much less trivial if you work with detached objects. The Detach method described earlier doesn't support graphs of objects. This means that if you detach, say, a customer, then all related orders are just lost and the orders collection is empty. This would make it hard to retrieve orders you want to delete; you'd have to first load them from the source.

A better approach is to use a workaround that involves setting the NoTracking merge option on the query when the customer is first retrieved. Listing 7 shows this approach.

var context = new NorthwindEntities();var query = (from c in context.Customers.Include("Orders")  where c.CustomerID == "MANDS"  select c) as ObjectQuery;// Execute the query, specifying NoTracking, which returns// a detached graph.ObjectResult res = query.Execute(MergeOption.NoTracking);Customers mands = res.Cast().FirstOrDefault();// Loop through the Orders collection, making a copy of the// order entities to delete.var ordersToDelete = new List();foreach (Orders o in mands.Orders)    ordersToDelete.Add(o);// Mark the orders for deletion.foreach (Orders o in ordersToDelete)    anotherContext.DeleteObject(o);// Mark the customer for deletion.anotherContext.DeleteObject(mands);anotherContext.SaveChanges();

After executing the query that includes the NoTracking merge option, the code loops through the Orders collection of the customer, making a copy of the order entities to delete. Finally, the code marks the orders, then the customer for deletion.

Should You Use EF?

When I was covering the EF basics, you might have gotten the gist that using EF isn't easy and you might be wondering whether it's worth the pain of learning. When making this decision, you need to keep a couple of things in mind.

First, you need to realize that EF isn't simply a tool that replaces SQL. Although it generates SQL code for you, it assumes you're working and thinking in terms of objects. When you switch to EF, the entity model is your database. CRUD operations are done against the entity model using the rules set by EF. So, queries are typically much easier to write. Create, update, and delete operations, though, require a bit more attention, especially in multilayer solutions.

Second, you shouldn't use EF as you would use ADO.NET. Although EF lets you execute plain SQL commands and stored procedures, doing so would likely waste any investment you have made in upgrading your applications toward a conceptual data model.

Read more about:

Microsoft
Sign up for the ITPro Today newsletter
Stay on top of the IT universe with commentary, news analysis, how-to's, and tips delivered to your inbox daily.

You May Also Like