LINQ Deep Dive: Part II

(May 2009 Issue)

Zeeshan Hirani

October 30, 2009

21 Min Read
ITPro Today logo in a gray background | ITPro Today

We discussed in the first part of this series someof the different behaviors of LINQ queries in different domains, such as LINQ toObjects and LINQ to SQL. LINQ to SQL interprets the query differently andsometimes returns different results as compared to when you execute the queryusing LINQ to Objects. The behaviors of a LINQ query in different domains cansignificantly impact performance, output, and memory consumption.

We discussed in Part I how to apply joins andfilter on a LINQ to Objects query in an efficient way to yield maximumperformance. We also learned how to delay load a particular column in a tableand fetch the column only when it is necessary so you don t incur the cost ofbringing a column with huge content, such as a binary or varchar(max) column.We ll continue our exploration in this second installment by introducing usersto the caching feature offered by the object tracking service and examining onwhich scenarios caching is used and how it ensures object referenceconsistency. We ll also discuss how LINQ to SQL automatically maps resultsreturned to entities from dynamic SQL, as well as how a non-default constructorcan prevent composing queries from other queries. We ll also learn tounderstand and identify if a query is being sent to a database for execution orif it is being applied in memory.

Tip One: Query Based on a Primary Key Optimized for Database Roundtrips

LINQ to SQL has its own caching mechanism that ituses to ensure optimistic concurrency. If you retrieve an object from the DataContext,the LINQ to SQL object tracking service starts tracking the object. Thetracking service caches the object based on a primary key. The cachingmechanism is not actually there to improve performance of the query, but ratherto provide consistency with the original results you obtained; for instance, ifyou retrieved a customer object from a DataContext and modified a fewproperties on the object. This makes the customer object dirty in memory, andmarked for update. However, the update has not been sent to the database yet.If you were to query for the same customer again on the same DataContext, youshould get the object you modified, not a fresh copy from the database (whichwill not have your changes). If LINQ to SQL did not maintain a cached versionof the object, you could end up losing the changes if you queried for theobject again.

Not only does caching behavior offer consistencywith the results, but also you get consistency in terms of object references.For example, when you retrieve the customer initially, the object referencegiven by the DataContext would be the same no matter how many times you querythe DataContext for the same customer again. Getting the same object backoffers a clean programming model and helps avoid bugs introduced in theapplication because of object comparisons that would not be valid if LINQ toSQL did not give us back the same entity on which we originally worked.

Does that mean we do not incur the cost of a databaseroundtrip if we query for an object that initially had been fetched from thesame DataContext? The answer is, it depends on how you are querying. If youquery based on any column other than the primary key, regardless of which LINQquery operator you use, LINQ to SQL will make a database call. When it comes togiving you back the object, LINQ to SQL will check in the object trackingservice to see if it is tracking the object that was returned from the queryexecuted based on the primary key. If it finds an object in the trackingservice that has the primary key as the one returned from the query sent to thedatabase, LINQ to SQL will give you the object from the tracking repository.This means you ll still incur the database hit, regardless of whether LINQ toSQL has fetched the object initially and is tracking it in its trackingrepository. However, if you query using a Single or First operator using a primarykey, LINQ to SQL will first check in its object tracking service to see if itcan find an object that has the same primary key on which you are searching. Ifa match is found, LINQ to SQL will immediately return the object withoutbothering to go to the database.

Figure 1 shows I used the Single operator toretrieve the ALFKI customer based on the primary key column. The next time Iquery for the ALFKI customer using the Single operator based on the primarykey, LINQ to SQL won t go to the database, but simply will return the objectfrom the tracking service. The First operator used with a primary key columnoffers the same benefit; it, too, simply returns the object from the trackingservice. The last query in Figure 1 is what makes the second database call.Despite the fact that we are using the Single operator, LINQ to SQL makes acall to the database. Because the contact name of Maria Anders belongs to the customerID of ALFKI, and object services was tracking the ALFKI customer, you end upwith the same customer reference. This confirms our behavior that if you don tquery based on a primary key, LINQ to SQL will always make a call to thedatabase. On our output window we are printing the test to see if all fourqueries return the same object reference; indeed, the result confirms that allqueries got back the same object.


Figure 1: Single and First operators do not go backto the database if the object has been initially fetched

Tip Two: Order Lazy-loaded Child Collections on the Database

If you ve worked with other OR mappers on themarket, you must ve come across this issue several times. One of theconstraints I ve encountered in some OR mappers is that there is no clean wayto define how to sort child collections based on a certain column. For example,if I have a customer instance in my hand and I want to get access to its orders,I simply can navigate to the Orders property of the customer. But what if Iwant those orders to be sorted by ShipCity? In LINQ, you can apply the Order Byoperator on the Orders collection for the customer. But does that Order Byoperation get executed on SQL Server? No. As soon as we access the Ordersproperty of the customer, LINQ to SQL fetches all the orders for the customer inmemory. From there on, any operations you perform will get executed in memory.

Figure 2 shows that orders for the ALFKI customergot sorted based on ShipCity. However, looking at the SQL capture, it sapparent that the order operation was done in memory because our SQL statementdoes not include any Order By operations. By accessing the Orders collection inthis fashion, you don t get a chance to give any hints to LINQ to SQL that youwant the orders to be sorted by the ship city on SQL Server.


Figure 2: An Order By operator on the Orderscollection gets applied in memory and is not very efficient

Filtering child collections in LINQ to SQL can beaccomplished by using the AssociateWith method of the DataLoadOptions class.The AssociateWith method takes in a lambda statement that not only can definehow the child collection needs to be filtered, but also specify how the childcollection needs to be sorted. Figure 3 uses the AssociateWith method to sortthe orders for a given customer by ShipCity. To confirm the sort, ShipCity wasapplied at the database level; the SQL capture is also recorded, which includesthe Order By clause for ShipCity.


Figure 3: Confirms the sort order gets executed bythe database

The AssociateWith method is only available in LINQto SQL. If you are going to be using Entity Framework, you can use the CreateSourceQuerymethod available on Entity Collection and Entity Reference. CreateSourceQueryreturns an ObjectQuery, which gets converted to SQL and is sent to thedatabase. If you want your child collections to be loaded differently, such assorted in a different order or apply a filter, you must access the ObjectQueryinstance that is responsible for loading the child subcollection, then modifythe query to add ordering before ObjectQuery gets executed. Figure 4 shows anexample of applying to a child collection a sort operation using EntityFramework.

Figure 4: Applying a sort on the Orders collectionfor the ALFKI customer using Entity Framework

Tip Three: Avoid Non-default Constructors in LINQ to SQLQueries

You must use an object initializer if you are goingto be composing queries from other queries and want the entire query to beexecuted on the database. LINQ to SQL cannot convert non-default constructorsinto SQL that SQL Server can understand. Therefore, it is preferred that youuse non-default constructors to ensure that the entire query is sent to thedatabase and no part of the query is performed in memory. If the query uses non-defaultconstructors in projection, when you try to apply further composition such asadding a where clause, you ll get a runtime error saying the translation is notsupported.

Figure 5 shows I joined products, categories, andsuppliers and projected the output to the ProductSummary class. Notice I used anon-default constructor of the ProductSummary class that takes CategoryID andSupplierID as the constructor parameters. Further, I filtered the projectedresults by CategoryId of 1. Upon iterating the final query, we get an exceptionsaying the query is not supported. The reason, as we discussed earlier, is theuse of a non-default constructor, which cannot be translated in SQL syntax by LINQto SQL.

Figure 5: Using non-default constructors raises an exceptionif you try to further compose queries from the original query

What if you get an API that has no defaultconstructor and you are forced to use the class with its non-defaultconstructor? In that case, you can force the rest of the query to be executedin memory by using the AsEnumerable operator available in LINQ. As shown in Figure6, I m using the AsEnumerable operator before I apply the CategoryId filter.This hint tells the compiler that any query operation performed afterAsEnumerable must be done in memory, thus avoiding the non-default constructorexception.

Figure 6: Using the AsEnumerable operation ensureswe don t get an exception in our LINQ query

If you want your entire query to be sent to SQL Server,including your category filter, you must avoid using a non-default constructor.Figure 7 shows the correct version of the query, which, in its entirety, getsexecuted by SQL Server (as confirmed by the SQL capture). Notice we are notcalling any constructor that defaults to using a default constructor with noparameters.

Figure 7: Entire query gets executed in SQL Serverbecause we are using a default constructor

Tip Four: Eager Load a Self-referencing Table Using LINQ to SQL

There are many places where you need to representyour table records in a hierarchy. A real-world example would be where anemployee has a manager. Both the employee and the manager are stored in thesame table. However, to find out the manager for an employee, you would have anextra column, like ManagerId, that represents the ID of the employee to whom a particularemployee reports.

Another example would be an e-commerce databasewith categories that each have subcategories. To identify the parent for acategory, you would have a column, like ParentCategoryId, which represents the parentcategory for a category. If ParentCategoryId is null, then that category isconsidered the top-level category. If you have a situation where you need toload a category and all its subcategories any level deep in one single databasecall, there is no way to get that working in LINQ in a clean fashion. Youcannot use DataLoadOptions to eagerly load a self-referencing table. If youwere to try, you would get an exception saying cycles not allowed.

Figure 8 shows the database diagram for theCategory and Products tables. Notice I have a category relationship pointingback to itself, which merely describes that each category has subcategorieswithin it. The Category table also has a one-to-many relationship with Productsbecause products belong to a particular category. In Figure 8, we retrieve top-levelcategories by applying a filter where the parent category instance is null. TheC# syntax of comparing objects to null references is translated by LINQ to SQLto a null comparison of a foreign key column.


Figure 8: Eagerly loading a self-referencing tableraises an exception because LINQ to SQL does not allow loading cycles

To eagerly load the first-level subcategories of acategory, I used DataLoadOptions and passed in the lambda expression to loadthe Categories navigation property. I got an exception saying LINQ to SQL doesnot support loading cycles when I execute the LINQ query. What it essentiallyboils down to is you cannot eagerly load a self-referencing table. From afunctionality point of view, I can understand why LINQ to SQL cannotarbitrarily load any level-deep hierarchy. But most of the time, in real-worldscenarios, we are aware of the depth we d like to go, and it would be nice toallow recursive queries with a depth level specified.

Because LINQ to SQL has limitations where it won tallow you to eagerly load a self-referencing table, developers are required tomanually load recursive queries by explicitly traveling each tree of thecategory. Figure 9 shows a large query for eagerly loading two-level-deep subcategoriesfor each parent category. I obtained my top-level category by only retrievingcategories that don t have a parent. To load children for the category, I accessedthe navigation property, Categories, for each category. For instance, I obtainmy first-level subcategories by cat0.Categories, second level bycat1.SubCategories, and so on. This kind of query can potentially get complexif you have a deep hierarchy. The query produces one large SQL statement thatmay be very expensive to execute, so use it with caution and use it only whenyou want to eagerly load child entities for a self-referencing table.

Figure 9: Manually traveling two levels deep foreach parent category

Entity Framework supports the Include method,which, when passed in QueryPath allows eager loading to a predefined depth. Forinstance, in Figure 10, to load two-level-deep categories, the Include methodwas called on Categories with a QueryPath of Categories.Categories. QueryPathis comprised of navigation properties; the number of times the navigationproperties are traversed indicates the desired depth.

Figure 10: Using Include to specify the depth inour LINQ to SQL queries

Tip Five: Use AsQueryable with LINQ to Objects and LINQ to SQL

The AsQueryable method allows a query to beconverted to an instance of IQueryable. When you use the AsQueryable operatoron an existing query and apply further transformations, such as applying afilter or a sort order, those lambda statements are converted to expressiontrees. Depending on the provider you are using, expression trees will beconverted to the domain-specific syntax, then executed. In the case of LINQ toSQL, the expression tree would be converted to SQL and executed on SQL Server.However, if you use the AsQueryable operator on a query that does not implementIQueryable and only implements IEnumerable, then anytransformations you apply on the query will automatically fall back on the IEnumerablespecification. What this means is that by tagging a query with AsQueryable, youget the benefits of both LINQ to SQL and LINQ to Objects implementation. Ifyour existing query happens to implement IQueryable, the query is converted toSQL by LINQ to SQL; otherwise, the query is executed in memory in the form of ILcode.

This offers excellent benefits in real-worldscenarios where you have certain methods on an entity that returns anIQueryable of T and some methods return List. But then you have abusiness rule filter that must be applied on the entire collection, regardless ofwhether the collection is returned as IQueryable of T or IEnumerable of T. Froma performance standpoint, you really want to leverage executing the businessfilter on the database if the collection implements IQueryable; otherwise, fallback to apply the business filter in memory using the LINQ to Objectsimplementation of delegates.

Figure 11 shows the TopSellingProducts method,which returns an IQueryable of products. I defined that all the products that havebeen ordered more than 50 times are considered top products. Because theTopSellingProducts method returns an IQueryable, I have not executed the query;I simply declared the query of how to get top-selling products.

Figure 11: Using AsQueryable to apply a filtereither in memory or a database

Another method, MostProfitableProducts, returns a listof products that have a UnitPrice greater than 60. I arbitrarily came up withthis rule and stated that all products that have a UnitPrice of more than 60dollars are considered profitable to the company.

Looking further in Figure 11, the DisplayProducts methodis responsible for displaying products. One of the business rules states thatwe can only display products that have either quantity greater than 0 or theproduct is not a discontinued product. To apply our business filter, I sent theresults of both TopSellingProducts and MostProfitableProducts to our filtermethod, FilterNonDisplayableProducts. FilterNonDisplayableProducts simplyremoves the products that violate our business rules. Notice that FilterNonDisplayableProductstakes in an IEnumerable because this is the lowest common denominator withwhich we can work and is implemented by all generic collections in one form oranother. Before applying the business filter I convert the IEnumerable of Tcollection passed in as a parameter to IQueryable of T using the AsQueryableoperator. We do this because we want the filter to be applied on the databasefor collections that implement IQueryable. If the collection does not implementIQueryable, the filter is applied in memory. This way you can apply thebusiness rule to both IQueryable and IEnumerable and get the benefits of bothworlds.

Figure 12 shows the SQL capture when I execute thecode from Figure 11. Based on the SQL query sent, we can confirm that ourbusiness filter was applied on the database for top-selling products because itwas using IQueryable, whereas, for the MostProfitableProducts collection, thebusiness filter was applied in memory because it implemented only IEnumerableof T.

Figure 12: Confirms that the business filter getsexecuted on the database for top-selling products

Tip Six: Use ExecuteQuery to Execute Dynamic SQL in LINQ toSQL

If you find a particular query is difficult torepresent as a LINQ query, you can consider other options, such as writing yourown SQL statements inside a stored procedure or a function, then calling thestored procedure and function using LINQ to SQL. You also can write dynamic SQLstatements using the helper methods provided in DataContext. Some of you may bethinking this is new or different from ADO.NET, which allowed executing dynamicSQL and getting a datareader or dataset back. You can achieve the same resultsby making use of the ExecuteQuery method available on the DataContext.

The ExecuteQuery method does a lot more than simplyexecuting dynamic SQL. First, it supports the concept of parameterized queries.You provide parameters like you provide parameters to string.format usingplaceholders, which LINQ to SQL converts to a parameterized query. When youexecute dynamic SQL in traditional ADO.NET, you either get a datareader ordataset. It always had been left to the developer to convert the datareader to acustomized business object. However, ExecuteQuery takes a generic type that tellsLINQ to SQL that a particular SQL statement would return a strongly typedinstance of the class passed in as a generic type. You no longer need to worryabout converting a datareader to an object.

There are some restrictions you must follow toaccomplish proper mapping of your columns to properties on your object. First,your query columns must contain the columns that are defined as the primarykey. Second, the column names returned by the query should match the propertynames defined on your entity class. If the column names do not match, you canuse an alias in T-SQL to match the property name defined on your entity.Furthermore, LINQ to SQL does not require that you return all the columns thatmap to all the properties on your entity. If your query is missing some column,LINQ to SQL will assign a default value to your property. Beware that becauseyou are loading an entity from the database, an object tracking service willstart tracking the object you just obtained. And, if you ve partially populatedyour object, the next time you try to fetch the same object from the DataContext,you ll end up with a partially filled object unless you refresh the object fromthe database or use a fresh DataContext.

Figure 13 shows a dynamic SQL statement that returnsall the customers who placed at least one order that was shipped in the city ofSeattle. I used placeholder syntax to pass parameters to my SQL query. Theplaceholders get converted to parameterized queries, which will free thedeveloper from worrying about a SQL injection attack. I then pass my SQLstatement to the ExecuteQuery method on the DataContext, with a generic type ofCustomer to return a collection of customers. Because my query only returnsCustomerID and CompanyName, I m partially populating my customer object so anyfields that I m not populating get assigned a default value. If my query didnot return CustomerID, then LINQ to SQL would throw a runtime exceptioncomplaining that the query must at least have a CustomerID column, as it is theprimary column that LINQ to SQL uses to track the object.

Figure 13: Executing dynamic SQL to return a stronglytyped customer object

Tip Seven: Execute Your LINQ Query on the Database

Depending on the approach you take when writing aLINQ query, you may be surprised that some of the operations are gettingperformed in memory instead of the call being translated to SQL and executed onthe database. When you apply aggregate operators on association relationshipson an entity in the context of query syntax, the aggregation is performed onthe database. However, when executed outside the query syntax, the same syntaxwould force the entire association relationship to be brought from the databaseand the aggregate operation performed in memory.

In Figure 14 shows two versions of the same querythat returns the customer ID, total number of orders a customer has placed, andtotal amount they ve spent. In the first query I used the Orders associationrelationship available on each customer to get the total orders placed. To getthe total amount spent, I first used the association relationship Orders on thecustomer, then applied the SelectMany operator to get a flattened list of allthe order details for all the orders for a customer.

Figure 14: Aggregate operators are applied on thedatabase if used inside a LINQ query

After obtaining all the order details, I used thesum operator to calculate the total spent by each customer. Because I used anassociation relationship inside a LINQ query, the entire query is converted toSQL to send to the database for execution.

The second query in Figure 14 uses the sameassociation relationship on the customer entity to perform calculations.Because the calculations are not part of an existing query, LINQ to SQL mustbring all the orders for the customer and for each order bring down all itsOrderDetails to calculate the Sum and Count operations. Bringing all the ordersand order details for a customer is an expensive operation that need not beperformed if all you want to do is get the count and sum. Those operationseasily can be done on the database. As a developer, it is important tounderstand the trade-offs and know which option may be better suited for yourscenario. For instance, if you have the Orders and OrderDetails for a customerin memory, it may be more efficient to perform these operations in memoryinstead of making a database call to the server.

Conclusion

In this article we explored how LINQ uses atracking service to ensure the same object on which the user worked is beingreturned. When searching for an object based on the primary key, LINQ to SQLsaves the database roundtrip if it has initially fetched the object and isavailable in the tracking repository. If the query searches on any columnsother than the primary key, the LINQ to SQL engine will make a databaseroundtrip, but will then check the tracking service to see if the objectretrieved from a query is being tracked. If the object is available in thetracking service, it will return that object. These behaviors ensure thatconsecutive retrievals will not wipe out the changes you made on an existingobject.

We also discussed using the AssociateWith operatorto apply sorting on the database for child entities. We went through thelimitations of loading self-referencing tables using LINQ to SQL. We also sawan easier syntax available in Entity Framework that allows us to navigate thedepth of the child collection using the Include method. We then discussed howusing the AsQueryable operator lets you leverage existing IQueryable queries toperform further filtering on the database, and if the collection onlyimplements IEnumerable, query processing falls back to the IEnumerableimplementation of LINQ to Objects.

We also saw a few examples of using ExecuteQuery torun our dynamic SQL statements and get objects back. ExecuteQuery is betterthan its predecessors because its uses placeholders that get converted toparameterized queries, saving you from having to worry about SQL injectionattacks. Using ExecuteQuery you no longer have to convert readers to entities,and ExecuteQuery does not require you to fully populate your entire entity.Finally, we discussed how aggregate operators are applied on an associationrelationship of an entity. Aggregate operators used in the context of anexisting query will be applied on the database. Outside the context ofIQueryable, the aggregate operators force the entire collection to be broughton the client and calculations performed in memory.

Source code accompanying this article is available for download.

Zeeshan Hirani ([email protected])is a senior developer at CheaperThanDirt.com.He specializes mainly in ASP.NET, AJAX, and leveraging OR-mapping solutionslike LINQ and Entity Framework in business applications.

Sign up for the ITPro Today newsletter
Stay on top of the IT universe with commentary, news analysis, how-to's, and tips delivered to your inbox daily.

You May Also Like