LINQ to SQL: Querying Data

11/23/2010

Contents

Data Querying

A LINQ to SQL query gets sent to the database only when the program needs to read data. For example, the following foreach loop iterates rows returned from a table:

var query =
    from    c in Customers
    where   c.Country == "USA"
    select  c.CompanyName;
    foreach( var company in query ) {
       Console.WriteLine( company );
}

The code generated by the foreach statement is equivalent to the following code. The exact moment the query is executed corresponds to the GetEnumerator call:

 // GetEnumerator sends the query to the database
IEnumerator<string> enumerator = query.GetEnumerator();
while (enumerator.MoveNext()) {
    Console.WriteLine( enumerator.Current );
}

Writing more foreach loops in the same query generates an equal number of calls to GetEnumerator, and thus an equal number of repeated executions of the same query. If you want to iterate the same data many times, you might prefer to cache data in memory. Using ToList or ToArray, you can convert the results of a query into a List or an Array, respectively. When you call these methods, the SQL query is sent to the database immediately:

// ToList() sends the query to the database
var companyNames = query.ToList();

You might want to send the query to the database several times when you manipulate the LINQ query between data iterations. For example, you might have an interactive user interface that allows the user to add a new filter condition for each iteration of data. In Listing 5-11, the DisplayTop method shows only the first few rows of the result; query manipulation between calls to DisplayTop simulates a user interaction that ends in a new filter condition each time.

Listing 5-11. Query manipulation

static void QueryManipulation() {
    DataContext db = new DataContext( ConnectionString );
    Table<Customer> Customers = db.GetTable<Customer>();
    db.Log = Console.Out;

    // All Customers
    var query =
        from    c in Customers
        select  new {c.CompanyName, c.State, c.Country };

    DisplayTop( query, 10 );

    // User interaction adds a filter
    // to the previous query
    // Customers from USA
    query =
        from   c in query
        where  c.Country == "USA"
        select c;

    DisplayTop( query, 10 );

    // User interaction adds another
    // filter to the previous query
    // Customers from WA, USA
    query =
        from   c in query
        where  c.State == "WA"
        select c;

    DisplayTop( query, 10 );
}

static void DisplayTop<T>( IQueryable<T> query, int rows ) {
    foreach( var row in query.Take(rows)) {
        Console.WriteLine( row );
    }
}

Important

The previous example used IQueryable<T> as the DisplayTop parameter. If you pass IEnumerable<T> instead, the results would appear identical, but the query sent to the database would not contain the TOP (rows) clause to filter data directly on the database. Passing IEnumerable<T> uses a different set of extension methods to resolve the Take operator, which does not generate a new expression tree. Refer to Chapter 2,“LINQ Syntax Fundamentals,” for an introduction to the differences between IEnumerable<T> and IQueryable<T>.

One common query reads a single row from a table, defining a condition that is guaranteed to be unique, such as a record key, shown in the following code:

var query =
    from    c in db.Customers
    where   c.CustomerID == "ANATR"
    select  c;

var enumerator = query.GetEnumerator();
if (enumerator.MoveNext()) {
    var customer = enumerator.Current;
    Console.WriteLine( "{0} {1}", customer.CustomerID, customer.CompanyName );
}

When you know a query will return a single row, use the Single operator to state your intention. Using this operator, you can write the previous code in a more compact way:

var customer = db.Customers.Single( c => c.CustomerID == "ANATR" );
Console.WriteLine( "{0} {1}", customer.CustomerID, customer.CompanyName );

However, it is important to note that calling Single has a different semantic than the previous equivalent query. Calling Single generates a query to the database only if the desired entity (in this case, the Customer with ANATR as CustomerID) is not already in memory. If you want to read the data from the database, you need to call the DataContext.Refresh method:

db.Refresh(RefreshMode.OverwriteCurrentValues, customer);

Projections

The transformation from an expression tree to a SQL query requires the complete understanding of the query operations sent to the LINQ to SQL engine. This transformation affects the use of object initializers. You can use projections through the select keyword, as in the following example:

var query =
   from    c in Customers
   where   c.Country == "USA"
   select  new {c.CustomerID, Name = c.CompanyName.ToUpper()} into r
   orderby r.Name
   select  r;

The whole LINQ query is translated into this SQL statement:

SELECT [t1].[CustomerID], [t1].[value] AS [Name]
FROM ( SELECT [t0].[CustomerID],
              UPPER([t0].[CompanyName]) AS [value],
              [t0].[Country]
       FROM [Customers] AS [t0]
     ) AS [t1]
WHERE    [t1].[Country] = "USA"
ORDER BY [t1].[value]

As you can see, the ToUpper method has been translated into an UPPER T-SQL function call. To do that, the LINQ to SQL engine needs a deep knowledge of the meaning of any operation in the expression tree. Consider this query:

var queryBad =
   from    c in Customers
   where   c.Country == "USA"
   select  new CustomerData( c.CustomerID, c.CompanyName.ToUpper()) into r
   orderby r.Name
   select  r;

The preceding example calls a CustomerData constructor that can do anything a piece of Intermediate Language (IL) code can do. In other words, there is no semantic value in calling a constructor other than the initial assignment of the instance created. The consequence is that LINQ to SQL cannot correctly translate this syntax into equivalent SQL code, and it throws an exception if you try to execute the query. However, you can safely use a parameterized constructor in the final projection of a query, as in the following example:

var queryParamConstructor =
    from    c in Customers
    where   c.Country == "USA"
    orderby c.CompanyName
    select  new CustomerData( c.CustomerID, c.CompanyName.ToUpper() );

If you only need to initialize an object, use object initializers instead of a parameterized constructor call, as in the following query:

var queryGood =
    from    c in Customers
    where   c.Country == "USA"
    select  new CustomerData { CustomerID = c.CustomerID,
                                Name = c.CompanyName.ToUpper() } into r
    orderby r.Name
    select  r;

Stored Procedures and User-Defined Functions

Accessing data through stored procedures and user-defined functions (UDFs) requires the definition of corresponding methods decorated with attributes. With this definition, you can write LINQ queries in a strongly typed form. From the point of view of LINQ, it makes no difference whether a stored procedure or UDF is written in T-SQL or SQLCLR, but there are some details you must know to handle differences between stored procedures and UDFs.

Stored Procedures

Consider this Customers by City stored procedure:

CREATE PROCEDURE [dbo].[Customers By City]( @param1 NVARCHAR(20) )
AS BEGIN
    SET NOCOUNT ON;
    SELECT CustomerID, ContactName, CompanyName, City
    FROM   Customers AS c
    WHERE  c.City = @param1
END

You can define a method decorated with a Function attribute that calls the stored procedure through the DataContext.ExecuteMethodCall method. Listing 5-12 defines CustomersByCity as a member of a class derived from DataContext.

Listing 5-12. Stored procedure declaration

class SampleDb : DataContext {
    // ...
    [Function(Name = "Customers by City", IsComposable = false)]
    public ISingleResult<CustomerInfo> CustomersByCity(string param1) {
        IExecuteResult executeResult =
            this.ExecuteMethodCall(
                     this,
                     (MethodInfo) (MethodInfo.GetCurrentMethod()),
                     param1);
        ISingleResult<CustomerInfo> result =
            (ISingleResult<CustomerInfo>) executeResult.ReturnValue;
        return result;
    }
}

The ExecuteMethodCall is declared in this way:

IExecuteResult ExecuteMethodCall( object instance,
                                  MethodInfo methodInfo,
                                  params object[] parameters)

The method’s first parameter is the instance, which is not required if you call a static method. The second parameter is a metadata description of the method to call, which could be obtained through Reflection, as shown in Listing 5-12. The third parameter is an array containing parameter values to pass to the method that is called.

CustomersByCity returns an instance of ISingleResult<CustomerInfo>, which implements IEnumerable<CustomerInfo> and can be enumerated in a foreach statement like this one:

SampleDb db = new SampleDb( ConnectionString );
foreach( var row in db.CustomersByCity( "London" )) {
    Console.WriteLine( "{0} {1}", row.CustomerID, row.CompanyName );
}

As you can see in Listing 5-12, you have to access the IExecuteResult interface returned by ExecuteMethodCall to get the desired result. This requires further explanation. You use the same Function attribute to decorate a method wrapping either a stored procedure or a UDF. The discrimination between these constructs is made by the IsComposable argument of the Function attribute: if it is false, the following method wraps a stored procedure; if it is true, the method wraps a user-defined function.

The IExecuteResult interface has a simple definition:

public interface IExecuteResult : IDisposable {
    object GetParameterValue(int parameterIndex);
    object ReturnValue { get; }
}

The GetParameterValue method allows access to the output parameters of a stored procedure. You need to cast this result to the correct type, also passing the ordinal position of the output parameter in parameterIndex.

The ReturnValue read-only property is used to access the return value of a stored procedure or UDF. The scalar value returned is accessible with a cast to the correct type: a stored procedure always returns an integer, whereas the type of a UDF function can be different. However, when the results are tabular, you use ISingleResult<T> to access a single result set, or IMultipleResults to access multiple result sets.

You always need to know the metadata of all possible returned result sets, applying the right types to the generic interfaces used to return data. ISingleResult<T> is a simple wrapper of IEnumerable<T> that also implements IFunctionResult, which has a ReturnValue read-only property that acts as the IExecuteResult.ReturnValue property you have already seen:

public interface IFunctionResult {
    object ReturnValue { get; }
}
public interface ISingleResult<T> :
    IEnumerable<T>, IEnumerable, IFunctionResult, IDisposable { }

You saw an example of ISingleResult<T> in Listing 5-12. We wrote the CustomersByCity wrapper in a verbose way to better illustrate the internal steps necessary to access the returning data.

Whenever you have multiple result sets from a stored procedure, you call the IMultipleResult. GetResult<T> method for each result set sequentially and specify the correct T type for the expected result. IMultipleResults also implements IFunctionResult, thereby also offering a ReturnValue read-only property:

public interface IMultipleResults : IFunctionResult, IDisposable {
      IEnumerable<TElement> GetResult<TElement>();
}

Consider the following stored procedure that returns two result sets with different structures:

CREATE PROCEDURE TwoCustomerGroups
AS BEGIN
    SELECT  CustomerID, ContactName, CompanyName, City
    FROM   Customers AS c
    WHERE  c.City = 'London'

    SELECT  CustomerID, CompanyName, City
    FROM   Customers AS c
    WHERE  c.City = 'Torino'
END

The results returned from this stored procedure can be stored in the following CustomerInfo and CustomerShortInfo types, which do not require any attributes in their declarations:

public class CustomerInfo {
    public string CustomerID;
    public string CompanyName;
    public string City;
    public string ContactName;
}

public class CustomerShortInfo {
    public string CustomerID;
    public string CompanyName;
    public string City;
}

The declaration of the LINQ counterpart of the TwoCustomerGroups stored procedure should be like the one shown in Listing 5-13.

Listing 5-13. Stored procedure with multiple results

class SampleDb : DataContext {
    // ...
    [Function(Name = "TwoCustomerGroups", IsComposable = false)]
    [ResultType(typeof(CustomerInfo))]
    [ResultType(typeof(CustomerShortInfo))]
    public IMultipleResults TwoCustomerGroups() {
        IExecuteResult executeResult =
                 this.ExecuteMethodCall(
                     this,
                     (MethodInfo) (MethodInfo.GetCurrentMethod()));
        IMultipleResults result =
            (IMultipleResults) executeResult.ReturnValue;
        return result;
    }
}

Each result set has a different type. When calling each GetResult<T>, you need to specify the correct type, which needs at least a public member with the same name for each returned column. If you specify a type with more public members than available columns, the “missing” members will have a default value. Moreover, each returned type has to be declared by using a ResultType attribute that decorates the TwoCustomerGroups method, as you can see in Listing 5-13. In the next sample, the first result set must match the CustomerInfo type, and the second result set must correspond to the CustomerShortInfo type:

IMultipleResults results = db.TwoCustomerGroups();
foreach( var row in results.GetResult<CustomerInfo>()) {
    // Access to CustomerInfo instance
}
foreach( var row in results.GetResult<CustomerShortInfo>()) {
    // Access to CustomerShortInfo instance
}

Remember that the order of ResultType attributes is not relevant, but you have to pay attention to the order of the GetResult<T> calls. The first result set will be mapped from the first GetResult<T> call, and so on, regardless of the parameter type used. For example, if you invert the previous two calls, asking for CustomerShortInfo before CustomerInfo, you get no error, but you do get an empty string for the ContactName of the second result set mapped to CustomerInfo.

Another use of IMultipleResults is the case in which a stored procedure can return different types based on parameters. For example, consider the following stored procedure:

CREATE PROCEDURE ChooseResultType( @resultType INT )
AS BEGIN
    IF @resultType = 1
        SELECT * FROM [Customers]
    ELSE IF @resultType = 2
        SELECT * FROM [Products]
END

Such a stored procedure will always return a single result, but its type might be different on each call. We do not like this use of stored procedures and prefer to avoid this situation. However, if you have to handle this case, by decorating the method with both possible ResultType attributes, you can handle both situations:

[Function(Name = "ChooseResultType", IsComposable = false)]
[ResultType(typeof(Customer))]
[ResultType(typeof(Product))]
public IMultipleResults ChooseResultType( int resultType ) {
    IExecuteResult executeResult =
            this.ExecuteMethodCall(
                 this,
                 (MethodInfo) (MethodInfo.GetCurrentMethod()),
                 resultType );
    IMultipleResults result =
        (IMultipleResults) executeResult.ReturnValue;
    return result;
}

In the single GetResult<T> call, you have to specify the type that correctly corresponds to what the stored procedure will return:

IMultipleResults results = db.ChooseResultType( 1 );
foreach( var row in results.GetResult<Customer>()) {
    // Access to Customer instance
}

If you have a similar scenario, it would be better to encapsulate the stored procedure call (ChooseResultType in this case) in several methods, one for each possible returned type. This way, you limit the risk of mismatching the relationship between parameter and result type:

public IEnumerable<Customer> ChooseCustomer() {
    IMultipleResults results = db.ChooseResultType( 1 );
    return results.GetResult<Customer>();
}

public IEnumerable<Product> ChooseProduct() {
    IMultipleResults results = db.ChooseResultType( 2 );
    return results.GetResult<Product>();
}

Before turning to user-defined functions, it is worth taking a look at what happens when you call a stored procedure in a LINQ query. Consider the following code:

var query =
    from   c in db.CustomersByCity("London")
    where  c.CompanyName.Length > 15
    select new { c.CustomerID, c.CompanyName };

Apparently, this query can be completely converted into a SQL query. However, all the data returned from CustomersByCity is passed from the SQL server to the client, as you can see from the generated SQL statement:

EXEC @RETURN_VALUE = [Customers by City] @param1 = 'London'

Both the filter (where) and projection (select) operations are made by LINQ to Objects, filtering data that has been transmitted to the client and enumerating only rows that have a CompanyName value longer than 15 characters. Thus, stored procedures are not composable into a single SQL query. To make this kind of composition, you need to use user-defined functions.

User-Defined Functions

To be used in LINQ, a user-defined function needs the same kind of declaration as a stored procedure. When you use a UDF inside a LINQ query, the LINQ to SQL engine must consider it in the construction of the SQL statement, adding a UDF call to the generated SQL. The capability of a UDF to be used in a LINQ query is what we mean by composability—the capability to compose different queries and/or operators into a single query. Because the same Function attribute is used for both stored procedures and UDFs, the IsComposable argument is set to true to map a UDF, and is set to false to map a stored procedure. Remember that there is no difference between a UDF written in T-SQL or SQLCLR.

Listing 5-14 provides an example of a LINQ declaration of the scalar-valued UDF MinUnitPriceByCategory that is defined in the sample Northwind database.

Listing 5-14. Scalar-valued UDF

class SampleDb : DataContext {
    // ...
    [Function(Name = "dbo.MinUnitPriceByCategory", IsComposable = true)]
    public decimal? MinUnitPriceByCategory( int? categoryID) {
        IExecuteResult executeResult =
            this.ExecuteMethodCall(
                this,
                ((MethodInfo) (MethodInfo.GetCurrentMethod())),
                categoryID);
        decimal? result = (decimal?) executeResult.ReturnValue;
        return result;
    }
}

The call to a UDF as an isolated expression generates a single SQL query invocation. You can also use a UDF in a LINQ query such as the following:

var query =
    from   c in Categories
    select new { c.CategoryID,
                 c.CategoryName,
                 MinPrice = db.MinUnitPriceByCategory( c.CategoryID )};

The generated SQL statement composes the LINQ query with the UDF that is called, resulting in a SQL query like this:

SELECT [t0].[CategoryID],
       [t0].[CategoryName],
       dbo.MinUnitPriceByCategory([t0].[CategoryID]) AS [value]
FROM   [Categories] AS [t0]

There are some differences in table-valued UDF wrappers. Consider the following UDF:

CREATE FUNCTION [dbo].[CustomersByCountry] ( @country NVARCHAR(15) )
RETURNS TABLE
AS RETURN
    SELECT  CustomerID,
            ContactName,
            CompanyName,
            City
    FROM    Customers c
    WHERE   c.Country = @country

To use this UDF in LINQ, you need to declare a CustomersByCountry method, as shown in Listing 5-15. A table-valued UDF always sets IsComposable to true in Function arguments, but it calls the DataContext.CreateMethodCallQuery instead of DataContext.ExecuteMethodCall.

Listing 5-15. Table-valued UDF

class SampleDb : DataContext {
    // ...
    [Function(Name = "dbo.CustomersByCountry", IsComposable = true)]
    public IQueryable<Customer> CustomersByCountry(string country) {
        return this.CreateMethodCallQuery<Customer>(
            this,
            ((MethodInfo) (MethodInfo.GetCurrentMethod())),
            country);
    }
}

A table-valued UDF can be used like any other table in a LINQ query. For example, you can join customers returned by the previous UDF with the orders they placed, as in the following query:

Table<Order> Orders = db.GetTable<Order>();
var queryCustomers =
    from   c in db.CustomersByCountry( "USA" )
    join   o in Orders
           on c.CustomerID equals o.CustomerID
           into orders
    select new { c.CustomerID, c.CompanyName, orders };

The generated SQL query will be similar to this one:

SELECT [t0].[CustomerID], [t0].[CompanyName],
       [t1].[OrderID], [t1].[CustomerID] AS [CustomerID2],
       (SELECT COUNT(*)
        FROM [Orders] AS [t2]
        WHERE [t0].[CustomerID] = [t2].[CustomerID]
        ) AS [value]
FROM dbo.CustomersByCountry('USA') AS [t0]
LEFT OUTER JOIN [Orders] AS [t1] ON [t0].[CustomerID] = [t1].[CustomerID]
ORDER BY [t1].[OrderID]

Compiled Queries

If you need to repeat the same query many times, eventually with different argument values, you might be worried about the multiple query construction. Several databases, such as SQL Server, try to parameterize received SQL queries automatically to optimize the compilation of the query execution plan. However, the program that sends a parameterized query to SQL Server will get better performance because SQL Server does not have to spend time analyzing it if the query is similar to one already processed. LINQ already does a fine job of query optimization, but each time that the same query tree is evaluated, the LINQ to SQL engine parses the query tree to build the equivalent SQL code. You can optimize this behavior by using the CompiledQuery class.

More Info

The built-in SQL Server provider sends parameterized queries to the database. Every time you see a constant value in the SQL code presented in this chapter, keep in mind that the real SQL query sent to the database has a parameter for each constant in the query. That constant can be the result of an expression that is independent of the query execution. This kind of expression is resolved by the host language (C# in this case). When you use the CompiledQuery class, it eliminates the need to parse the query tree and create the equivalent SQL code every time LINQ processes the same query. You might ask: What is the break-even point that justifies the use of the CompiledQuery class? Rico Mariani did a performance test that is described in a blog post at http://blogs.msdn.com/b/ricom/archive/2008/01/14/performance-quiz-13-linq-to-sql-compiled-query-cost-solution.aspx. The response from his benchmark is that, with at least two calls for the query, the use of the CompiledQuery class produces a performance advantage.

To compile a query, you can use one of the CompiledQuery.Compile static methods. This approach passes the LINQ query as a parameter in the form of an expression tree, and then obtains a delegate with arguments corresponding to both the DataContext on which you want to operate and the parameters of the query. Listing 5-16 illustrates the compiled query declaration and use.

Listing 5-16. Compiled query in a local scope

static void CompiledQueriesLocal() {
    DataContext db = new DataContext( ConnectionString );
    Table<Customer> Customers = db.GetTable<Customer>();

    var query =
        CompiledQuery.Compile(
            ( DataContext context, string filterCountry ) =>
                from   c in Customers
                where  c.Country == filterCountry
                select new { c.CustomerID, c.CompanyName, c.City } );

    foreach (var row in query( db, "USA" )) {
        Console.WriteLine( row );
    }

    foreach (var row in query( db, "Italy" )) {
        Console.WriteLine( row );
    }
}

As you can see in Listing 5-16, the Compile method requires a lambda expression whose first argument is a DataContext instance. That argument defines the connection over which the query will be executed. In this case, we do not use that argument inside our lambda expression. Assigning the CompiledQuery.Compile result to a local variable is easy (because you declare that variable with var), but you will not encounter this situation very frequently. Chances are that you will need to store the delegate returned from CompiledQuery.Compile in an instance or a static member to easily reuse it several times. To do that, you need to know the correct declaration syntax.

A compiled query is stored in a Func delegate, where the first argument must be an instance of DataContext (or a class derived from DataContext) and the last argument must be the type returned from the query. You can define up to three arguments in the middle that will be arguments of the compiled query. You will need to specify these arguments for each compiled query invocation. Listing 5-17 shows the syntax you can use in this scenario to create the compiled query and then use it.

Listing 5-17. Compiled query assigned to a static member

public static Func< SampleDb, string, IQueryable<Customer>>
    CustomerByCountry =
        CompiledQuery.Compile(
             ( nwind.Northwind db, string filterCountry ) =>
                from   c in db.Customers
                where  c.Country == filterCountry
                select c );

static void CompiledQueriesStatic() {
    nwind.Northwind db = new nwind.Northwind( ConnectionString );

    foreach (var row in CustomerByCountry( db, "USA" )) {
        Console.WriteLine( row.CustomerID );
    }

    foreach (var row in CustomerByCountry( db, "Italy" )) {
        Console.WriteLine( row.CustomerID );
    }
}

Because the Func delegate that holds the compiled query needs the result type in its declaration, you cannot use an anonymous type as the result type of a compiled query. This is possible only when the compiled query is stored in a local variable, as you saw in Listing 5-16.

Different Approaches to Querying Data

When using LINQ to SQL entities, you have two approaches for querying the same data. The classic way to navigate a relational schema is to write associative queries, just as you can do in SQL. The alternative way offered by LINQ to SQL is through graph traversal. Given the same query result, you might obtain different SQL queries and a different level of performance using different LINQ approaches.

Consider this SQL query that calculates the total quantity of orders for a product (in this case, Chocolade, which is a localized name in the Northwind database):

SELECT    SUM( od.Quantity ) AS TotalQuantity
FROM      [Products] p
LEFT JOIN [Order Details] od
     ON   od.[ProductID] = p.[ProductID]
WHERE     p.ProductName = 'Chocolade'

The natural conversion into a LINQ query is shown in Listing 5-18. The Single operator gets the first row and puts it into quantityJoin, which is used to display the result.

Listing 5-18. Query using Join

var queryJoin =
    from   p in db.Products
    join   o in db.Order_Details
           on p.ProductID equals o.ProductID
           into OrdersProduct
    where  p.ProductName == "Chocolade"
    select OrdersProduct.Sum( o => o.Quantity );
var quantityJoin = queryJoin.Single();
Console.WriteLine( quantityJoin );

As you can see, the associative query in LINQ can explicitly require the join between Products and Order_Details through ProductID equivalency. By using entities, you can implicitly use the relationship between Products and Order_Details defined in the Product class, as shown in Listing 5-19.

Listing 5-19. Query using Association

var queryAssociation =
    from   p in db.Products
    where  p.ProductName == "Chocolade"
    select p.Order_Details.Sum( o => o.Quantity );
var quantityAssociation = queryAssociation.Single();
Console.WriteLine( quantityAssociation );

The single SQL queries produced by both of these LINQ queries are identical. The LINQ query with join is more explicit about the access to data, whereas the query that uses the association between Product and Order_Details is more implicit in this regard. Using implicit associations results in shorter queries that are less error-prone (because you cannot be wrong about the join condition). At first, you might find that a shorter query is harder to read; that might be because you are accustomed to seeing lengthier queries. Your comfort level with shorter ones might change over time.

Examining this further, you can observe that reading a single product does not require a query expression. You can apply the Single operator directly on the Products table, as shown in Listing 5-20. Although the results are the same, the internal process is much different because this kind of access generates instances of the Product and Order_Details entities in memory, even if you do not use them in your program.

Listing 5-20. Access through Entity

var chocolade = db.Products.Single( p => p.ProductName == "Chocolade" );
var quantityValue = chocolade.Order_Details.Sum( o => o.Quantity );
Console.WriteLine( quantityValue );

This is a two-step operation that sends two SQL queries to the database. The first one retrieves the Product entity. The second one accesses the Order Details table to get all the Order Details rows for the required product and sums up the Quantity value in memory for the required product. The operation generates the following SQL statements:

SELECT [t0].[ProductID], [t0].[ProductName], [t0].[SupplierID],
       [t0].[CategoryID], [t0].[QuantityPerUnit], [t0].[UnitPrice],
       [t0].[UnitsInStock], [t0].[UnitsOnOrder], [t0].[ReorderLevel],
       [t0].[Discontinued]
FROM   [dbo].[Products] AS [t0]
WHERE  [t0].[ProductName] = "Chocolade"

SELECT [t0].[OrderID], [t0].[ProductID], [t0].[UnitPrice], [t0].[Quantity],
       [t0].[Discount]
FROM   [dbo].[Order Details] AS [t0]
WHERE  [t0].[ProductID] = "Chocolade"

Code that uses this kind of access is shorter to write compared to a query, but its performance is worse if you need to get only the total Quantity value, without needing to retrieve Product and Order_Detail entities in memory for further operations.

The queries in Listing 5-18 and 5-19 did not create Product or Order_Details instances because the output required only the product total. From this point of view, if you already had the required Product and Order_Details instances for Chocolade in memory, the performance of those queries would be worse because they unnecessarily access the database to get data that is already in memory. On the other hand, a second access to get the sum Quantity could be faster if you use the entity approach. Consider this code:

var chocolade = db.Products.Single( p => p.ProductName == "Chocolade" );
var quantityValue = chocolade.Order_Details.Sum( o => o.Quantity );
Console.WriteLine( quantityValue );
var repeatCalc = chocolade.Order_Details.Sum( o => o.Quantity );
Console.WriteLine( repeatCalc );

The quantityValue evaluation requires a database query to create Order_Details entities, whereas the repeatCalc evaluation is made on the in-memory entities without the need to read other data from SQL Server.

A final thought on the number of generated queries: You might think that we generated two queries when accessing data through the Product entity because we had two distinct statements—one to assign the chocolade variable, and the other to assign a value to quantity-Entity. This assumption is not completely true. Even if you write a single statement, the use of a Product entity (the results from the Single operator call) generates a separate query. Listing 5-21 produces the same results (in terms of memory objects and SQL queries) as Listing 5-20.

Listing 5-21. Access through Entity with a single statement

var quantityChocolade = db.Products.Single( p => p.ProductName == "Chang" )
                        .Order_Details.Sum( o => o.Quantity );
Console.WriteLine( quantityChocolade );

Finding a better way to access data really depends on the entire set of operations performed by a program. If you extensively use entities in your code to store data in memory, access to data through graph traversal based on entity access might offer better performance. On the other hand, if you always transform query results in anonymous types and never manipulate entities in memory, you might prefer an approach based on LINQ queries. As usual, the right answer is, “It depends.”

Direct Queries

Sometimes you might need access to database SQL features that are not available with LINQ. For example, imagine that you want to use Common Table Expressions (CTEs) or the PIVOT command with SQL Server. LINQ does not have an explicit constructor to do that, even if its SQL Server provider could use these features to optimize some queries. In such cases, you can use the ExecuteQuery<T> method of the DataContext class to send a query directly to the database. Listing 5-22 shows an example. (The T in ExecuteQuery<T> is an entity class that represents a returned row.)

Listing 5-22. Direct query

var query = db.ExecuteQuery<EmployeeInfo>( @"
    WITH EmployeeHierarchy (EmployeeID, LastName, FirstName,
                             ReportsTo, HierarchyLevel) AS
     ( SELECT EmployeeID,LastName, FirstName,
              ReportsTo, 1 as HierarchyLevel
       FROM   Employees
       WHERE  ReportsTo IS NULL

       UNION ALL

       SELECT      e.EmployeeID, e.LastName, e.FirstName,
                   e.ReportsTo, eh.HierarchyLevel + 1 AS HierarchyLevel
       FROM        Employees e
       INNER JOIN  EmployeeHierarchy eh
               ON  e.ReportsTo = eh.EmployeeID
    )
    SELECT   *
    FROM     EmployeeHierarchy
    ORDER BY HierarchyLevel, LastName, FirstName" );

As you can see, you need a type to get direct query results. We used the EmployeeInfo type in this example, which is declared as follows:

public class EmployeeInfo {
    public int EmployeeID;
    public string LastName;
    public string FirstName;
    public int? ReportsTo; // int? Corresponds to Nullable<int>
    public int HierarchyLevel;
}

The names and types of EmployeeInfo members must match the names and types of the columns returned by the executed query. Please note that if a column can return a NULL value, you need to use a nullable type, as we did for the ReportsTo member declared as int? above (which corresponds to Nullable<int>).

The ExecuteQuery method can receive parameters using the same parameter placeholders notation (also known as curly notation) used by Console.WriteLine and String.Format, but with a different behavior. Parameters are not replaced in the string sent to the database; they are substituted with automatically generated parameter names such as (@p0, @p1, @p2, …) and are sent to SQL Server as arguments of the parametric query.

The code in Listing 5-23 shows the call to ExecuteQuery<T> using a SQL statement with two parameters. The parameters are used to filter the customers who made their first order within a specified range of dates.

Listing 5-23. Direct query with parameters

var query = db.ExecuteQuery<CompanyOrders>(@"
        SELECT    c.CompanyName,
                  MIN( o.OrderDate ) AS FirstOrderDate,
                  MAX( o.OrderDate ) AS LastOrderDate
        FROM      Customers c
        LEFT JOIN Orders o
               ON o.CustomerID = c.CustomerID
        GROUP BY  c.CustomerID, c.CompanyName
        HAVING    COUNT(o.OrderDate) > 0
           AND    MIN( o.OrderDate ) BETWEEN {0} AND {1}
        ORDER BY  FirstOrderDate ASC",
    new DateTime( 1997, 1, 1 ),
    new DateTime( 1997, 12, 31 ) );

The parameters in the preceding query are identified by the {0} and {1} format items. The generated SQL query simply substitutes them with @p0 and @p1. The results are returned in instances of the CompanyOrders class, declared as follows:

public class CompanyOrders {
    public string CompanyName;
    public DateTime FirstOrderDate;
    public DateTime LastOrderDate;
}

Deferred Loading of Entities

You have seen that using graph traversal to query data is a very comfortable way to proceed. However, sometimes you might want to stop the LINQ to SQL provider from automatically deciding what entities have to be read from the database and when, thereby taking control over that part of the process. You can do this by using the DeferredLoadingEnabled and LoadOptions properties of the DataContext class.

The code in Listing 5-24 makes the same QueryOrder call under three different conditions, driven by the code in the DemoDeferredLoading method.

Listing 5-24. Deferred loading of entities

public static void DemoDeferredLoading() {
    Console.Write("DeferredLoadingEnabled=true  ");
    DemoDeferredLoading(true);
    Console.Write("DeferredLoadingEnabled=false ");
    DemoDeferredLoading(false);
    Console.Write("Using LoadOptions            ");
    DemoLoadWith();
}

static void DemoDeferredLoading(bool deferredLoadingEnabled) {
    nwDataContext db = new nwDataContext(Connections.ConnectionString);
    db.DeferredLoadingEnabled = deferredLoadingEnabled;

    QueryOrder(db);
}

static void DemoLoadWith() {
    nwDataContext db = new nwDataContext(Connections.ConnectionString);
    db.DeferredLoadingEnabled = false;

    DataLoadOptions loadOptions = new DataLoadOptions();
    loadOptions.LoadWith<Order>(o => o.Order_Details);
    db.LoadOptions = loadOptions;

    QueryOrder(db);
}

static void QueryOrder(nwDataContext db) {
    var order = db.Orders.Single((o) => o.OrderID == 10251);
    var orderValue = order.Order_Details.Sum(od => od.Quantity * od.UnitPrice);
    Console.WriteLine(orderValue);
}

The call to DemoDeferredLoading(true) sets the DeferredLoadingEnabled property to true, which is the default condition for a DataContext instance. The call to DemoDeferredLoading(false) disables the DeferredLoadingEnabled property. Any access to the related entities does not automatically load data from the database, and the sum of Order_Details entities shows a total of 0. Finally, the call to DemoLoadWith also disables DeferredLoadingEnabled, but it sets the LoadOptions property of the DataContext, requesting the loading of Order_Details entities related to an Order instance. The execution of the DemoDeferredLoading method in Listing 5-24 produces the following output:

DeferredLoadingEnabled=true  670,8000
DeferredLoadingEnabled=false 0
Using LoadOptions            670,8000

Remember that the use of LoadOptions is possible regardless of the state of DeferredLoadingEnabled, and it is useful for improving performance when early loading of related entities (rather than deferred loading) is an advantage for your application. Consider carefully before using DeferredLoadingEnabled—it does not produce any error, but it limits the navigability of your data model through graph traversal. However, you must remember that DeferredLoadingEnabled is automatically considered to be false whenever the ObjectTrackingEnabled property (discussed in the next section) is disabled too.

Deferred Loading of Properties

LINQ to SQL provides a deferred loading mechanism that acts at the property level, loading data only when that property is accessed for the first time. You can use this mechanism when you need to load a large number of entities in memory, which usually requires space to accommodate all the properties of the class that correspond to table columns of the database. If a certain field is very large and is not always accessed for every entity, you can delay the loading of that property.

To request the deferred loading of a property, you simply use the Link<T> type to declare the storage variable for the table column, as you can see in Listing 5-25.

Listing 5-25. Deferred loading of properties

[Table(Name = "Customers")]
public class DelayCustomer {
    private Link<string> _Address;

    [Column(IsPrimaryKey = true)] public string CustomerID;
    [Column] public string CompanyName;
    [Column] public string Country;

    [Column(Storage = "_Address")]
    public string Address {
        get { return _Address.Value; }
        set { _Address.Value = value; }
    }
}

public static class DeferredLoading {
    public static void DelayLoadProperty() {
        DataContext db = new DataContext(Connections.ConnectionString);
        Table<DelayCustomer> Customers = db.GetTable<DelayCustomer>();
        db.Log = Console.Out;

        var query =
            from   c in Customers
            where  c.Country == "Italy"
            select c;

        foreach (var row in query) {
            Console.WriteLine(
                "{0} - {1}",
                row.CompanyName,
                row.Address);
        }
    }
}

The query that is sent to the database to get the list of Italian customers is functionally equivalent to the following one:

SELECT [t0].[CustomerID], [t0].[CompanyName], [t0].[Country]
FROM   [Customers] AS [t0]
WHERE  [t0].[Country] = "Italy"

This query does not retrieve the Address field. When the result of the query is iterated in the foreach loop, the Address property of the current Customer is accessed for each customer for the first time. This produces a query to the database like the following one to get the Address value:

SELECT [t0].[Address]
FROM   [Customers] AS [t0]
WHERE  [t0].[CustomerID] = @p0

You should use the Link<T> type only when the content of a field is very large (which should not be the case for the Address field example) or when that field is rarely accessed. A field defined with the SQL type VARCHAR(MAX) is generally a good candidate, as long as its value is displayed only in a detailed form visible on demand and not on the main grid that shows query results. Using the LINQ to SQL class generator included in Visual Studio, you can use Link<T> and set the Delay Loaded property of the desired member property to true.

Read-Only DataContext Access

If you need to access data exclusively as read-only, you might want to improve performance by disabling a DataContext service that supports data modification:

DataContext db = new DataContext( ConnectionString );
db.ObjectTrackingEnabled = false;
var query = ...

The ObjectTrackingEnabled property controls the change tracking service described in Chapter 6. By default, ObjectTrackingEnabled is set to true.

Limitations of LINQ to SQL

LINQ to SQL has some limitations when converting a LINQ query to a corresponding SQL statement. For this reason, some valid LINQ to Objects statements are not supported in LINQ to SQL. In this section, we cover the most important operators that you cannot use in a LINQ to SQL query. However, you can use specific T-SQL commands by using the extension methods defined in the SqlMethods class, which you will find in the System.Data.Linq.SqlClient namespace.

Aggregate Operators

The general-purpose Aggregate operator is not supported. However, specialized aggregate operators such as Count, LongCount, Sum, Min, Max, and Average are fully supported.

Any aggregate operator other than Count and LongCount requires particular care to avoid an exception if the result is null. If the entity class has a member of a nonnullable type and you make an aggregation on it, a null result (for example when no rows are aggregated) throws an exception. To avoid the exception, you should cast the aggregated value to a nullable type before considering it in the aggregation function. Listing 5-26 shows an example of the necessary cast.

Listing 5-26. Null handling with aggregate operators

decimal? totalFreight =
    (from   o in Orders
     where  o.CustomerID == "NOTEXIST"
     select o).Min( o => (decimal?) o.Freight );

This cast is necessary only if you declared the Freight property with decimal, as shown in the following code:

[Table(Name = "Orders")]
public class Order {
    [Column] public decimal Freight;
}

Another solution is to declare Freight as a nullable type, using decimal?—but it is not a good idea to have different nullable settings between entities and corresponding tables in the database.

Partitioning Operators

The TakeWhile and SkipWhile operators are not supported. Take and Skip operators are supported, but be careful with Skip because the generated SQL query could be complex and not very efficient when skipping a large number of rows, especially when the target database is SQL Server 2000.

Element Operators

The following operators are not supported: ElementAt, ElementAtOrDefault, Last, and LastOrDefault.

String Methods

Many of the .NET Framework String type methods are supported in LINQ to SQL because T-SQL has a corresponding method. However, there is no support for methods that are culture-aware (those that receive arguments of type CultureInfo, StringComparison, and IFormatProvider) and for methods that receive or return a char array.

DateTime Methods

The DateTime type in the .NET Framework is different than the DATETIME and SMALLDATETIME types in SQL Server. The range of values and the precision is greater in the .NET Framework than in SQL Server, meaning the .NET Framework can correctly represent SQL Server types, but not the opposite. Check out the SqlMethods extension methods, which can take advantage of several DateDiff functions.

LIKE Operator

Although the LIKE T-SQL operator is used whenever a StartsWith, EndsWith, or Contains operator is called on a string property, you can use LIKE directly by calling the SqlMethods.Like method in a predicate.