LINQ to SQL: Querying Data

11/23/2010

Contents

Thinking in LINQ to SQL

When you start working with LINQ to SQL, you might have to rethink the ways in which you are accustomed to writing queries, especially if you try to find the equivalent LINQ syntax for a well-known SQL statement. Moreover, a verbose LINQ query might be reduced when the corresponding SQL query is produced. You need to be aware of this change, and you have to fully understand it to be productive in LINQ to SQL. The final part of this chapter introduces you to thinking in LINQ to SQL.

The IN/EXISTS Clause

One of the best examples of the syntactic differences between T-SQL and LINQ is the NOT IN clause that you can use in SQL. LINQ does not have such a clause, which makes you wonder whether there is any way to express the same concept in LINQ. In fact, there is not always a direct translation for each single SQL keyword, but you can get the same result with semantically equivalent statements, sometimes with equal or better performance.

Consider this SQL query, which returns all the customers who do not have an order in the Orders table:

SELECT *
FROM   [dbo].[Customers] AS [t0]
WHERE  [t0].[CustomerID] NOT IN (
    SELECT [t1].[CustomerID]
    FROM   [dbo].[Orders] AS [t1]
)

This is not the fastest way to get the desired result. (Using NOT EXISTS is our favorite way—more on this shortly.) LINQ does not have an operator directly equivalent to IN or NOT IN, but it offers a Contains operator that you can use to write the code in Listing 5-27. Pay attention to the not operator (!) applied to the where predicate, which negates the Contains condition that follows.

Listing 5-27. Use of Contains to get an EXISTS/IN equivalent statement

public static void DemoContains() {
    nwDataContext db = new nwDataContext(Connections.ConnectionString);
    db.Log = Console.Out;

    var query =
        from c in db.Customers
        where !(from o in db.Orders
               select o.CustomerID)
               .Contains(c.CustomerID)
        select new { c.CustomerID, c.CompanyName };

    foreach (var c in query) {
        Console.WriteLine(c);
    }
}

The following code is the SQL query generated by LINQ to SQL:

SELECT [t0].[CustomerID], [t0].[CompanyName]
FROM   [dbo].[Customers] AS [t0]
WHERE  NOT (EXISTS(
    SELECT NULL AS [EMPTY]
    FROM   [dbo].[Orders] AS [t1]
    WHERE  [t1].[CustomerID] = [t0].[CustomerID]
    ))

Using this approach to generate SQL code is not only semantically equivalent, but it also executes faster. If you look at the input/output (I/O) operation made by SQL Server 2005, the first query (using NOT IN) executes 364 logical reads on the Orders table, whereas the second query (using NOT EXISTS) requests only 5 logical reads on the same Orders table. That is a big difference. In this case, LINQ to SQL is the best choice.

The same Contains operator might generate an IN operator in SQL, for example, if it is applied to a list of constants, as in Listing 5-28.

Listing 5-28. Use of Contains with a list of constants

public static void DemoContainsConstants() {
    nwDataContext db = new nwDataContext(Connections.ConnectionString);

    var query =
        from   c in db.Customers
        where  (new string[] { "London", "Seattle" }).Contains(c.City)
        select new { c.CustomerID, c.CompanyName, c.City };

    Console.WriteLine(query);

    foreach (var c in query) {
        Console.WriteLine(c);
    }
}

The SQL code generated by LINQ to SQL is simpler to read than the original query:

SELECT [t0].[CustomerID], [t0].[CompanyName], [t0].[City]
FROM   [dbo].[Customers] AS [t0]
WHERE  [t0].[City] IN ("London", "Seattle")

The LINQ query is counterintuitive in that you must specify the Contains operator on the list of constants, passing the value to look for as an argument—exactly the opposite of what you need to do in SQL:

where (new string[] { "London", "Seattle" }).Contains(c.City)

After years of experience in SQL, it is more comfortable to imagine this hypothetical IsIn syntax:

where c.City.IsIn( new string[] { "London", "Seattle" } )

However, it is probably only a question of time before you get used to the new syntax. In fact, the semantics of Contains corresponds exactly to the argument’s position. To make the code clearer, you could simply declare the list of constants outside the query declaration, in a cities array, for example:

var cities = new string[] { "London", "Seattle" };
var query =
    from   c in db.Customers
    where  cities.Contains(c.City)
    select new { c.CustomerID, c.CompanyName, c.City };

NOTE

Creating the cities array outside the query instead of putting it in the where predicate simply improves code readability, at least in LINQ to SQL. From a performance point of view, only one string array is created in both cases. The reason is that in LINQ to SQL, the query defines only an expression tree, and the array is created only once to produce the SQL statement. In LINQ to SQL, unless you execute the same query many times, performance is equivalent under either approach (object creation inside or outside a predicate). This is different in LINQ to Objects, in which the predicate condition in the where clause would be executed for each row of the data source.

SQL Query Reduction

Every LINQ to SQL query is initially represented in memory as an expression tree. The LINQ to SQL engine converts this tree into an equivalent SQL query, visiting the tree and generating the corresponding code. However, theoretically this translation can be made in many ways, all producing the same results, even if not all the translations are equally readable or perform as well. The actual implementation of LINQ to SQL generates good SQL code, favoring performance over query readability, although the readability of the generated code is often quite acceptable.

We described this quality of LINQ to SQL because it is important to know that unnecessary parts of the query are removed before the query is sent to SQL Server. You can use this knowledge to compose LINQ queries in many ways—for example, by appending new predicates and projections to an originally large selection of rows and columns, without worrying too much about unnecessary elements left in the query.

The LINQ query in Listing 5-29 first makes a query on Customers, which filters those customers with a CompanyName longer than 10 characters. Those companies are then filtered by Country, operating on the anonymous type generated by the inner query.

Listing 5-29. Example of query reduction

var query =
    from s in (
        from   c in db.Customers
        where  c.CompanyName.Length > 10
        select new { c.CustomerID, c.CompanyName, c.ContactName, c.City,
                     c.Country, c.ContactTitle, c.Address }
    )
    where s.Country == "UK"
    select new { s.CustomerID, s.CompanyName, s.City }

Despite the length of the LINQ query, here is the SQL query it generates:

SELECT [t0].[CustomerID], [t0].[CompanyName], [t0].[City]
FROM   [dbo].[Customers] AS [t0]
WHERE  ([t0].[Country] = @p0) AND (LEN([t0].[CompanyName]) > @p1)

The generated SQL query made two important reductions. First, the FROM operates on a single table instead of a SELECT … FROM ( SELECT … FROM …) composition that would normally be made when translating the original query tree. Second, unnecessary fields have been removed; only CustomerID, CompanyName, and City are part of the SELECT projection because they are the only fields necessary to the consumer of the LINQ query. The first reduction improves query readability; the second improves performance because it reduces the amount of data transferred from the database server to the client.

Mixing .NET Code with SQL Queries

As noted previously, LINQ to SQL has some known limitations with regard to using the full range of the .NET Framework features, not all of which can be entirely translated into corresponding T-SQL operations. This does not necessarily mean that you cannot write a query containing an unsupported method, but you should be aware that such a method cannot be translated into T-SQL and must be executed locally on the client. The side effect of this can be that sections of the query tree that depend on a .NET Framework method without a corresponding SQL translation will be executed completely as a LINQ to Objects operation, meaning that all the data must be transferred to the client to apply the required operators.

You can see this effect with some examples. Consider the LINQ query in Listing 5-30.

Listing 5-30. LINQ query with a native string manipulation in the projection

var query1 =
    from   p in db.Products
    where  p.UnitPrice > 50
    select new {
        ProductName = "** " + p.ProductName + " **",
        p.UnitPrice };

The generated SQL query embodies the string manipulation of the ProductName:

SELECT ("** " + [t0].[ProductName]) + " **" AS [ProductName],
       [t0].[UnitPrice]
FROM [dbo].[Products] AS [t0]
WHERE [t0].[UnitPrice] > 50

Now suppose you move the string concatenation operation into a .NET Framework extension method, like that shown in Listing 5-31.

Listing 5-31. String manipulation extension method

static public class Extensions {
    public static string Highlight(this string s) {
        return "** " + s + " **";
    }
}

Then you can modify the LINQ query using the Highlight method as in Listing 5-32.

Listing 5-32. LINQ query calling a .NET Framework method in the projection

var query2 =
    from   p in db.Products
    where  p.UnitPrice > 50
    select new {
        ProductName = p.ProductName.Highlight(),
        p.UnitPrice };

The result produced by query2 in Listing 5-32 is the same as the one produced by query1 in Listing 5-30. However, the SQL query sent to the database is different because it lacks the string manipulation operation:

SELECT [t0].[ProductName] AS [s],
       [t0].[UnitPrice]
FROM   [dbo].[Products] AS [t0]
WHERE  [t0].[UnitPrice] > 50

The ProductName field is returned as s and will be used as an argument to the Highlight call. For each row, a call to the .NET Framework Highlight method will be made. This is not an issue when you are directly consuming the query2 results. However, if you turn the same operation into a subquery, the dependent queries cannot be translated into a native SQL statement. For example, consider query3 in Listing 5-33.

Listing 5-33. LINQ query combining native and custom string manipulation

var query3 =
    from a in (
        from   p in db.Products
        where  p.UnitPrice > 50
        select new {
            ProductName = p.ProductName.Highlight(),
            p.UnitsInStock,
            p.UnitPrice
        }
    )
    select new {
        ProductName = a.ProductName.ToLower(),
        a.UnitPrice };

The SQL query produced by query3 in Listing 5-33 is the same as the one produced by query2 in Listing 5-32, despite the addition of another string manipulation (ToLower) to ProductName:

SELECT [t0].[ProductName] AS [s],
       [t0].[UnitPrice]
FROM   [dbo].[Products] AS [t0]
WHERE  [t0].[UnitPrice] > 50

If you remove the call to Highlight and restore the original string manipulation directly inside the LINQ query, you will get a complete native SQL query again, as shown in Listing 5-34.

Listing 5-34. LINQ query using native string manipulation

var query4 =
    from a in (
            from   p in db.Products
            where  p.UnitPrice > 50
            select new {
                ProductName = "** " + p.ProductName + " **",
                p.UnitPrice
            }
    )
    select new {
        ProductName = a.ProductName.ToLower(),
        a.UnitPrice
    };

The query4 in Listing 5-34 produces the following SQL query, which does not require further manipulations by .NET Framework code:

SELECT LOWER([t1].[value]) AS [ProductName], [t1].[UnitPrice]
FROM (
    SELECT ("** " + [t0].[ProductName]) + " **" AS [value],
           [t0].[UnitPrice]
    FROM [dbo].[Products] AS [t0]
    ) AS [t1]
WHERE [t1].[UnitPrice] > 50

Until now, we have seen that there is a possible performance implication only when using a .NET Framework method that does not have a corresponding SQL counterpart. However, there are situations that cannot be handled by the LINQ to SQL engine and which throw an exception at execution time—for example, if you try to use the result of the Highlight call in a where predicate as shown in Listing 5-35.

Listing 5-35. LINQ query calling a .NET Framework method in a where predicate

var query5 =
    from   p in db.Products
    where  p.ProductName.Highlight().Length > 20
    select new {
        ProductName = p.ProductName.Highlight(),
        p.UnitPrice
    };

At execution time, trying to access to the query5 result (or asking for the generated SQL query) will raise the following exception:

System.NotSupportedException
Method 'System.String Highlight(System.String)'
has no supported translation to SQL.

As you have seen, it is important to understand what operators are supported by LINQ to SQL, because the code could work or break at execution time, depending on the use of such operators. It is hard to define a rule of thumb other than to avoid the use of unsupported operators. If you think that a LINQ query is composable and can be used as a source to build another query, the only safe guideline is to use operators supported by LINQ to SQL.

Save to your account