Table Expressions in Microsoft SQL Server 2008 T-SQL

  • 10/22/2008

The APPLY Operator

The APPLY operator is a nonstandard table operator that was introduced in SQL Server 2005. This operator is used in the FROM clause of a query like all table operators. The two supported types of the APPLY operator are CROSS APPLY and OUTER APPLY. CROSS APPLY implements only one logical query processing phase, while OUTER APPLY implements two.

The APPLY operator operates on two input tables, the second of which may be a table expression; I’ll refer to them as the left and right tables. The right table is usually a derived table or an inline TVF. The CROSS APPLY operator implements one logical query processing phase—it applies the right table expression to each row from the left table, and produces a result table with the unified result sets.

So far it might sound like the CROSS APPLY operator is very similar to a cross join, and in a sense that’s true. For example, the following two queries return the same result sets:

SELECT S.shipperid, E.empid
FROM Sales.Shippers AS S
  CROSS JOIN HR.Employees AS E;

SELECT S.shipperid, E.empid
FROM Sales.Shippers AS S
  CROSS APPLY HR.Employees AS E;

However, with the CROSS APPLY operator the right table expression can represent a different set of rows per each row from the left table, unlike in a join. You can achieve this when you use a derived table in the right side, and in the derived table query refer to attributes from the left side. Or when you use an inline TVF, you can pass attributes from the left side as input arguments.

For example, the following code uses the CROSS APPLY operator to return the three most recent orders for each customer:

SELECT C.custid, A.orderid, A.orderdate
FROM Sales.Customers AS C
  CROSS APPLY
    (SELECT TOP(3) orderid, empid, orderdate, requireddate
     FROM Sales.Orders AS O
     WHERE O.custid = C.custid
     ORDER BY orderdate DESC, orderid DESC) AS A;

You can think of the table expression A as a correlated table subquery. In terms of logical query processing, the right table expression (derived table in our case) is applied to each row from the Customers table. Notice the reference to the attribute C.custid from the left table in the derived table’s query filter. The derived table returns the three most recent orders for the customer from the current left row. Because the derived table is applied to each row from the left side, the CROSS APPLY operator returns the three most recent orders for each customer.

Here’s the output of this query, shown here in abbreviated form:

custid      orderid     orderdate
----------- ----------- -----------------------
1           11011       2008-04-09 00:00:00.000
1           10952       2008-03-16 00:00:00.000
1           10835       2008-01-15 00:00:00.000
2           10926       2008-03-04 00:00:00.000
2           10759       2007-11-28 00:00:00.000
2           10625       2007-08-08 00:00:00.000
3           10856       2008-01-28 00:00:00.000
3           10682       2007-09-25 00:00:00.000
3           10677       2007-09-22 00:00:00.000
...

(263 row(s) affected)

If the right table expression returns an empty set, the CROSS APPLY operator does not return the corresponding left row. For example, customers 22 and 57 did not place orders. In both cases the derived table is an empty set; therefore, those customers are not returned in the output. If you want to return rows from the left table for which the right table expression returns an empty set, use the OUTER APPLY operator instead of CROSS APPLY. The OUTER APPLY operator adds a second logical phase that identifies rows from the left side for which the right table expression returns an empty set, and adds those rows to the result table as outer rows with NULLs in the right side’s attributes as place holders. In a sense, this phase is similar to the phase that adds outer rows in a left outer join.

For example, run the following code to return the three most recent orders for each customer, and include in the output customers with no orders as well:

SELECT C.custid, A.orderid, A.orderdate
FROM Sales.Customers AS C
  OUTER APPLY
    (SELECT TOP(3) orderid, empid, orderdate, requireddate
     FROM Sales.Orders AS O
     WHERE O.custid = C.custid
     ORDER BY orderdate DESC, orderid DESC) AS A;

This time, customers 22 and 57, who did not place orders, are included in the output, which is shown here in abbreviated form:

custid      orderid     orderdate
----------- ----------- -----------------------
1           11011       2008-04-09 00:00:00.000
1           10952       2008-03-16 00:00:00.000
1           10835       2008-01-15 00:00:00.000
2           10926       2008-03-04 00:00:00.000
2           10759       2007-11-28 00:00:00.000
2           10625       2007-08-08 00:00:00.000
3           10856       2008-01-28 00:00:00.000
3           10682       2007-09-25 00:00:00.000
3           10677       2007-09-22 00:00:00.000
...
22          NULL        NULL
...
57          NULL        NULL

(265 row(s) affected)

For encapsulation purposes you may find it more convenient to work with inline TVFs instead of derived tables. This way your code will be simpler to follow and maintain. For example, the following code creates an inline TVF called fn_TopOrders that accepts as inputs a customer ID (@custid) and a number (@n), and returns the @n most recent orders for customer @custid:

IF OBJECT_ID('dbo.fn_TopOrders') IS NOT NULL
  DROP FUNCTION dbo.fn_TopOrders;
GO
CREATE FUNCTION dbo.fn_TopOrders
  (@custid AS INT, @n AS INT)
  RETURNS TABLE
AS
RETURN
  SELECT TOP(@n) orderid, empid, orderdate, requireddate
  FROM Sales.Orders
  WHERE custid = @custid
  ORDER BY orderdate DESC, orderid DESC;
GO

You can now substitute the use of the derived table from the previous examples with the new function:

SELECT
  C.custid, C.companyname,
  A.orderid, A.empid, A.orderdate, A.requireddate
FROM Sales.Customers AS C
  CROSS APPLY dbo.fn_TopOrders(C.custid, 3) AS A;

The code is much more readable and easier to maintain. In terms of physical processing, nothing really changed because, as I stated earlier, the definition of table expressions is expanded, and SQL Server will in any case end up querying the underlying objects directly.