SQL Windowing

  • 4/15/2012

Elements of Window Functions

The specification of a window function’s behavior appears in the function’s OVER clause and involves multiple elements. The three core elements are partitioning, ordering, and framing. Not all window functions support all elements. As I describe each element, I’ll also indicate which functions support it.

Partitioning

The partitioning element is implemented with a PARTITION BY clause and is supported by all window functions. It restricts the window of the current calculation to only those rows from the result set of the query that have the same values in the partitioning columns as in the current row. For example, if your function uses PARTITION BY custid and the custid value in the current row is 1, the window with respect to the current row is all rows from the result set of the query that have a custid value of 1. If the custid value of the current row is 2, the window with respect to the current row is all rows with a custid of 2.

If a PARTITION BY clause is not specified, the window is not restricted. Another way to look at it is that inf case explicit partitioning wasn’t specified, the default partitioning is to consider the entire result set of the query as one partition.

If it wasn’t obvious, let me point out that different functions in the same query can have different partitioning specifications. Consider the query in Example 1-1 as an example.

Example 1-1. Query with Two RANK Calculations

SELECT custid, orderid, val,
  RANK() OVER(ORDER BY val DESC) AS rnk_all,
  RANK() OVER(PARTITION BY custid
              ORDER BY val DESC) AS rnk_cust
FROM Sales.OrderValues;

Observe that the first RANK function (which generates the attribute rnk_all) relies on the default partitioning, and the second RANK function (which generates rnk_cust) uses explicit partitioning by custid. Figure 1-4 illustrates the partitions defined for a sample of three results of calculations in the query: one rnk_all value and two rnk_cust values.

Figure 1-4

Figure 1-4. Window partitioning.

The arrows point from the result values of the functions to the window partitions that were used to compute them.

Ordering

The ordering element defines the ordering for the calculation, if relevant, within the partition. In standard SQL, all functions support an ordering element. As for SQL Server, initially it didn’t support the ordering element with aggregate functions; rather, it only supported partitioning. Support for ordering for aggregates was added in SQL Server 2012.

Interestingly, the ordering element has a slightly different meaning for different function categories. With ranking functions, ordering is intuitive. For example, when using descending ordering, the RANK function returns one more than the number of rows in your respective partition that have a greater ordering value than yours. When using ascending ordering, the function returns one more than the number of rows in the pattern with a lower ordering value than yours. Figure 1-5 illustrates the rank calculations from Example 1-1 shown earlier—this time including the interpretation of the ordering element.

Figure 1-5

Figure 1-5. Window ordering.

Figure 1-5 depicts the windows of only three of the rank calculations. Of course, there are many more—1,660, to be precise. That’s because there are 830 rows involved, and for each row, two rank calculations are made. What’s interesting to note here is that conceptually it’s as if all those windows coexist simultaneously.

Aggregate window functions have a slightly different meaning for ordering compared to ranking window functions. With aggregates, contrary to what some might think, ordering has nothing to do with the order in which the aggregate is applied; rather, the ordering element gives meaning to the framing options that I will describe next. In other words, the ordering element is an aid to define which rows to restrict in the window.

Framing

Framing is essentially another filter that further restricts the rows in the partition. It is applicable to aggregate window functions as well as to three of the offset functions: FIRST_VALUE, LAST_VALUE, and NTH_VALUE. Think of this windowing element as defining two points in the current row’s partition based on the given ordering, framing the rows that the calculation will apply to.

The framing specification in the standard includes a ROWS or RANGE option that defines the starting row and ending row of the frame, as well as a window frame-exclusion option. SQL Server 2012 introduced support for framing, with full implementation of the ROWS option, partial implementation of the RANGE option, and no implementation of the window frame-exclusion option.

The ROWS option allows you to indicate the points in the frame as an offset in terms of the number of rows with respect to the current row. The RANGE option is more dynamic, defining the offsets in terms of a difference between the value of the frame point and the current row’s value. The window frame-exclusion option specifies what to do with the current row and its peers in case of ties. This explanation might seem far from clear or sufficient, but I don’t want to get into the details just yet. There will be plenty of that later. For now, I just want to introduce the concept and provide a simple example. Following is a query against the EmpOrders view, calculating the running total quantity for each employee and order month:

SELECT empid, ordermonth, qty,
  SUM(qty) OVER(PARTITION BY empid
                ORDER BY ordermonth
                ROWS BETWEEN UNBOUNDED PRECEDING
                         AND CURRENT ROW) AS runqty
FROM Sales.EmpOrders;

Observe that the window function applies the SUM aggregate to the qty attribute, partitions the window by empid, orders the partition rows by ordermonth, and frames the partition rows based on the given ordering between unbounded preceding (no low boundary point) and the current row. In other words, the result will be the sum of all prior rows in the frame, inclusive of the current row. This query generates the following output, shown here in abbreviated form:

empid  ordermonth              qty         run_qty
------ ----------------------- ----------- -----------
1      2006-07-01 00:00:00.000 121         121
1      2006-08-01 00:00:00.000 247         368
1      2006-09-01 00:00:00.000 255         623
1      2006-10-01 00:00:00.000 143         766
1      2006-11-01 00:00:00.000 318         1084
...
2      2006-07-01 00:00:00.000 50          50
2      2006-08-01 00:00:00.000 94          144
2      2006-09-01 00:00:00.000 137         281
2      2006-10-01 00:00:00.000 248         529
2      2006-11-01 00:00:00.000 237         766
...

Observe how the window specification is as easy to read as plain English. I will provide much more detail about the framing options in Chapter 2.