SQL Windowing

Elements of Window Functions

The specification of a window function appears in the function’s OVER clause and involves multiple elements. The three core elements are window partitioning, ordering, and framing. Not all window functions support all elements. As I describe each element, I’ll also indicate which functions support it.

Window Partitioning

The optional window partitioning element is implemented with a PARTITION BY clause and is supported by all window functions. It restricts the window of the current calculation to only those rows from the result set of the query that have the same values in the partitioning columns as in the current row. For example, if your function uses PARTITION BY custid and the custid value in the current row is 1, the window partition with respect to the current row is all rows from the result set of the query that have a custid value of 1. If the custid value of the current row is 2, the window partition with respect to the current row is all rows with a custid of 2.

If a PARTITION BY clause is not specified, the window is not restricted. Another way to look at it is that in case explicit partitioning wasn’t specified, the default partitioning is to consider the entire result set of the query as one partition.

If it wasn’t obvious, let me point out that different functions in the same query can have different partitioning specifications. Consider the query in Listing 1-5 as an example.

Listing 1-5 Query with Two RANK Calculations

SELECT custid, orderid, val,
  RANK() OVER(ORDER BY val DESC) AS rnkall,
  RANK() OVER(PARTITION BY custid
              ORDER BY val DESC) AS rnkcust
FROM Sales.OrderValues;

Observe that the first RANK function (which generates the attribute rnkall) relies on the implied partitioning, and the second RANK function (which generates rnkcust) uses explicit partitioning by custid. Figure 1-5 illustrates the partitions defined for a sample of three results of calculations in the query: one rnkall value and two rnkcust values.

FIGURE 1-5

FIGURE 1-5 Window partitioning.

The arrows point from the result values of the functions to the window partitions that were used to compute them.

Window Ordering

The window ordering clause defines the ordering for the calculation, if relevant, within the partition. Interestingly, this clause has a slightly different meaning for different function categories. With ranking functions, ordering is intuitive. For example, when using descending ordering, the RANK function returns one more than the number of rows in your respective partition that have a greater ordering value than yours. When using ascending ordering, the function returns one more than the number of rows in the partition with a lower ordering value than yours. Figure 1-6 illustrates the rank calculations from Listing 1-5 shown earlier—this time including the interpretation of the ordering element.

FIGURE 1-6

FIGURE 1-6 Window ordering.

Figure 1-6 depicts the windows of only three of the rank calculations. Of course, there are many more—1,660, to be precise. That’s because there are 830 rows involved, and for each row, two rank calculations are made. What’s interesting to note here is that conceptually it’s as if all those windows coexist simultaneously.

Aggregate window functions have a slightly different meaning for window ordering, when specified, compared to ranking window functions. With aggregates, contrary to what some might think, ordering has nothing to do with the order in which the aggregate is applied; rather, the ordering element gives meaning to the framing options that I will describe next. In other words, the ordering element is an aid as part of defining which rows to restrict in the window.

Window Framing

Earlier, I referred to window partitioning as serving a filtering role. Window framing is essentially another filter that further restricts the rows in the window partition. It is applicable to aggregate window functions as well as to three of the offset functions: FIRST_VALUE, LAST_VALUE, and NTH_VALUE. As a reminder, the last is unsupported in SQL Server. Think of this windowing element as defining two end points, or delimiters, in the current row’s partition based on the given ordering, framing the rows that the calculation will apply to.

The framing specification in the standard includes a ROWS, GROUPS or RANGE option that defines the starting row and ending row of the frame, as well as a window frame-exclusion option. SQL Server supports framing, with full implementation of the ROWS option, partial implementation of the RANGE option, and no implementation of the GROUPS and window frame-exclusion options.

The ROWS option allows you to indicate the points in the frame as an offset in terms of the number of rows with respect to the current row, based on the window ordering. The GROUPS option is similar to ROWS, but you specify an offset in terms of the number of distinct groups with respect to the current group, based on the window ordering. The RANGE option is more dynamic, defining the offsets in terms of a difference between the ordering value of the frame point and the current row’s ordering value. The window frame-exclusion option specifies what to do with the current row and its peers in case of ties. This explanation might seem far from clear or sufficient, but I don’t want to get into the details just yet. There will be plenty of that later. For now, I just want to introduce the concept and provide a simple example. Following is a query against the EmpOrders view, calculating the running total quantity for each employee and order month:

SELECT empid, ordermonth, qty,
  SUM(qty) OVER(PARTITION BY empid
                ORDER BY ordermonth
                ROWS BETWEEN UNBOUNDED PRECEDING
                         AND CURRENT ROW) AS runqty
FROM Sales.EmpOrders;

Observe that the window function applies the SUM aggregate function to the qty attribute, partitions the window by empid, orders the partition rows by ordermonth, and frames the partition rows based on the given ordering between unbounded preceding (no low boundary point) and the current row. In other words, the result will be the sum of all prior rows in the frame, inclusive of the current row. This query generates the following output, shown here in abbreviated form:

empid  ordermonth qty  runqty
------ ---------- ---- -------
1      2017-07-01 121  121
1      2017-08-01 247  368
1      2017-09-01 255  623
1      2017-10-01 143  766
1      2017-11-01 318  1084
...
2      2017-07-01 50   50
2      2017-08-01 94   144
2      2017-09-01 137  281
2      2017-10-01 248  529
2      2017-11-01 237  766
...

Observe how the window specification is as easy to read as plain English. I will provide much more detail about the framing options in Chapter 2.