Aggregating and Pivoting Data in Microsoft SQL Server 2008 T-SQL

3/25/2009

Contents

Unpivoting

Unpivoting is the opposite of pivoting—namely, rotating columns to rows. Unpivoting is usually used to normalize data, but it has other applications as well.

In my examples, I’ll use the PvtCustOrders table, which you create and populate by running the following code:

USE tempdb;

IF OBJECT_ID('dbo.PvtCustOrders') IS NOT NULL
  DROP TABLE dbo.PvtCustOrders;
GO
SELECT custid,
  COALESCE([2006], 0) AS [2006],
  COALESCE([2007], 0) AS [2007],
  COALESCE([2008], 0) AS [2008]
INTO dbo.PvtCustOrders
FROM (SELECT custid, YEAR(orderdate) AS orderyear, qty
      FROM dbo.Orders) AS D
  PIVOT(SUM(qty) FOR orderyear IN([2006],[2007],[2008])) AS P;

UPDATE dbo.PvtCustOrders
  SET [2007] = NULL, [2008] = NULL
WHERE custid = 'D';

-- Show the contents of the table
SELECT * FROM dbo.PvtCustOrders;

This generates the following output:

custid 2006        2007        2008
------ ----------- ----------- -----------
A      22          40          10
B      20          12          15
C      22          14          20
D      30          NULL        NULL

The goal in this case is to generate a result row for each customer and year, containing the customer ID (custid), order year (orderyear), and quantity (qty).

I’ll start with a solution that does not use the native UNPIVOT operator. Here as well, try to think in terms of logical query processing as described in Chapter 1.

The first step in the solution is to generate three copies of each base row—one for each year. You can achieve this by performing a cross join between the base table and a virtual auxiliary table that has one row per year. The SELECT list can then return custid and orderyear and also calculate the target year’s qty with the following CASE expression:

CASE orderyear
  WHEN 2006 THEN [2006]
  WHEN 2007 THEN [2007]
  WHEN 2008 THEN [2008]
END AS qty

You achieve unpivoting this way, but you also get rows corresponding to NULL values in the source table (for example, for customer D in years 2007 and 2008). To eliminate those rows, create a derived table out of the solution query and, in the outer query, eliminate the rows with the NULL in the qty column.

Here’s the complete solution, followed by its output:

SELECT custid, orderyear, qty
FROM (SELECT custid, orderyear,
        CASE orderyear
          WHEN 2006 THEN [2006]
          WHEN 2007 THEN [2007]
          WHEN 2008 THEN [2008]
        END AS qty
      FROM dbo.PvtCustOrders
        CROSS JOIN
          (SELECT 2006 AS orderyear
           UNION ALL SELECT 2007
           UNION ALL SELECT 2008) AS OrderYears) AS D
WHERE qty IS NOT NULL;

custid orderyear   qty
------ ----------- -----------
A      2006        22
A      2007        40
A      2008        10
B      2006        20
B      2007        12
B      2008        15
C      2006        22
C      2007        14
C      2008        20
D      2006        30
D      2007        0
D      2008        0

As of SQL Server 2008, you can replace the current definition of the derived table D with a table value constructor based on the VALUES clause, like so:

SELECT custid, orderyear, qty
FROM (SELECT custid, orderyear,
        CASE orderyear
          WHEN 2006 THEN [2006]
          WHEN 2007 THEN [2007]
          WHEN 2008 THEN [2008]
        END AS qty
      FROM dbo.PvtCustOrders
        CROSS JOIN
          (VALUES(2006),(2007),(2008)) AS OrderYears(orderyear)) AS D
WHERE qty IS NOT NULL;

Either way, using the native proprietary UNPIVOT table operator is dramatically simpler, as the following query shows:

SELECT custid, orderyear, qty
FROM dbo.PvtCustOrders
  UNPIVOT(qty FOR orderyear IN([2006],[2007],[2008])) AS U;

Unlike the PIVOT operator, I find the UNPIVOT operator simple and intuitive, and obviously it requires significantly less code than the alternative solutions. UNPIVOT’s first input is the target column name to hold the source column values (qty). Then, following the FOR keyword, you specify the target column name to hold the source column names (orderyear). Finally, in the parentheses of the IN clause, you specify the source column names that you want to unpivot ([2006],[2007],[2008]).