Microsoft PowerPivot for Excel 2010: Date Calculations in DAX
- Working with a Dates Table
- Aggregating and Comparing over Time
- Closing Balance over Time
Many analyses of data have to deal with dates. Microsoft SQL Server PowerPivot for Excel offers a number of functions that simplify many calculations on dates that are typical in a business scenario, but using the right function in the right way requires some explanation. As you see in this chapter, the first step in date calculations is to create a separate Dates table that supports most of the requirements.
Working with a Dates Table
In some examples in the previous chapters, we defined calculated columns that extracted parts of the date that we used to group dates, such as the year and the month. This technique might be applied to each table containing a date, but it would quickly become hard to manage. It is better to create a separate table containing a row for each date, using the date as a key to link that Dates table with other tables that contain data related to a date. In this way, you obtain a model wherein all attributes about dates are included in a separate table and are easy to access when you browse data with a PivotTable, as you can see in Figure 7-1.
Figure 7-1 PivotTable browsing Order data by using a Dates table named OrderDate.
A Dates table is also useful for making calculations using special DAX functions that operate on Dates. These functions, of which DATEADD is an example, often require that all the days in a given range exist in the data table—otherwise, a missing day might result in a wrong calculation. You might have no sales for a day (in fact, it is pretty common to have no sales on nonworking days), so the separate Dates table allows you to make the right calculations without requiring any modification of the original table that contains measures to analyze.
The only side effect of this technique is that you need to create a Dates table in PowerPivot for each date attribute you want to analyze in a single table because there can be only one relationship between two tables in PowerPivot.
How to Build a Dates Table
To create a Dates table in PowerPivot, you need a data source that contains at least a column with all days included in the period of time you want to analyze. For example, if the minimum and maximum date contained in Sales data is July 3, 2001 and July 27, 2004, respectively, the range of dates you should consider is between January 1, 2001 and December 31, 2004. In this way, you have all the days for all the years containing sales data.
In Chapter 3, “Introduction to DAX,” you saw how to create Day, Month, and Year calculated columns for a Calendar table that has just the Date column as existing data. However, if you do not have an external source providing you with a valid Dates table (such as a corresponding table in SQL Server), we suggest that you create all the calculated columns for a Dates table in Excel. In this way, it will be easier to copy and paste the entire contents of that table into a new one when you have to handle more dates—for example, Order Date and Ship Date—in your PowerPivot model.
To create your Dates table, you can start by typing Date in a cell and 1/1/2001 in the cell below it, as you can see in Figure 7-2.
Figure 7-2 Creating a Dates table in Excel.
Then in the bottom-right corner of the cell containing the 1/1/2001 date (which is highlighted in Figure 7-2), you can drag down until you reach the date of 12/31/2004, as you can see in Figure 7-3.
Figure 7-3 Selecting end range for creating a Dates table in Excel.
At this point, you can release the mouse. You just created a list of all the days included from the beginning of 2001 to the end of 2004. Now you can click the Date cell, click the Format As Table button on the Home tab of the ribbon, and then confirm that your table has headers, as you can see in Figure 7-4.
Figure 7-4 Confirming the range of the Table and confirming that your table has headers.
In Figure 7-5, you can see how to give the Calendar name to the table by using the text box available on the Design tab of the Table Tools contextual tab of the Excel ribbon and how to start adding new columns by right-clicking a cell in the table and selecting the InsertTable Column To The Right item from the Insert context menu.
In Figure 7-6, you can see how to define a formula in an empty cell of the new column to calculate the Year. After you type =YEAR(, you can click the Date column to get the right syntax to read that column, as shown in Figure 7-6.
Figure 7-5 Inserting a new column in the Dates table.
Figure 7-6 Defining the formula for the Year column in Excel.
At this point, you can type the closing parenthesis and then press Enter. The formula is automatically copied for all the rows of the table in the same column, with the result that you can see in Figure 7-7 (after you adjust the format of the Year column to General in case it was a different format that you copied from the Date column).
Figure 7-7 The Year column calculated for all the rows.
With this technique, you can define all the columns that are useful for navigating the data that aggregate date in several ways.
Figure 7-8 shows the final result of a complete Dates table with fiscal year starting on July 1. You can find this table in the CH07-01-Calendar.xlsx workbook included on the companion DVD. Table 7-1 contains the formula definitions for all of the columns.
Figure 7-8 A complete Dates table with fiscal year starting on July 1.
Table 7-1 Formula definitions for the Dates table in Excel.
=TEXT([@Date],“MM - mmmm”)
=“Q” & ROUNDUP(MONTH([@Date]) /3,0)
=“FY-” & [@Year]+IF([@Month]<7,0,1)
Now you can import this table in PowerPivot as a linked table. The result is shown in Figure 7-9.
Figure 7-9 The Dates table imported in PowerPivot as a linked table.
You can see that the month name contains the month number in front of it, so December is described as 12 – December. It is useful to have the month names automatically sorted. However, if you want to sort month names but also want to avoid the initial number, please take a look at the section “Custom Sorting in PivotTables” in Chapter 8, “Mastering PivotTables,” where we describe how to sort columns of a Dates table in a PivotTable.
You might want to change the data types of some columns in the Dates table. Whenever you import the Excel table into PowerPivot, columns like Year, MonthNumber, and Day are usually defined as Whole Number data types. For this reason, when you select one of these columns in the PivotTable, the selected attribute is placed by default in the Values area of the PivotTable and is aggregated when you use the Sum function. You might prefer to change the data types of these columns to Text so that by default they are used to group data in rows.
If you want to test your new Calendar table, you should now import the SalesOrderHeader, SalesOrderDetail, Customer, and Product tables from the AdventureWorks database into the same PowerPivot model. Relationships between these tables are automatically detected during the import. At this point, you need to create a relationship between the OrderDate field of the SalesOrderHeader table and the Date field of the Calendar table you just imported. Before starting, in PowerPivot you have to rename the Calendar table to OrderDate so that it expresses the dates it represents. Then you click the Create Relationship button on the Design tab of the ribbon and fill in the dialog box, as shown in Figure 7-10.
Figure 7-10 Create a relationship between the SalesOrderHeader and OrderDate tables.
At this point, the model is ready to browse data, as you saw at the beginning of this chapter, in Figure 7-1.
Working with Multiple Dates Tables
In the model you saw in the previous section, each Order has several dates. In case you want to analyze not only the Order Date but also the Ship Date, you need to define a second table in PowerPivot because the same table (that is, the Dates table) cannot have more than one relationship with a given table (SalesOrderHeader).
At this point, you have two options. You can either create a new linked table starting from the same table you used before (shown in Figure 7-8) or copy that table into Excel and create the linked table starting from this copy. The first option is not the best one because in PowerPivot you can have only one linked table for a given Excel table. If you try to create a linked table starting from the same Calendar table you defined before, the warning message shown in Figure 7-11 appears.
Figure 7-11 A warning against trying to create a linked table for an Excel table already used as a linked table.
If you continue creating a linked table this way, you cannot update the OrderDate table anymore. If you create a model that must be refreshed over time and that is likely to have a life cycle longer than the current year, you are better off using another way, which allows future updates.
The second option requires you to copy and paste the existing Calendar table in Excel. Before you do that, you should rename the Calendar table in Excel, using the same name we used for the corresponding linked table in PowerPivot, which is OrderDate. To do that, you can type the OrderDate name into the Table Name text box available on the Design tab of the Table Tools contextual tab of the Excel ribbon, as you can see in Figure 7-12.
Figure 7-12 Renaming the table OrderDate, in Excel.
At this point, if you try to Update the OrderDate linked table in PowerPivot, you get the error message shown in Figure 7-13.
Figure 7-13 The error message you get when you try to update OrderDate after changing the name of the underlying Excel table.
When you click the Options button, you can select the Change Excel Table Name option. Then you choose the OrderDate table in the combo box that shows the available tables in Excel, as you can see in Figure 7-14.
Figure 7-14 Fixing the error in LinkedTable by selecting the correct underlying Excel table.
Now you can copy the OrderDate table in Excel into a new one that we call ShipDate. You might do this by selecting the whole table, copying it, and then pasting it into an empty space of your Excel workbook. However, another option is to use a single dedicated Excel worksheet for each table like these so that you can simply duplicate the worksheet into a new one. In this way, whenever you need to add columns or rows to the table, you never have to move other existing tables. Moreover, tables are easily accessible when you click the corresponding worksheet name in Excel.
To create a copy of the worksheet containing the OrderDate table, you have to right-click the OrderDate label and select Move Or Copy from the context menu that you can see in Figure 7-15.
Figure 7-15 Choosing Move Or Copy from the context menu.
The selection displays the dialog box shown in Figure 7-16, in which you have to select the Create A Copy check box and choose the position of the new sheet.
Figure 7-16 Selecting the option to create a copy of the worksheet to place at the end of the list.
At this point, you rename both the table (using the same procedure you saw already in Figure 7-12) and the worksheet (by right-clicking on the OrderDate (2) label and then selecting Rename from the context menu that you can see in Figure 7-17); you use the new ShipDate name.
Figure 7-17 The Rename option in the context menu.
Finally, you can create a linked table for the ShipDate table by clicking the Create Linked Table button on the PowerPivot ribbon. Again, you have to create a relationship in PowerPivot between the SalesOrderHeader and ShipDate tables, by using the ShipDate column of the SalesOrderHeader table this time, as you can see in Figure 7-18.
Figure 7-18 Creating a relationship between the SalesOrderHeader and OrderDate tables.
You can find the resulting model in the CH07-02-OrderAndShippingDate.xlsx workbook included on the companion DVD. However, as you can see in the next section, duplicating tables might not be enough. Because Excel does not show the table name to which a column belongs when you use it for Slicers and Filters, you might want to add a prefix to your columns. It is better to do that directly in the source Excel table rather than renaming the columns in PowerPivot only so that the overall model is simpler to understand.
Differentiating Columns in Multiple Dates Tables
Duplicating the same table, such as a Dates table, multiple times in a PowerPivot model makes the resulting PivotTable difficult to read whenever the same attributes are used from different tables. For example, in Figure 7-19, you can see a PivotTable in which the Year from OrderDate has been put in rows and in the first slicer, and the Year from ShipDate has been put in columns and in the second slicer. The problem is that there is no evidence of the table that a column belongs to whenever it is moved into slicers, filters, rows, or columns of the PivotTable. The final model for the example of this section is available in the CH07-03-PrefixedDateColumns.xlsx workbook included on the companion DVD.
Figure 7-19 Columns with the same name from different tables are not recognizable in a PivotTable.
So in case you create a model with multiple copies of the same tables, you should differentiate the names of the columns so that they are immediately recognizable in a report. You can edit the table names in Excel by adding a prefix to each column. In Figure 7-20, you can see the heading of the OrderDate table, wherein each column has been prefixed with the word Order. You can do the same for the ShipDate column by using the Ship prefix.
Figure 7-20 The columns of OrderDate prefixed with Order.
After you rename your column, you can create a report similar to the one you saw in Figure 7-19, but this time, use more meaningful names for columns that were ambiguous before. You can see the result of such a process in Figure 7-23.
Figure 7-23 Column prefixes are more recognizable both in the slicers and in the PowerPivot Field List.
We suggest that you use column prefixes every time you have the same column name in different tables—not just for Dates tables.
Calculating Working Days
Now that you have learned how to create a calendar table, it is worth pointing out some columns that can be very useful in data analysis and that can be conveniently stored in the calendar table. For example, you might be interested in defining a measure that calculates the average of sales per working days in a given period. (You can find the complete example in the CH07-04-WorkingDays.xlsx workbook included on the companion DVD.) To do that, you have to calculate the number of working days, which in turn requires knowing whether a day is a working day. The simpler way to do this is to add a WorkingDays column to the Excel OrderDate table. That column should have the value 1 for working days, and 0 for holidays, weekends, and other nonworking days. Instead of compiling this column by hand, you might define it by using the following Excel formula that assigns 1 to all week days between Monday and Friday, leaving 0 to Saturday and Sunday:
= IF( WEEKDAY([@Date],2) > 5, 0, 1 )
This formula is automatically copied into all the rows of the OrderDate table, as you can see in Figure 7-24.
Figure 7-24 The WorkingDays column added to the OrderDate table in Excel.
You can modify single values for other nonworking days, such as public holidays, overriding the formula with a forced fixed value (usually 0) just for these days. For example, in Figure 7-25, you can see the value for January 1, 2001 overridden by a 0 value, whereas the following dates are still evaluated by the formula we defined before.
Figure 7-25 The value 0 overriding the formula for January 1, 2001.
You can update the OrderDate linked table in PowerPivot, and the WorkingDays column shows up in the PivotTable too. At this point, you can define a measure belonging to the SalesOrderDetail table named DailySales, which divides the sum of LineTotal by the sum of working days, as you can see in Figure 7-26.
Figure 7-26 The definition of DailySales measure.
The final result is shown in Figure 7-27, where both WorkingDays and DailySales measures are exposed in the PivotTable. However, in a real report, you usually do not show the working days number but just the average measures, such as Daily Sales.
Figure 7-27 PivotTable showing results for WorkingDays and DailySales.
Our technique up to now is really error prone because we write directly into a cell a value of 0 to indicate a holiday, without any further explanation. If we make an error, it is really hard to identify; furthermore, we make no distinction between weekend days (which are automatically calculated) and holidays. A better solution is to define a separate Holidays table, which is easier to check and to maintain because it moves into a single calculated column the logic to merge weekend evaluation and holiday definition using a single formula. In Figure 7-28, you can see such a Holidays table, defined in Excel.
Figure 7-28 Holidays table in Excel.
You can import this Holidays table as a linked table in PowerPivot and define a relationship between the OrderDate and Holidays tables, as shown in Figure 7-29.
Figure 7-29 The relationship between the OrderDate and Holidays tables.
You can remove the WorkingDays column because you are moving the whole logic into PowerPivot. Because you need to use the RELATE function to get holiday information in PowerPivot, you should move all the business logic into one simple place: avoid splitting it half and half between Excel and and PowerPivot. After you update the OrderDate table in PowerPivot by removing the WorkingDays data column, you can define a new WorkingDays calculated column by using the DAX formula that you can see in Figure 7-30.
Figure 7-30 The WorkingDays calculated column in the OrderDate table.
Let us examine the DAX formula in Figure 7-30. First of all, you can see a new Holiday calculated column defined by the following formula:
Holiday =IF( RELATED(HolidaysTable[Date]) = BLANK(), FALSE, TRUE )
The Holiday column has a TRUE value for every day that corresponds to a holiday in the Holidays table. Using this information, we extend the previously defined Excel formula that considers whether a nonworking day is a Saturday, a Sunday, or a holiday by using the following DAX formula:
WorkingDays = IF( WEEKDAY( OrderDate[Date], 2 ) > 5 || OrderDate[Holiday], 0, 1)
Finally, you can browse data with the right calculation of WorkingDays, according to the Holidays table we included in the model. In Figure 7-31, you can see the resulting PivotTable, which you can find along with the complete model in the CH07-05-WorkingDays-HolidaysTable.xlsx workbook included on the companion DVD.
Figure 7-31 The PivotTable showing final results using the Holidays table support.
Another common calculation involving working days is the delta between two dates. For example, in the SalesOrderHeader table of the model used in this chapter, there are three dates, which you can also see in Figure 7-32:
OrderDate: The date of the order
DueDate: When the customer expects the order to be delivered
ShipDate: The date of order shipment
Figure 7-32 The Dates column in the SalesOrderHeader table.
Calculating whether an order has been shipped on time seems pretty easy: you should just compare the DueDate and ShipDate columns. However, if you consider a standard delivery time of four working days, you should calculate how many orders have been shipped after DueDate minus four working days. This calculation requires the support of the Dates table. The complete model of the following example is available in the CH07-06-DeliveryDays.xlsx workbook included on the companion DVD.
To make the calculation, we need to add a calculated column in the SalesOrderHeader table that calculates for each order the difference (in working dates) between the two dates. You can create a WorkingDayNumber calculated column in the Dates table that has the following formula:
WorkingDayNumber =SUMX( FILTER( OrderDate, OrderDate[Date] <= EARLIER(OrderDate[Date]) ), OrderDate[WorkingDays] )
This number calculates for each day the number of working days elapsed since the first date in the Dates table. In Figure 7-33, you can see how this number is calculated for a few rows.
Figure 7-33 The WorkingDayNumber calculation.
At this point, you can define the number of working days between two dates using the difference of WorkingDayNumber for the correspondent dates. Because you might not want to add too many tables to the PowerPivot model, you may reuse the same OrderDate table already imported into the model to get the WorkingDayNumber for both DueDate and ShipDate dates of an order. For example, this number for ShipDate can be obtained by using the following DAX expression:
CALCULATE( VALUES( OrderDate[WorkingDayNumber] ), FILTER( OrderDate, OrderDate[Date] = SalesOrderHeader[ShipDate] ) )
The FILTER call filters only the ShipDate row in the OrderDate table. Using this filter, the CALCULATE function returns the value of WorkingDayNumber for that row. The use of VALUES grants that an error message is raised if the FILTER returns more than one row (in which case, the filter condition contains an error).
So using this DAX expression for both ShipDate and DueDate, we can define a DueDeltaDays calculated column in SalesOrderHeader by using the following formula:
DueDeltaDays = CALCULATE( VALUES( OrderDate[WorkingDayNumber] ), FILTER( OrderDate, OrderDate[Date] = SalesOrderHeader[ShipDate])) + 4 - CALCULATE( VALUES( OrderDate[WorkingDayNumber] ), FILTER( OrderDate, OrderDate[Date] = SalesOrderHeader[DueDate]))
The DueDeltaDays column shows a positive number in the case of a delay, representing the number of delay days. Negative numbers indicate an early delivery (measured always in days). In Figure 7-34, you can see values for this column and for another calculated column named DeliveryDelayDays, which displays a value only for delayed orders.
Figure 7-34 The DueDeltaDays and DeliveryDelayDays calculated columns in the SalesOrderHeader.
With this information, you can calculate some measures in the PivotTable, such as the ratio of delayed deliveries:
DeliveryDelayRation = COUNT( SalesOrderHeader[DeliveryDelayDays] ) / COUNTROWS( SalesOrderHeader )
You can also calculate the average delay (in days) for delayed orders, by simply selecting the Summarize By Average item on the DeliveryDelayDays column. In Figure 7-35, you can see a PivotTable displaying both these measures.
Figure 7-35 The DeliveryDelayRatio and average of DeliveryDelayDays columns in PivotTable.