Understanding calculated columns and measures
Now that you know the basics of DAX syntax, you need to learn one of the most important concepts in DAX: the difference between calculated columns and measures. Even though calculated columns and measures might appear similar at first sight because you can make certain calculations using either, they are, in reality, different. Understanding the difference is key to unlocking the power of DAX.
Depending on the tool you are using, you can create a calculated column in different ways. Indeed, the concept remains the same: a calculated column is a new column added to your model, but instead of being loaded from a data source, it is created by resorting to a DAX formula.
A calculated column is just like any other column in a table, and we can use it in rows, columns, filters, or values of a matrix or any other report. We can also use a calculated column to define a relationship, if needed. The DAX expression defined for a calculated column operates in the context of the current row of the table that the calculated column belongs to. Any reference to a column returns the value of that column for the current row. We cannot directly access the values of other rows.
If you are using the default Import Mode of Tabular and are not using DirectQuery, one important concept to remember about calculated columns is that these columns are computed during database processing and then stored in the model. This concept might seem strange if you are accustomed to SQL-computed columns (not persisted), which are evaluated at query time and do not use memory. In Tabular, however, all calculated columns occupy space in memory and are computed during table processing.
This behavior is helpful whenever we create complex calculated columns. The time required to compute complex calculated columns is always process time and not query time, resulting in a better user experience. Nevertheless, be mindful that a calculated column uses precious RAM. For example, if we have a complex formula for a calculated column, we might be tempted to separate the steps of computation into different intermediate columns. Although this technique is useful during project development, it is a bad habit in production because each intermediate calculation is stored in RAM and wastes valuable space.
If a model is based on DirectQuery instead, the behavior is hugely different. In DirectQuery mode, calculated columns are computed on the fly when the Tabular engine queries the data source. This might result in heavy queries executed by the data source, therefore producing slow models.
Figure 2-2 By subtracting two dates and converting the result to an integer, DAX computes the number of days between the two dates.
Calculated columns are useful, but you can define calculations in a DAX model in another way. Whenever you do not want to compute values for each row but rather want to aggregate values from many rows in a table, you will find these calculations useful; they are called measures.
For example, you can define a few calculated columns in the Sales table to compute the gross margin amount:
Sales[SalesAmount] = Sales[Quantity] * Sales[Net Price] Sales[TotalCost] = Sales[Quantity] * Sales[Unit Cost] Sales[GrossMargin] = Sales[SalesAmount] – Sales[TotalCost]
What happens if you want to show the gross margin as a percentage of the sales amount? You could create a calculated column with the following formula:
Sales[GrossMarginPct] = Sales[GrossMargin] / Sales[SalesAmount]
This formula computes the correct value at the row level—as you can see in Figure 2-3—but at the grand total level the result is clearly wrong.
Figure 2-3 The GrossMarginPct column shows a correct value on each row, but the grand total is incorrect.
The value shown at the grand total level is the sum of the individual percentages computed row by row within the calculated column. When we compute the aggregate value of a percentage, we cannot rely on calculated columns. Instead, we need to compute the percentage based on the sum of individual columns. We must compute the aggregated value as the sum of gross margin divided by the sum of sales amount. In this case, we need to compute the ratio on the aggregates; you cannot use an aggregation of calculated columns. In other words, we compute the ratio of the sums, not the sum of the ratios.
It would be equally wrong to simply change the aggregation of the GrossMarginPct column to an average and rely on the result because doing so would provide an incorrect evaluation of the percentage, not considering the differences between amounts. The result of this averaged value is visible in Figure 2-4, and you can easily check that (330.31 / 732.23) is not equal to the value displayed, 45.96%; it should be 45.11% instead.
Figure 2-4 Changing the aggregation method to AVERAGE does not provide the correct result.
The correct implementation for GrossMarginPct is with a measure:
GrossMarginPct := SUM ( Sales[GrossMargin] ) / SUM (Sales[SalesAmount] )
As we have already stated, the correct result cannot be achieved with a calculated column. If you need to operate on aggregated values instead of operating on a row-by-row basis, you must create measures. You might have noticed that we used := to define a measure instead of the equal sign (=). This is a standard we used throughout the book to make it easier to differentiate between measures and calculated columns in code.
After you define GrossMarginPct as a measure, the result is correct, as you can see in Figure 2-5.
Figure 2-5 GrossMarginPct defined as a measure shows the correct grand total.
Measures and calculated columns both use DAX expressions; the difference is the context of evaluation. A measure is evaluated in the context of a visual element or in the context of a DAX query. However, a calculated column is computed at the row level of the table it belongs to. The context of the visual element (later in the book, you will learn that this is a filter context) depends on user selections in the report or on the format of the DAX query. Therefore, when using SUM(Sales[SalesAmount]) in a measure, we mean the sum of all the rows that are aggregated under a visualization. However, when we use Sales[SalesAmount] in a calculated column, we mean the value of the SalesAmount column in the current row.
A measure needs to be defined in a table. This is one of the requirements of the DAX language. However, the measure does not really belong to the table. Indeed, we can move a measure from one table to another table without losing its functionality.
Choosing between calculated columns and measures
Now that you have seen the difference between calculated columns and measures, it is useful to discuss when to use one over the other. Sometimes either is an option, but in most situations, the computation requirements determine the choice.
As a developer, you must define a calculated column whenever you want to do the following:
Place the calculated results in a slicer or see results in rows or columns in a matrix or in a pivot table (as opposed to the Values area), or use the calculated column as a filter condition in a DAX query.
Define an expression that is strictly bound to the current row. For example, Price * Quantity cannot work on an average or on a sum of those two columns.
Categorize text or numbers. For example, a range of values for a measure, a range of ages of customers, such as 0–18, 18–25, and so on. These categories are often used as filters or to slice and dice values.
However, it is mandatory to define a measure whenever one wants to display calculation values that reflect user selections, and the values need to be presented as aggregates in a report, for example:
To calculate the profit percentage of a report selection
To calculate ratios of a product compared to all products but keep the filter both by year and by region
We can express many calculations both with calculated columns and with measures, although we need to use different DAX expressions for each. For example, one can define the GrossMargin as a calculated column:
Sales[GrossMargin] = Sales[SalesAmount] - Sales[TotalProductCost]
However, it can also be defined as a measure:
GrossMargin := SUM ( Sales[SalesAmount] ) - SUM ( Sales[TotalProductCost] )
We suggest you use a measure in this case because, being evaluated at query time, it does not consume memory and disk space. As a rule, whenever you can express a calculation both ways, measures are the preferred way to go. You should limit the use of calculated columns to the few cases where they are strictly needed. Users with Excel experience typically prefer calculated columns over measures because calculated columns closely resemble the way of performing calculations in Excel. Nevertheless, the best way to compute a value in DAX is through a measure.