Model the data
- By Daniil Maslyuk
In this sample chapter from Exam Ref PL-300 Microsoft Power BI Data Analyst, you will explore the skills necessary to design, develop, and optimize data models.
In the previous chapter, we reviewed the skills necessary to get and transform data by using Power Query Editor—the process also known as data shaping. In this chapter, we examine the skills needed to model data.
Although Power BI allows you to analyze your data to some degree right after you load it, a strong understanding of data modeling allows you to perform sophisticated analysis using rich data modeling capabilities, which includes creating relationships, hierarchies, and various calculations to bring out the true power of Power BI. Previously in the Power Query Editor we used the M language; after we load the data into the model, we use data analysis expressions, more commonly referred to as DAX—Power BI’s native query language.
In this chapter, we review the skills necessary to design, develop, and optimize data models. Additionally, we look at DAX and how it can be used to enhance data models.
Skills covered in this chapter:
Skill 2.1: Design a data model
Skill 2.2: Develop a data model
Skill 2.3: Create model calculations by using DAX
Skill 2.4: Optimize model performance
Skill 2.1: Design a data model
A proper data model is the foundation of meaningful analysis. A Power BI data model is a collection of one or more tables and, optionally, relationships. A well-designed data model enables business users to understand and explore their data and derive insights from it. This step should be taken before you create any visuals by loading your data and defining the relationships between tables. Data modeling often occurs at the beginning phase of building a Power BI report so that you can create efficient measures that build upon your data model. In this section, we design a data model by focusing our attention on tables and their relationships.
Define the tables and design a data model that uses a star schema
Once a query is loaded, it becomes a table in a Power BI data model. Tables can then be organized into different data model types, also known as schemas. The three most common schemas in Power BI are:
Flat (fully denormalized) schema
There are other types of data models, though these three are the most common ones.
In the flat type of data model, all attributes are fully denormalized into a single table. Because there’s only one table, there are no relationships, and in most cases there’s no need for key.
In our Wide World Importers example, we have a single table that contains all columns from all tables, meaning that the Sale and Targets columns will be in the same table. Because the tables have different data granularity, you run into problems when comparing actuals and targets.
From the performance point of view, flat schemas are very efficient, though there are downsides:
A single table can be cumbersome and confusing to navigate.
Columns and data can often be duplicated, leading to a comparatively large file size.
Mixing facts of different grains results in more complex DAX formulas.
Flat schemas are often used when connecting to a single, simple source. However, for more complex data models, flat schemas should be avoided in Power BI as much as possible.
When you use a star schema, tables are conceptually classified into two kinds:
Fact tables These tables contain the metrics you want to aggregate. Fact tables have foreign keys, which are required in order to create relationships with dimensions, and columns that you can aggregate. In our Wide World Importers example, the Sale and Targets tables are fact tables. Fact tables are sometimes also known as data tables.
Dimension tables These tables contain the descriptive attributes that help you slice and dice your fact tables. A dimension table has a unique identifier—a key column—and descriptive columns. In our Wide World Importers example, the City, Customer, Date, Employee, and Stock Item are dimension tables. Dimension tables are also sometimes known as lookup tables.
In a star schema, fact tables are surrounded by dimensions, as shown in Figure 2-1.
FIGURE 2.1 Star schema with Sale as the only fact table
The star schema has its name because it resembles a star, with the fact table in the center and dimension tables as the star points. It’s possible to have more than one fact table in a star schema, and it will still be a star schema.
In most cases, the star schema is the preferred data modeling approach in Power BI. It addresses the shortcomings of the flat schema:
Fields are logically grouped, making the model easier to understand.
There is less duplication of data, which results in more efficient storage.
You don’t need to write overly complex DAX formulas to work with fact tables that have a different grain.
The snowflake schema is similar to the star schema, except it can have some dimensions that “snowflake” from other dimensions. You can see an example in Figure 2-2.
FIGURE 2.2 Snowflake schema with State Province snowflaking from the City table
In the Wide World Importers example, if we loaded the State Province query, the data model could be a snowflake schema. This is because the State Province table is related to the City dimension table, which in turn is related to the Sale fact table.
Snowflake schemas can be beneficial when there are fact tables that have different grains.
Configure table and column properties
Both tables and columns have various properties you can configure, and you can do it in the Model view. To see the properties of a column or a table, select an object, and you will see its properties in the Properties pane.
For tables, depending on the storage mode, you can configure the following properties:
Name Enter the table name.
Description This property allows you to add a description of the table that will be stored in the model’s metadata. It can be useful when building reports because you can see the description when you hover over the table in the Fields pane.
Synonyms These are useful for the Q&A feature of Power BI, which we review in the next skill section. You can add synonyms so that the Q&A feature can understand that you’re referring to a specific table even if you provide a different name for it.
Row label This property is useful for both Q&A and featured tables, and it allows you to select a column whose values will serve as labels for each row. For example, if you ask Q&A to show “sales amount by product” and you select the Product Name column as the Row label of the Product table, then Q&A will show sales amount for each product name.
Key column If your table has a column that has unique values for every row, you can set that column as the key column.
Is hidden You can hide a table so that it disappears from the Fields pane.
Is featured table This property allows you to make a table featured, which will allow it to be used in Excel in certain scenarios.
Storage mode This property may be set to Import, DirectQuery, or Dual, as we covered in the previous chapter.
For columns, depending on data type, you can configure the following properties:
Name Enter the column name.
Description As you can for tables, you can add a column description.
Synonyms As you can for tables, you can add synonyms to make the column work better with Q&A.
Display folder You can group columns from the same table into display folders.
Is hidden Hiding a column keeps it in the data model and hides it in the Fields pane.
Data type The available data types are different from those available in Power Query. For instance, Percentage, Date/Time/Timezone, and Duration are not available.
Format Different data types will show different formatting properties. For example, for numeric columns, you'll see the following additional properties: Percentage format, Thousands separator, and Decimal places.
Sort by column You can sort one column by another. For example, you can sort month names by month numbers to make them appear in the correct order.
Data category This property can be useful for some visuals, and the default is Uncategorized. Depending on the data type, you can also select one of the following:
State or Province
Summarize by This property determines how the column will be aggregated if you put it into a visual. The options you can choose depend on the data type. For most data types, in addition to Don’t Summarize/None, you can choose Count and Count (Distinct)/Distinct Count, whereas for numeric columns, you can also choose Sum, Average, Minimum/Min, and Maximum/Max. While Power BI will try to automatically determine the appropriate summarization, it’s not always accurate.
Is nullable—You may disallow null values for a column; if during data refresh, a column is determined to get a null value, the refresh will fail.
Design and implement role-playing dimensions
In some cases, there may be more than one way to filter a fact table by a dimension. In the Wide World Importers example, the Sale table has two date columns: Invoice Date Key and Delivery Date Key, both of which can be related to the Date column from the Date table. Therefore, it’s possible to analyze sales by invoice date or delivery date, depending on the business requirements. In this situation, the Date dimension is a role-playing dimension.
While Power BI allows you to have multiple physical relationships between two tables, no more than one can be active at a time, and other relationships must be set as inactive. Active relationships, by default, propagate filters. The choice of which relationship should be set as active depends on the default way of looking at data by the business.
To create a relationship between two tables, you can drag a key from one table on top of the corresponding key from the other table in the Model view.
In our Wide World Importers example, you can drag the Date column from the Date table on top of the Invoice Date Key column in the Sale table. This will create an active relationship, signified by the solid line. Next, you can drag the Date column from the Date table on top of the Delivery Date Key column from the Sale table. This will create an inactive relationship, signified by the dashed line. The result should look like Figure 2-3.
FIGURE 2.3 Relationships between Sale and Date
If you hover over a relationship line in the Model view, it’ll highlight the fields that participate in the relationship.
In our Wide World Importers model, you should also create the relationships listed in Table 2-1.
TABLE 2-1 Additional relationships in Wide World Importers
FROM: TABLE (COLUMN)
TO: TABLE (COLUMN)
Sale (City Key)
City (City Key)
Sale (Customer Key)
Customer (Customer Key)
Sale (Salesperson Key)
Employee (Employee Key)
Sale (Stock Item Key)
Stock Item (Stock Item Key)
Inactive relationships can be activated by using the USERELATIONSHIP function in DAX, which also deactivates the default active relationship, if any. The following is an example of a measure that uses USERELATIONSHIP:
Revenue by Delivery Date = CALCULATE( [Revenue], USERELATIONSHIP( 'Date'[Date], Sale[Delivery Date Key] ) )
To use USERELATIONSHIP, you need to define a relationship in the model first so that the function only works for existing relationships. This approach is useful for scenarios such as the Wide World Importers example, where we have multiple date columns within the same fact table.
If you have a number of measures that you want to analyze by using different relationships, this may result in your data model having many similar measures, cluttering your data model to a degree.
Another drawback of using USERELATIONSHIP is that you cannot analyze data by using two relationships at the same time. For instance, if you have a single Date table, it won’t be possible to see which sales were invoiced last year and shipped this year.
An alternative to USERELATIONSHIP that addresses these drawbacks is to use separate dimensions for each role or relationship. In Wide World Importers, you would have Delivery Date and Invoice Date dimensions, which would make it possible to analyze sales by both delivery and invoice dates.
There are a few ways to create the new dimensions based on the existing Date table, one of which is to use calculated tables. For the Invoice Date table, the DAX formula would be as follows:
Invoice Date = 'Date'
The benefit of using calculated tables instead of referencing or duplicating queries in Power Query is that if you have calculated columns in your Date table, they will be copied in a calculated table, while you’ll need to re-create the same columns if you use Power Query to create the copies of the dimension.
When you’re creating separate dimensions, it’s best to rename the columns to make it clear where fields are coming from. For example, instead of leaving the column called Date, it’s best to rename it to Invoice Date. You can do so by right-clicking a field in the Fields pane and selecting Rename or by double-clicking a field. Alternatively, you can rename fields by using a more complex calculated table expression. For example, you could use the SELECTCOLUMNS function in DAX to rename columns.
Define a relationship’s cardinality and cross-filter direction
In the previous section, we looked at how you create relationships between tables. In this section, we review the concepts of cardinality and cross-filter direction of relationships.
You can edit a relationship by double-clicking it in the Model view. For example, in Figure 2-4 you can see the options for one of the relationships between the Sale and Date tables.
FIGURE 2.4 Relationship options
In the relationship options, you can select tables from drop-down lists. You get a preview of each table, from which you can select a column that will be part of a relationship. Unlike in the Merge operation in Power Query, only one column from each table can be part of a relationship.
The Make this relationship active check box determines whether the relationship is active. Between two tables, there can be no more than one active relationship.
When you’re using DirectQuery, the Assume referential integrity option is available, and it can improve query performance in certain cases.
Two options are worth reviewing in more detail: Cardinality and Cross filter direction.
Depending on the selected tables and columns, you can select one of the following options:
Many to one
One to one
One to many
Many to many
Many to one and One to many are the same kind of relationship, and they differ only in the order in which the tables are listed. “Many” means that a key may appear more than once in the selected column, whereas “One” means a key value appears only once in the selected column. In our Wide World Importers example earlier, the Sale table was on the many side, whereas the Date table was on the one side; a single date appeared only once in the Date table, though there could be multiple sales on the same date in the Sale table.
One to one is a special kind of relationship where a key value appears only once on both sides of the relationship. This type of relationship may be useful for splitting a single dimension with many columns into separate tables. You should only use this if you are confident that no duplicates will appear in this table, because duplicates will cause immediate errors in your data model.
Many-to-many relationships in this context refer to a direct relationship between two tables, neither of which is guaranteed to have unique keys. We review this type of relationship later in this chapter.
Cross filter direction
This option determines the direction in which filters flow. For many-to-one and one-to-many relationships, you can select Single or Both:
If you select Single, then the filters from the table on the “one” side will filter through to the table on the “many” side. This setting is signified by a single arrowhead on the relationship line in the Model view.
If you select Both, then filters from both tables will flow in both directions; such relationship are known as bidirectional. This setting is signified by two arrowheads on the relationship line in the Model view, facing in opposite directions. When this option is selected, you can also select Apply security filter in both directions to make row-level security filters flow in both directions too.
When editing table relationships, even if you set the relationship cross-filter direction to Both, by default the security filters are only applied in one direction. We noted that there’s an option to control security filtering called Apply security filter in both directions. This means that role filtering applied to a table will also be passed to the filtered table. When this option is disabled, only the table with filtered applied will be affected. This option exists because applying security filters affects the performance of your data model, so in some cases applying it may be undesirable.
To illustrate how the cross-filter direction works, consider the data model shown in Figure 2-5.
FIGURE 2.5 Sample data model
From this data model, you can create two table visuals as follows:
Table 1: Distinct count of Stock Item by Year
Table 2: Distinct count of Year by Stock Item
Both table visuals are shown in Figure 2-6. The first four rows are shown for Table 2 for illustrative purposes.
FIGURE 2.6 Table visuals
You can see that in Table 1, the numbers are different for different years and the total, whereas in Table 2, the Distinct Count of Year is showing 6 for all rows, including the Total.
The numbers are different in Table 1 because filters from the Date table can reach the Stock Item table through the Sale table; the Date table filters the Sale table because there is a one-to-many relationship; then the Sale table filters the Stock Item table because there is a bidirectional relationship. In 2019, 2020, and 2021, Wide World Importers coincidentally sold 219 stock items, whereas in 2022, they sold 227 stock items. At the total level you see 228, which is not the total sum of stock items sold across all years. Importantly, the total 228 is showing as the distinct count of stock items when filters from the Date table are not applied.
In Table 2, the numbers are the same because filters from the Stock Item table don’t reach the Date table as there is no bidirectional filter. Even though Wide World Importers only had sales in four years, you see 6 across all rows, which is the number of years in the Date table.
It’s also possible to set the cross-filter direction by using the CROSSFILTER function in DAX, as you can see in this example:
Stock Items Sold = CALCULATE( DISTINCTCOUNT('Stock Item'[Stock Item]), CROSSFILTER( Sale[Stock Item Key], 'Stock Item'[Stock Item Key], BOTH ) )
The syntax of CROSSFILTER is similar to USERELATIONSHIP—the first two parameters are related columns. Additionally, there’s the third parameter—direction—and it can be one of the following:
BOTH This option corresponds to Both in the relationship cross-filter direction options.
NONE This option deactivates the relationship, and it corresponds to the cleared Make this relationship active check box.
ONEWAY This option corresponds to Single in the relationship cross-filter direction options.
Bidirectional filters are sometimes used in many-to-many relationships with bridge tables when direct many-to-many relationships are not desirable.
Create a common date table
By default, Power BI creates a calendar hierarchy for each date or date/time column from your data sources.
While these can be useful in some cases, it’s best practice to create your own date table, which has several benefits:
You can use a calendar other than Gregorian.
You can have weeks in the calendar.
You can filter multiple fact tables by using a single date dimension table.
If you don’t have a date table you can import from a data source, you can create one yourself. It’s possible to create a date table by using Power Query or DAX, and there’s no difference in performance between the two methods.
Creating a calendar table in Power Query
In Power Query, you can use the M language List.Dates function, which returns a list of dates, and then convert the list to a table and add columns to it. The following query provides a sample calendar table that begins on January 1, 2018:
let Source = #date(2018, 1, 1), Dates = List.Dates(Source, Duration.TotalDays(Date.AddYears(Source, 6) - Source), #duration(1,0,0,0)), #"Converted to Table" = Table.FromList(Dates, Splitter.SplitByNothing(), type table [Date = date]), #"Inserted Year" = Table.AddColumn(#"Converted to Table", "Year", each Date. Year([Date]), Int64.Type), #"Inserted Month Name" = Table.AddColumn(#"Inserted Year", "Month Name", each Date. MonthName([Date]), type text), #"Inserted Month" = Table.AddColumn(#"Inserted Month Name", "Month", each Date. Month([Date]), Int64.Type), #"Inserted Week of Year" = Table.AddColumn(#"Inserted Month", "Week of Year", each Date.WeekOfYear([Date]), Int64.Type) in #"Inserted Week of Year"
If you want to add it to your model, you’ll need to start with a blank query:
In Power Query Editor, select New Source on the Home ribbon.
Select Blank Query.
With the new query selected, select Query > Advanced Editor on the Home ribbon.
Replace all existing code with the code above and select Done.
Give your query an appropriate name such as Calendar or Date.
The result should look like Figure 2-7, where the first few rows of the query are shown.
FIGURE 2.7 Sample calendar table built by using Power Query
You may prefer having a table in Power Query when you intend to use it in some other queries, since it’s not possible to reference calculated tables in Power Query.
Creating a calendar table in DAX
If you choose to create a date table in DAX, you can use the CALENDAR or CALENDARAUTO function, both of which return a table with a single Date column. You can then add calculated columns to the table, or you can create a calculated table that already has all the columns.
The CALENDAR function requires you to provide the start and end dates, which you can hardcode for your business requirements or calculate dynamically:
Calendar Dynamic = CALENDAR( MIN(Sale[Invoice Date Key]), MAX(Sale[Invoice Date Key]) )
The CALENDARAUTO function scans your data model for dates and returns an appropriate date range automatically.
To build a table similar to the Power Query table you built earlier, use the following calculated table formula in DAX:
Calendar = ADDCOLUMNS( CALENDARAUTO(), "Year", YEAR([Date]), "Month Name", FORMAT([Date], "MMMM"), "Month", MONTH([Date]), "Week of Year", WEEKNUM([Date]) )