The VertiPaq Engine in DAX

  • 11/3/2015
This chapter from "The Definitive Guide to DAX: Business intelligence with Microsoft Excel, SQL Server Analysis Services, and Power BI" covers the internal architecture of the VertiPaq engine: the in-memory columnar database that stores and hosts your model.

At this point in the book, you have a solid understanding of the DAX language. The next step, apart from the necessary experience that you need to gain by yourself, is not only being able to write DAX, but also to write efficient DAX. Writing efficient DAX, that is, code that runs at its best possible speed, requires you to understand the internals of the engine. The next chapters aim to provide the essential knowledge to measure and improve performance of DAX code.

More specifically, this chapter is dedicated to the internal architecture of the VertiPaq engine: the in-memory columnar database that stores and hosts your model.

Before continuing with the dissertation, it is worth mentioning a quick note. The official name of the engine on top of which DAX runs is “xVelocity in-memory Analytical Engine.” The name appeared later, when the engine was ready to market. During its development, it was code-named “VertiPaq.” Because of the late change in the name, many white papers referred to the engine as the VertiPaq engine, and all the early adopters learned its name as VertiPaq. Moreover, internally, the engine is still known as VertiPaq (in fact, as you learn later, its query engine executes VertiPaq queries, not xVelocity queries). In order to avoid confusion in sentences such as “the xVelocity engine executes a VertiPaq query,” which mixes both names in a single sentence, we decided to use VertiPaq only.

There is another important note to our readers. Starting from this chapter, we somewhat deviate from DAX and begin to discuss some low-level technical details about the implementation of DAX and the VertiPaq engine. Although this is an important topic, you need to be aware of two facts:

  • Implementation details change often. We did our best to show information at a level which is not likely to change soon, carefully balancing detail level and usefulness with consistency over time. The most up-to-date information will always be available in blog posts and articles on the web.
  • All the considerations about the engine and optimization techniques are useful if you rely on the VertiPaq engine. In case you are using DirectQuery, then the content of the last chapters of this book is nearly useless in your specific scenario. However, we suggest that you read and understand it anyway, because it shows many details that will help you in choosing the best engine for your analytical scenario.

Understanding database processing

DAX runs in SQL Server Analysis Services (SSAS) Tabular, Power BI service (both on server and on the local Power BI Desktop), and in the Power Pivot for Microsoft Excel add-in. Technically, Power Pivot for Excel runs a local instance of SSAS Tabular, whereas Power BI uses a separate process running a special instance of Analysis Services. Thus, speaking about different engines is somewhat artificial: Power Pivot is SSAS, even if it runs in a “hidden mode” inside Excel. In this book, we make no differences between these engines and, when we speak about SSAS, you might always mentally replace SSAS with Power Pivot or Power BI. If there are differences worth highlighting, then we will note them in the specific section.

When SSAS loads the content of a source table in memory, we say that it processes the table. This happens during the process operation of SSAS or the data refresh in Power Pivot for Excel and Power BI. During processing, the engine reads the content of your data source and transforms it in the internal VertiPaq data structure.

The steps that happen during processing are as follows:

  1. Reading of the source dataset, transformation into a columnar data structure of VertiPaq, encoding and compressing each column.
  2. Creation of dictionaries and indexes for each column.
  3. Creation of the data structures for relationships.
  4. Computation and compression of all the calculated columns.

The last two steps are not necessarily sequential. In fact, you can create a relationship based on a calculated column, or have calculated columns that depend on a relationship because they use RELATED or CALCULATE; SSAS creates a complex graph of dependencies to execute the steps in the correct order.

In the next sections, you learn many more details about these steps and the format of internal structures created by SSAS during the transformation of the data source into the VertiPaq model.