Microsoft SQL Server 2012 Internals: Special Storage
SQL Server provides the capability of data compression, a feature introduced in SQL Server 2008 and available in the Enterprise edition only. Compression can reduce the size of your tables by exploiting existing inefficiencies in the actual data. These inefficiencies can be grouped into two general categories.
The first category relates to storage of individual data values when they are stored in columns defined using the maximum possible size. For example, a table might need to define a quantity column as int because occasionally you could be storing values larger than 32,767, which is the maximum smallint value. However, int columns always need 4 bytes, and if most of your quantity values are less than 100, those values could be stored in tinyint columns, which need only 1 byte of storage. The Row Compression feature of SQL Server can compress individual columns of data to use only the actual amount of space required.
The second type of inefficiency in the data storage occurs when the data on a page contains duplicate values or common prefixes across columns and rows. This inefficiency can be minimized by storing the repeating values only once and then referencing those values from other columns. The Page Compression feature of SQL Server can compress the data on a page by maintaining entries containing common prefixes or repeating values. Note that when you choose to apply page compression to a table or index, SQL Server always also applies Row Compression.
SQL Server 2005 SP2 introduced a simple form of compression, which could be applied only to columns defined using the decimal data type. (Keep in mind that the data type numeric is completely equivalent to decimal, and anytime I mention decimal, it also means numeric.) In SQL Server 2005, the option must be enabled at both the database level (using the procedure sp_db_vardecimal_storage_format) and at the table level (using the procedure sp_tableoption). In SQL Server 2008, SQL Server 2008 R2, and SQL Server 2012, all user databases are enabled automatically for the vardecimal storage format, so vardecimal must be enabled only for individual tables. Like data compression, which this section looks at in detail, the vardecimal storage format is available only in SQL Server Enterprise edition.
In SQL Server 2005, when both of these stored procedures were run, decimal data in the tables enabled for vardecimal were stored differently. Rather than be treated as fixed-length data, decimal columns are stored in the variable section of the row and use only the number of bytes required. (Chapter 6 looked at the difference between fixed-length data and variable-length data storage.) In addition to all the table partitions that use the vardecimal format for all decimal data, all indexes on the table automatically use the vardecimal format.
Decimal data values are defined with a precision of between 1 and 38 and, depending on the defined precision, they use between 5 and 17 bytes. Fixed-length decimal data uses the same number of bytes for every row, even if the actual data could fit into far fewer bytes. When a table doesn’t use the vardecimal storage format, every entry in the table consumes the same number of bytes for each defined decimal column, even if the value of a row is 0, NULL, or some value that can be expressed in a smaller number of bytes, such as the number 3. When vardecimal storage format is enabled for a table, the decimal columns in each row use the minimum amount of space required to store the specified value. Of course, as you saw in Chapter 6, every variable-length column has 2 bytes of additional overhead associated with it, but when storing very small values in a column defined as decimal with a large precision, the space saving can more than make up for those additional 2 bytes. For vardecimal storage, both NULLs and zeros are stored as zero-length data and use only the 2 bytes of overhead.
Although SQL Server 2012 supports the vardecimal format, I recommend that you use row compression when you want to reduce the storage space required by your data rows. Both the table option and the database option for enabling vardecimal storage have been deprecated.
You can think of row compression as an extension of the vardecimal storage format. In many situations, SQL Server uses more space than needed to store data values, and without the Row Compression feature, the only control you have is to use a variable-length data type. Any fixed-length data types always use the same amount of space in every row of a table, even if space is wasted.
As mentioned earlier, you can declare a column as type int because occasionally you might need to store values greater than 32,767. An int needs 4 bytes of space, no matter what number is stored, even if the column is NULL. Only character and binary data can be stored in variable-length columns (and, of course, decimal, when that option is enabled). Row compression allows integer values to use only the amount of storage space required, with the minimum being 1 byte. A value of 100 needs only a single byte for storage, and a value of 1,000 needs 2 bytes. The storage engine also includes an optimization that allows zero and NULL to use no storage space for the data itself.
Later, this section provides the details about compressed data storage. Starting in SQL Server 2008 R2, row compression also can compress Unicode data. Rather than each Unicode character always be stored in two bytes, if the character needs only a single byte, it is stored only in a single byte.
Enabling row compression
You can enable compression when creating a table or index, or when using the ALTER TABLE or ALTER INDEX command. Also, if the table or index is partitioned, you can choose to compress just a subset of the partitions. (You’ll look at partitioning later in this chapter.)
The script in Listing 8-6 creates two copies of the dbo.Employees table in the AdventureWorks2012 database. When storing row-compressed data, SQL Server treats values that can be stored in 8 bytes or fewer (that is, short columns) differently than it stores data that needs more than 8 bytes (long columns). For this reason, the script updates one of the rows in the new tables so that none of the columns in that row contains more than 8 bytes. The Employees_rowcompressed table is then enabled for row compression, and the Employees_uncompressed table is left uncompressed. A metadata query examining pages allocated to each table is executed against each table so that you can compare the sizes before and after row compression.
Listing 8-6 Comparing two tables to show row compression
USE AdventureWorks2012; GO IF OBJECT_ID('Employees_uncompressed') IS NOT NULL DROP TABLE Employees_uncompressed; GO GO SELECT e.BusinessEntityID, NationalIDNumber, JobTitle, BirthDate, MaritalStatus, VacationHours, FirstName, LastName INTO Employees_uncompressed FROM HumanResources.Employee e JOIN Person.Person p ON e.BusinessEntityID = p.BusinessEntityID; GO UPDATE Employees_uncompressed SET NationalIDNumber = '1111', JobTitle = 'Boss', LastName = 'Gato' WHERE FirstName = 'Ken' AND LastName = 'Sánchez'; GO ALTER TABLE dbo.Employees_uncompressed ADD CONSTRAINT EmployeeUn_ID PRIMARY KEY (BusinessEntityID); GO SELECT OBJECT_NAME(object_id) as name, rows, data_pages, data_compression_desc FROM sys.partitions p JOIN sys.allocation_units au ON p.partition_id = au.container_id WHERE object_id = object_id('dbo.Employees_uncompressed'); IF OBJECT_ID('Employees_rowcompressed') IS NOT NULL DROP TABLE Employees_rowcompressed; GO SELECT BusinessEntityID, NationalIDNumber, JobTitle, BirthDate, MaritalStatus, VacationHours, FirstName, LastName INTO Employees_rowcompressed FROM dbo.Employees_uncompressed GO ALTER TABLE dbo.Employees_rowcompressed ADD CONSTRAINT EmployeeR_ID PRIMARY KEY (BusinessEntityID); GO ALTER TABLE dbo.Employees_rowcompressed REBUILD WITH (DATA_COMPRESSION = ROW); GO SELECT OBJECT_NAME(object_id) as name, rows, data_pages, data_compression_desc FROM sys.partitions p JOIN sys.allocation_units au ON p.partition_id = au.container_id WHERE object_id = object_id('dbo.Employees_rowcompressed'); GO
The dbo.Employees_rowcompressed table is referred to again later in this section, or you can examine it on your own as the details of compressed row storage are covered.
Now you can start looking at the details of row compression, but keep these points in mind:
Row compression is available only in SQL Server 2008, SQL Server 2008 R2, and SQL Server 2012 Enterprise and Developer editions.
Row compression doesn’t change the maximum row size of a table or index.
Row compression can’t be enabled on a table with any columns defined as SPARSE.
If a table or index has been partitioned, row compression can be enabled on all the partitions or on a subset of the partitions.
New row format
Chapter 6 looked at the format for storing rows that has been used since SQL Server 7.0 and is still used in SQL Server 2012 if you haven’t enabled compression. That format is referred to as the FixedVar format because it has a fixed-length data section separate from a variable-length data section. A completely new row format was introduced in SQL Server 2008 for storing compressed rows, and this format is referred to as CD format. The term CD, which stands for “column descriptor,” refers to every column having description information contained in the row itself.
You might want to re-examine Figure 6-6 in Chapter 6 as a reminder of what the FixedVar format looks like and compare it to the new CD format. Figure 8-9 shows an abstraction of the CD format. It’s difficult to be as specific as Figure 6-6 is because except for the header, the number of bytes in each region is completely dependent on the data in the row.
Figure 8-9 General structure of a CD record.
Each of these sections is described in detail.
Header The row header is always a single byte and roughly corresponds to what Chapter 6 referred to as Status Bits A. The bits have the following meanings.
Bit 0 This bit indicates the type of record; it’s 1 for the new CD record format.
Bit 1 This bit indicates that the row contains versioning information.
Bits 2 through 4 Taken as a 3-bit value, these bits indicate what kind of information is stored in the row. The possible values are as follows:
000 Primary record
001 Ghost empty record
010 Forwarding record
011 Ghost data record
100 Forwarded record
101 Ghost forwarded record
110 Index record
111 Ghost index record
Bit 5 This bit indicates that the row contains a long data region (with values greater than 8 bytes in length).
Bits 6 and 7 These bits are not used in SQL Server 2012.
The CD region The CD region is composed of two parts. The first part is either 1 or 2 bytes, indicating the number of short columns. If the most significant bit of the first byte is set to 0, it’s a 1-byte field with a maximum value of 127. If a table has more than 127 columns, the most significant bit is 1, and SQL Server uses 2 bytes to represent the number of columns, which can be up to 32,767.
Following the 1 or 2 bytes for the number of columns is the CD array, which uses 4 bits for each column in the table to represent information about the length of the column. Four bits can have 16 different possible values, but in SQL Server 2012, only 13 of them are used.
0 (0×0) indicates that the corresponding column is NULL.
1 (0×1) indicates that the corresponding column is a 0-byte short value.
2 (0×2) indicates that the corresponding column is a 1-byte short value.
3 (0×3) indicates that the corresponding column is a 2-byte short value.
4 (0×4) indicates that the corresponding column is a 3-byte short value.
5 (0×5) indicates that the corresponding column is a 4-byte short value.
6 (0×6) indicates that the corresponding column is a 5-byte short value.
7 (0×7) indicates that the corresponding column is a 6-byte short value.
8 (0×8) indicates that the corresponding column is a 7-byte short value.
9 (0×9) indicates that the corresponding column is an 8-byte short value.
10 (0×a) indicates that the corresponding column is long data value and uses no space in the short data region.
11 (0×b) is used for columns of type bit with the value of 1. The corresponding column takes no space in the short data region.
12 (0×c) indicates that the corresponding column is a 1-byte symbol, representing a value in the page dictionary. (Later, the section “Page compression” talks about the dictionary).
The short data region The short data region doesn’t need to store the length of each short data value because that information is available in the CD region. However, if table has hundreds of columns, accessing the last columns can be expensive. To minimize this cost, columns are grouped into clusters of 30 columns each and at the beginning of the short data region is an area called the short data cluster array. Each array entry is a single-byte integer and indicates the sum of the sizes of all the data in the previous cluster in the short data region, so that the value is basically a pointer to the first column of the cluster. The first cluster of short data starts right after the cluster array, so no cluster offset is needed for it. A cluster might not have 30 data columns, however, because only columns with a length less than or equal to 8 bytes are stored in the short data region.
As an example, consider a row with 64 columns, and columns 5, 10, 15, 20, 25, 30, 40, 50, and 60 are long data, and the others are short. The CD region contains the following.
A single byte containing the value 64, the number of columns in the CD region.
A CD array of 4 * 64 bits, or 32 bytes, containing information about the length of each column. It has 55 entries with values indicating an actual data length for the short data, and 8 entries of 0xa, indicating long data.
The short data region contains the following.
A short data cluster offset array containing the two values, each containing the length of a short data cluster. In this example, the first cluster, which is all the short data in the first 30 columns, has a length of 92, so the 92 in the offset array indicates that the second cluster starts 92 bytes after the first. The number of clusters can be calculated as (Number of columns – 1) /30. The maximum value for any entry in the cluster array is 240, if all 30 columns were short data of 8 bytes in length.
All the short data values.
Figure 8-10 illustrates the CD region and the short data region with sample data for the row described previously. The CD array is shown in its entirety, with a symbol indicating the length of each of the 64 values. (So the depiction of this array can fit on a page of this book, the actual data values aren’t shown.) The first cluster has 24 values in the short data region (6 are long values), the second cluster has 27 (3 are long), and the third cluster has the remaining 4 columns (all short).
Figure 8-10 The CD region and short data region in a CD record.
To locate the entry for a short column value in the short data region, the short data cluster array is first examined to determine the start address of the containing cluster for the column in the short data region.
The long data region Any data in the row longer than 8 bytes is stored in the long data region. This includes complex columns, which don’t contain actual data but instead contain information necessary to locate data stored off the row. This can include large object data and row overflow data pointers. Unlike short data, where the length can be stored simply in the CD array, long data needs an actual offset value to allow SQL Server to determine the location of each value. This offset array looks very similar to the offset array discussed in Chapter 6 for the FixedVar records.
The long data region is composed of three parts: an offset array, a long data cluster array, and the long data.
The offset array is composed of the following.
A 1-byte header in which currently only the first two bits are used Bit 0 indicates whether the long data region contains any 2-byte offset values. Currently, this value is always 1, because all offsets are always 2 bytes. Bit 1 indicates whether the long data region contains any complex columns.
A 2-byte value indicating the number of offsets to follow The most significant bit in the first byte of the offset value indicates whether the corresponding entry in the long data region is a complex column. The rest of the bits/bytes in the array entry store the ending offset value for the corresponding entry in the long data region.
Similar to the cluster array for the short data, the long data cluster array is used to limit the cost of finding columns near the end of a long list of columns. It has one entry for each 30-column cluster (except the last one). Because the offset of each long data column is already stored in the offset array, the cluster array just needs to keep track of how many of the long data values are in each cluster. Each value is a 1-byte integer representing the number of long data columns in that cluster. Just as for the short data cluster, the number of entries in the cluster array can be computed as (Number of columns in the table – 1)/30.
Figure 8-11 illustrates the long data region for the row described previously, with 64 columns, nine of which are long. Values for the offsets aren’t included for space considerations. The long data cluster array has two entries indicating that six of the values are in the first cluster and two are in the second. The remaining values are in the last cluster.
Figure 8-11 The long data region of a CD record.
Special information The end of the row contains three optional pieces of information. The existence of any or all of this information is indicated by bits in the 1-byte header at the very beginning of the row.
Forwarding pointer This value is used when a heap contains a forwarding stub that points to a new location to which the original row has been moved. Chapter 6 discussed forwarding pointers. The forwarding pointer contains three header bytes and an 8-byte row ID.
Back pointer This value is used in a row that has been forwarded to indicate the original location of the row. It’s stored as an 8-byte Row ID.
Versioning info When a row is modified under one of the snapshot-based isolation levels, SQL Server adds 14 bytes of versioning information to the row. Chapter 13 discusses row versioning and Snapshot isolation.
Now look at the actual bytes in two of the rows in the dbo.Employees_rowcompressed table created earlier in Listing 8-6. The DBCC PAGE command gives additional information about compressed rows and pages. In particular, before the bytes for the row are shown, DBCC PAGE displays the CD array. For the first row returned on the first page in the dbo.Employees_rowcompressed table, all the columns contain short data. The row has the data values shown here:
For short data, the CD array contains the actual length of each of the columns, and you can see the following information for the first row in the DBCC PAGE output:
CD array entry = Column 1 (cluster 0, CD array offset 0): 0x02 (ONE_BYTE_SHORT) CD array entry = Column 2 (cluster 0, CD array offset 0): 0x06 (FIVE_BYTE_SHORT) CD array entry = Column 3 (cluster 0, CD array offset 1): 0x06 (FIVE_BYTE_SHORT) CD array entry = Column 4 (cluster 0, CD array offset 1): 0x04 (THREE_BYTE_SHORT) CD array entry = Column 5 (cluster 0, CD array offset 2): 0x02 (ONE_BYTE_SHORT) CD array entry = Column 6 (cluster 0, CD array offset 2): 0x02 (ONE_BYTE_SHORT) CD array entry = Column 7 (cluster 0, CD array offset 3): 0x04 (THREE_BYTE_SHORT) CD array entry = Column 8 (cluster 0, CD array offset 3): 0x06 (FIVE_BYTE_SHORT)
So the first column has a CD code of 0x02, which indicates a 1-byte value, and, as you can see in the data row, is the integer 1. The second column contains a 5-byte value and is the Unicode string 1111. Notice that compressed Unicode strings are always an odd number of bytes. This is how SQL Server determines that the string has actually been compressed, because an uncompressed Unicode string—which needs 2 bytes for each character—will always be an even number of bytes. Because the Unicode string has an even number of characters, SQL Server adds a single byte 0x01 as a terminator. In Figure 8-12, which shows the DBCC PAGE output for the row contents, you can see that three strings have the 0x01 terminator to make their length odd: ‘1111’, ‘Boss’, and ‘Gato’. I’ll leave it to you to inspect the codes for the remaining columns.
Figure 8-12 A compressed row with eight short data columns.
Now look at a row with some long columns. The 22nd row on the page (Slot 21) has three long columns in the data values shown here:
The CD array for this row looks like the following:
CD array entry = Column 1 (cluster 0, CD array offset 0): 0x02 (ONE_BYTE_SHORT) CD array entry = Column 2 (cluster 0, CD array offset 0): 0x0a (LONG) CD array entry = Column 3 (cluster 0, CD array offset 1): 0x0a (LONG) CD array entry = Column 4 (cluster 0, CD array offset 1): 0x04 (THREE_BYTE_SHORT) CD array entry = Column 5 (cluster 0, CD array offset 2): 0x02 (ONE_BYTE_SHORT) CD array entry = Column 6 (cluster 0, CD array offset 2): 0x02 (ONE_BYTE_SHORT) CD array entry = Column 7 (cluster 0, CD array offset 3): 0x08 (SEVEN_BYTE_SHORT) CD array entry = Column 8 (cluster 0, CD array offset 3): 0x0a (LONG)
Figure 8-13 shows the bytes that DBCC PAGE returns for this data row. The bytes in the long data region are boldfaced.
Figure 8-13 A compressed row with five short data columns and three long.
Notice the following in the first part of the row, before the long data region.
The first byte in the row is 0x21, indicating that not only is this row in the new CD record format, but also that the row contains a long data region.
The second byte indicates eight columns in the table, just as for the first row.
The following 4 bytes for the CD array has three values of a, which indicate long values not included in the short data region.
The short data values are listed in order after the CD array and are as follows:
The BusinessEntityID is 1 byte, with the value 0x96, or +22.
The Birthdate is 3 bytes.
The MaritalStatus is 1 byte, with the value 0x0053, or ‘S’.
The VacationHours is 1 byte, with the value 0xad, or +45.
The FirstName is 7 bytes, with the value 53617269796110 or ‘Sariya’.
The Long Data Region Offset Array is 8 bytes long, as follows.
The first byte is 0x01, which indicates that the row-offset positions are 2 bytes long.
The second byte is 0x03, which indicates three columns in the long data region.
The next 6 bytes are the 2-byte offsets for each of the three values. Notice that the offset refers to position the column ends with the Long Data area itself.
The first 2-byte offset is 0x0009, which indicates that the first long value is 9 bytes long.
The second 2-byte offset is 001e, or 30, which indicates that the second long value ends 21 bytes after the first. The second value is Marketing Specialist, which is a 21-byte string.
The third 2-byte offset is 0x002f, or 47, which indicates the third value, Harnpadoungsataya, ends 17 bytes after the second long value.
Fewer than 30 columns means no Long Data Cluster Array, but the data values are stored immediately after the Long Data Region Offset Array.
Because of space constraints, this chapter won’t show you the details of a row with multiple column clusters (that is, more than 30 columns), but you should have enough information to start exploring such rows on your own.
In addition to storing rows in a compressed format to minimize the space required, SQL Server 2012 can compress whole pages by isolating and reusing repeating patterns of bytes on the page.
Unlike row compression, page compression is applied only after a page is full, and only if SQL Server determines that compressing the page saves a meaningful amount of space. (You’ll find out what that amount is later in this section.) Keep the following points in mind when planning for page compression.
Page compression is available only in the SQL Server 2008, SQL Server 2008 R2, and SQL Server 2012 Enterprise and Developer editions.
Page compression always includes row compression—that is, if you enable page compression for a table, row compression is automatically enabled.
When compressing a B-tree, only the leaf level can be page compressed. For performance reasons, the node levels are left uncompressed.
If a table or index has been partitioned, page compression can be enabled on all the partitions or on a subset of the partitions.
The code in Listing 8-7 makes another copy of the dbo.Employees table and applies page compression to it. It then captures the page location and linkage information from DBCC IND for the three tables: dbo.Employees_uncompressed, dbo.Employees_rowcompressed, and dbo. Employees_pagecompressed. The code then uses the captured information to report on the number of data pages in each of the three tables.
Listing 8-7 Applying page compression to a table
USE AdventureWorks2012; GO IF OBJECT_ID('Employees_pagecompressed') IS NOT NULL DROP TABLE Employees_pagecompressed; GO SELECT BusinessEntityID, NationalIDNumber, JobTitle, BirthDate, MaritalStatus, VacationHours, FirstName, LastName INTO Employees_pagecompressed FROM dbo.Employees_uncompressed GO ALTER TABLE dbo.Employees_pagecompressed ADD CONSTRAINT EmployeeP_ID PRIMARY KEY (BusinessEntityID); GO ALTER TABLE dbo.Employees_pagecompressed REBUILD WITH (DATA_COMPRESSION = PAGE); GO SELECT OBJECT_NAME(object_id) as name, rows, data_pages, data_compression_desc FROM sys.partitions p JOIN sys.allocation_units au ON p.partition_id = au.container_id WHERE object_id = object_id('dbo.Employees_pagecompressed'); GO SELECT object_name(object_id) as Table_Name, count(*) as Page_Count FROM sys.dm_db_database_page_allocations(db_id('AdventureWorks2012'), null, null, null, 'DETAILED') WHERE object_name(object_id) like ('Employees%compressed') AND page_type_desc = 'DATA_PAGE' GROUP BY object_name(object_id);
If you run this script, notice in the output that row compression reduced the size of this small table from five pages to three, and then page compression further reduced the size from three pages to two.
SQL Server can perform two different operations to try to compress a page by using common values: column prefix compression and dictionary compression.
Column prefix compression
As the name implies, column prefix compression works on data columns in the table being compressed, but it looks only at the column values on a single page. For each column, SQL Server chooses a common prefix that can be used to reduce the storage space required for values in that column. The longest value in the column that contains that prefix is chosen as the anchor value. Each column is then stored—not as the actual data value, but as a delta from the anchor value. Suppose that you have the following character values in a column of a table to be page-compressed:
DEEM DEE FFF DEED DEE DAN
SQL Server might note that DEE is a useful common prefix, so DEED is chosen as the anchor value. Each column would be stored as the difference between its value and the anchor value. This difference is stored as a two-part value: the number of characters from the anchor to use and the additional characters to append. So DEEM is stored as <3><M>, meaning the value uses the first three characters from the common prefix and appends a single character, M, to it. DEED is stored as an empty string (but not null) to indicate it matched the prefix exactly. DEE is stored as <3>, with the second part empty, because no additional characters can be appended. The list of column values is replaced by the values shown here:
DEEM -> <3><M> DEE -> <3><> FFF -> <><FFF> DEED -> <><> DEE -> <3><> DAN -> <1><AN>
Keep in mind that the compressed row is stored in the CD record format, so the CD array value has a special encoding to indicate the value is actually NULL. If the replacement value is <><>, and the encoding doesn’t indicate NULL, the value matches the prefix exactly.
SQL Server applies the prefix detection and value replacement algorithm to every column and creates a new row called an anchor record to store the anchor values for each column. If no useful prefix can be found, the value in the anchor record is NULL, and then all the values in the column are stored just as they are.
Figure 8-14 shows an image of six rows in a table page compression, and then shows the six rows after the anchor record has been created and the substitutions have been made for the actual data values.
Figure 8-14 Before and after column prefix compression.
After prefix compression is applied to every column individually, the second phase of page compression looks at all values on the page to find duplicates in any column of any row, even if they have been encoded to reflect prefix usage. You can see in the bottom part of Figure 8-16 that two of the values occur multiple times: <3><> occurs three times and <1><AN> occurs twice. The process of detecting duplicate values is data type–agnostic, so values in completely different columns could be the same in their binary representation. For example, a 1-byte character is represented in hex as 0x54, and it would be seen as a duplicate of the 1-byte integer 84, which is also represented in hex as 0x54. The dictionary is stored as a set of symbols, each of which corresponds to a duplicated value on the data page. After the symbols and data values are determined, each occurrence of one of the duplicated values is replaced by the symbol. SQL Server recognizes that the value actually stored in the column is a symbol and not a data value by examining the encoding in the CD array. Values which have been replaced by symbols have a CD array value of 0xc. Figure 8-15 shows the data from Figure 8-14 after replacing the five values with symbols.
Figure 8-15 A page compressed with dictionary compression.
Not every page in a compressed table has both an anchor record for prefixes and a dictionary. If no useful prefix values are available, the page might have just a dictionary. If no values repeat often enough that replacing them with symbols saves space, the page might have just an anchor record. Of course, some pages might have neither an anchor record nor a dictionary if the data on the page has no patterns at all.
When a page is compressed, only one main structural change occurs. SQL Server adds a hidden row right after the page header (at byte offset 96, or 0x60) called the compression information (CI) record. Figure 8-16 shows the structure of the CI record.
Figure 8-16 Structure of a CI record.
The CI record doesn’t have an entry in the slot array for the page, but it’s always at the same location. Also, a bit in the page header indicates that the page is page-compressed, so SQL Server looks for the CI record. If you use DBCC PAGE to dump a page, the page header information contains a value called m_typeFlagBits. If this value is 0x80, the page is compressed.
You can run the following script to use the sys.dm_db_database_page_allocations function to find the page ID (PID) of the first page and the file ID (FID) of the first page for each of the three tables that you’ve been exploring. You can use this information to examine the page with DBCC PAGE. Notice that only the page for Employees_pagecompressed has the m_typeFlagBits value set to 0x80.
USE AdventureWorks2012; GO SELECT object_name(object_id) as Table_Name, allocated_page_file_id as First_Page_FID, allocated_page_page_id as First_Page_PID FROM sys.dm_db_database_page_allocations(db_id('AdventureWorks2012'), null, null, null, 'DETAILED') WHERE object_name(object_id) like ('Employees%compressed') AND page_type_desc = 'DATA_PAGE' AND previous_page_page_id IS NULL;
Using DBCC PAGE to look at a page-compressed page does provide information about the contents of the CI record, and you’ll look at some of that information after examining what each section means.
Header The header is a 1-byte value keeping track of information about the CI. Bit 0 indicates the version, which in SQL Server 2012 is always 0. Bit 1 indicates whether the CI has an anchor record, and bit 2 indicates whether the CI has a dictionary. The rest of the bits are unused.
PageModCount The PageModCount value keeps track of the changes to this particular page and is used when determining whether the compression on the page should be reevaluated, and a new CI record built. The next section, “Page compression analysis,“ talks more about how this value is used.
Offsets The offsets contain values to help SQL Server find the dictionary. It contains a value indicating the page offset for the end of the anchor record and a value indicating the page offset for the end of the CI record itself.
Anchor Record The anchor record looks exactly like a regular CD record on the page, including the record header, the CD array, and both a short data area and a long data area. The values stored in the data area are the common prefix values for each column, some of which might be NULL.
Dictionary The dictionary area is composed of three sections.
A 2-byte field containing a numeric value representing the number of entries in the dictionary
An offset array of 2-byte entries, indicating the end offset of each dictionary entry relative to the start of the dictionary data section
The actual dictionary data entries
Remember that each dictionary entry is a byte string that is replaced in the regular data rows by a symbol. The symbol is simply an integer value from 0 to N. Also, remember that the byte strings are data type–independent—that is, they are just bytes. After SQL Server determines what recurring values are stored in the dictionary, it sorts the list first by data length, then by data value, and then assigns the symbols in order. So suppose that the values to be stored in the dictionary are these:
0x 53 51 4C 0x FF F8 0x DA 15 43 77 64 0x 34 F3 B6 22 CD 0x 12 34 56
Table 8-9 shows the sorted dictionary, along with the length and symbol for each entry.
Table 8-9 Values in a page compression dictionary
0x FF F8
0x 12 34 56
0x 53 51 4C
0x 34 F3 B6 22 CD
0x DA 15 43 77 64
The dictionary area would then resemble Figure 8-17.
Figure 8-17 Dictionary area in a compression information record.
Note that the dictionary never actually stores the symbol values. They are stored only in the data records that need to use the dictionary. Because they are simply integers, they can be used as an index into the offset list to find the appropriate dictionary replacement value. For example, if a row on the page contains the dictionary symbol , SQL Server looks in the offset list for the third entry, which in Figure 8-17 ends at offset 0800 from the start of the dictionary. SQL Server then finds the value that ends at that byte, which is 0x 53 51 4C. If this byte string was stored in a char or varchar column—that is, a single-byte character string—it would correspond to the character string SQL.
Earlier in this chapter, you saw that the DBCC PAGE output displays the CD array for compressed rows. For compressed pages, DBCC PAGE shows the CI record and details about the anchor record within it. Also, with format 3, DBCC PAGE shows details about the dictionary entries. When I captured the DBCC PAGE in format 3 for the first page of my Employees_pagecompressed table and copied it to a Microsoft Office Word document, it needed 384 pages. Needless to say, I won’t show you all that output (just copying the CI record information required 10 pages, which is still too much to show in this book). You can explore the output of DBCC PAGE for the tables with compressed pages on your own.
Page compression analysis
This section covers some of the details regarding how SQL Server determines whether to compress a page and what values it uses for the anchor record and the dictionary. Row compression is always performed when requested, but page compression depends on the amount of space that can be saved. However, the actual work of compressing the rows has to wait until after page compression is performed. Because both types of page compression—prefix substitution and dictionary symbol substitution—replace the actual data values with encodings, the row can’t be compressed until SQL Server determines what encodings will replace the actual data.
When page compression is first enabled for a table or partition, SQL Server goes through every full page to determine the possible space savings. (Any pages that aren’t full aren’t considered for compression.) This compression analysis actually creates the anchor record, modifies all the columns to reflect the anchor values, and generates the dictionary. Then it compresses each row. If the new compressed page can hold at least five more rows, or 25 percent more rows than the current page (whichever is larger), the compressed page replaces the uncompressed page. If compressing the page doesn’t result in this much savings, the compressed page is discarded.
When determining what values to use for the anchor record on a compressed page, SQL Server needs to look at every byte in every row, one column at a time. As it scans the column, it also keeps track of possible dictionary entries that can be used in multiple columns. The anchor record values can be determined for each column in a single pass—that is, by the time all the bytes in all the rows for the first column are examined once, SQL Server has determined the anchor record value for that column or has determined that no anchor record value will save sufficient space.
As SQL Server examines each column, it collects a list of possible dictionary entries. As discussed earlier, the dictionary contains values that occur enough times on the page so that replacing them with a symbol is cost-effective in terms of space. For each possible dictionary entry, SQL Server keeps track of the value, its size, and the count of occurrences. If (size_of_data_value –1) * (count–1) –2 is greater than zero, it means the dictionary replacement saves space, and the value is considered eligible for the dictionary. Because the dictionary symbols are single-byte integers, SQL Server tries can’t store more than 255 entries in the dictionary on any page, so if more dictionary entries might be used based on the data on the page, they are sorted by number of occurrences during the analysis, and only the most frequently occurring values are used in the dictionary.
CI record rebuilding
If a table is enabled for either page or row compression, new rows are always compressed before they are inserted into the table. However, the CI record containing the anchor record and the dictionary is rebuilt on an all-or-nothing basis—that is, SQL Server doesn’t just add some new entry to the dictionary when new rows are inserted. SQL Server evaluates whether to rebuild the CI record when the page has been changed a sufficient number of times. It keeps track of changes to each page in the PageModCount field of the CI record, and that value is updated every time a row is inserted, updated, or deleted.
If a full page is encountered during a data modification operation, SQL Server examines the PageModCount value. If the PageModCount value is greater than 25 or the value PageModCount/ <number of rows on the page> is greater than 25 percent, SQL Server applies the compression analysis as it does when it first compresses a page. Only when recompressing the page makes room for at least five more rows (or 25 percent more rows than the current page) does the new compressed page replace the old page.
Page compression in a B-tree and page compression in a heap each have important differences.
Compression of B-tree pages For B-trees, only the leaf level is page compressed. When inserting a new row into a B-tree, if the compressed row fits on the page, it is inserted, and nothing more is done. If it doesn’t fit, SQL Server tries to recompress the page, according to the conditions described in the preceding section. A successful recompression means that the CI record changed, so the new row must be recompressed and then SQL Server tries to insert it into the page. Again, if it fits, it is simply inserted; if the new compressed row doesn’t fit on the page, even after possibly recompressing the page, the page needs to be split. When splitting a compressed page, the CI record is copied to a new page exactly as is, except that the PageModCount value is set to 25. This means that the first time the page gets full, it gets a full analysis to determine whether it should be recompressed. B-tree pages are also checked for possible recompression during index rebuilds (either online or offline) and during shrink operations.
Compression of heap pages Pages in a heap are checked for possible compression only during rebuild and shrink operations. Also, if you drop a clustered index on a table so that it becomes a heap, SQL Server runs compression analysis on any full pages. To make sure that the RowID values stay the same, heaps aren’t recompressed during typical data modification operations. Although the PageModCount value is maintained, SQL Server never tries to recompress a page based on the PageModCount value.
An enormous amount of metadata information relating to data compression doesn’t exist. The catalog view sys.partitions has a data_compression column and a data_compression_desc column. The data_compression column has possible values of 0, 1, 2, and 3 corresponding to data_compression_desc values of NONE, ROW, PAGE, and COLUMNSTORE. (Only ROW and PAGE compression are discussed here.) Keep in mind that although row compression is always performed if enabled, page compression isn’t. Even if sys.partitions indicates that a table or partition is page compressed, that just means that page compression is enabled. Each page is analyzed individually, and if a page isn’t full, or if compression won’t save enough space, the page isn’t compressed.
You can also inspect the dynamic management function sys.dm_db_index_operational_stats. This table-valued function returns the following compression-related columns.
page_compression_attempt_count The number of pages evaluated for PAGE-level compression for specific partitions of a table, index, or indexed view. This includes pages that weren’t compressed because significant savings couldn’t be achieved.
page_compression_success_count The number of data pages that were compressed by using PAGE compression for specific partitions of a table, index, or indexed view.
SQL Server also provides a stored procedure called sp_estimate_data_compression_savings, which can give you some idea of whether compression provides a large space savings. This procedure samples up to 5,000 pages of the table and creates an equivalent table with the sampled pages in tempdb. Using this temporary table, SQL Server can estimate the new table size for the requested compression state (NONE, ROW, or PAGE). Compression can be evaluated for whole tables or parts of tables, including heaps, clustered indexes, nonclustered indexes, indexed views, and table and index partitions.
Keep in mind that the result is only an estimate and your actual savings can vary widely based on the fill factor and the size of the rows. If the procedure indicates that you can reduce your row size by 40 percent, you might not actually get a 40 percent space savings for the whole table. For example, if you have a row that’s 8,000 bytes long and you reduce its size by 40 percent, you still can fit only one row on a data page, and your table still needs the same number of pages.
Running sp_estimate_data_compression_savings might yield results that indicate that the table will grow. This can happen when many rows in the table use almost the whole maximum size of the data types, and the addition of the overhead needed for the compression information is more than the savings from compression.
If the table is already compressed, you can use this procedure to estimate the size of the table (or index) if it were to be uncompressed.
The main motivation for compressing your data is to save space with extremely large tables, such as data warehouse fact tables. A second goal is to increase performance when scanning a table for reporting purposes, because far fewer pages need to be read. Keep in mind that compression comes at a cost: You see a tradeoff between the space savings and the extra CPU overhead to compress the data for storage and then uncompress the data when it needs to be used. On a CPU-bound system, you might find that compressing your data can actually slow down your system considerably.
Page compression provides the most benefit for I/O-bound systems, with tables for which the data is written once and then read repeatedly, as in the situations mentioned in the preceding paragraph: data warehousing and reporting. For environments with heavy read and write activity, such as online transaction processing (OLTP) applications, you might want to consider enabling row compression only and avoid the costs of analyzing the pages and rebuilding the CI record. In this case, the CPU overhead is minimal. In fact, row compression is highly optimized so that it’s visible only at the storage engine layer. The relational engine (query processor) doesn’t need to deal with compressed rows at all. The relational engine sends uncompressed rows to the storage engine, which compresses them if required. When returning rows to the relational engine, the storage engine waits as long as it can before uncompressing them. In the storage engine, comparisons can be done on compressed data, as internal conversions can convert a data type to its compressed form before comparing to data in the table. Also, only columns requested by the relational engine need to be uncompressed, as opposed to uncompressing an entire row.
Compression and logging In general, SQL Server logs only uncompressed data because the log needs to be read in an uncompressed format. This means that logging changes to compressed records has a greater performance impact because each row needs to be uncompressed and decoded (from the anchor record and dictionary) before writing to the log. This is another reason compression gives you more benefit on primarily read-only systems, where logging is minimal.
SQL Server writes compressed data to the log in a few situations. The most common situation is when a page is split. SQL Server writes the compressed rows as it logs the data movement during the split operation.
Compression and the version store Chapter 13 covers the version store during a discussion about Snapshot isolation, but I want to mention briefly here how the version store interacts with compression. SQL Server can write compressed rows to the version store, and the version store processing can traverse older versions in their compressed form. However, the version store doesn’t support page compression, so the rows in the version store can’t contain encodings of the anchor record prefixes and the page dictionary. So anytime any row from a compressed page needs to be versioned, the page must be uncompressed first.
The version store is used for both varieties of Snapshot isolation (full snapshot and read-committed snapshot) and is also used for storing the before-and-after images of changed data when triggers are fired. (These images are visible in the logical tables inserted and deleted.) Keep this in mind when evaluating the costs of compression. Snapshot isolation has lots of overhead already, and adding page compression into the mix affects performance even more.
Chapter 1, “SQL Server 2012 architecture and configuration,” briefly mentioned backup compression when discussing configuration options. It’s worth repeating that the algorithm used for compressing backups is very different than the database compression algorithms discussed in this chapter. Backup compression uses an algorithm very similar to zipping, where it’s just looking for patterns in the data. Even after tables and indexes are compressed by using the data compression techniques, they still can be compressed further by using the backup compression algorithms.
Page compression looks only for prefix patterns and can still leave other patterns uncompressed, including common suffixes. Page compression eliminates redundant strings, but in most cases plenty of strings aren’t redundant, and string data compresses very well using zip-type algorithms.
Also, a fair amount of space in a database constitutes overhead, such as unallocated slots on pages and unallocated pages in allocated extents. Depending on whether Instant File Initialization was used, and what was on the disk previously if it was, the background data can actually compress very well.
Thus, making a compressed backup of a database that has many compressed tables and indexes can provide additional space savings for the backup set.