Microsoft SQL Server 2012 Internals: Special Storage

  • 11/15/2013
This chapter from Microsoft SQL Server 2012 Internals looks at how SQL Server stores data that doesn’t use the typical FixedVar record format and data that doesn’t fit into the usual 8 KB data page.

Kalen Delaney

Earlier chapters discussed the storage of “regular rows” for both data and index information. (Chapter 7, “Indexes: internals and management,” also looked at a completely different way of storing indexes: using columnstores, which aren’t stored in rows at all.) Chapter 6, “Table storage,” explained that regular rows are in a format called FixedVar. SQL Server provides ways of storing data in another format, called Column Descriptor (CD). It also can store special values in either the FixedVar or CD format that don’t fit on the regular 8 KB pages.

This chapter describes data that exceeds the typical row size limitations and is stored as either row-overflow or Large Object (LOB) data. You’ll learn about two additional methods for storing data on the actual data pages, introduced in Microsoft SQL Server 2008: one that uses a new type of complex column with a regular data row (sparse columns), and one that uses the new CD format (compressed data). This chapter also discusses FILESTREAM data, a feature introduced in SQL Server 2008 that allows you to access data from operating system files as though it were part of your relational tables, and FileTables, a new feature in SQL Server 2012 that allows you to create a table containing both FILESTREAM data and Windows file attribute metadata.

Finally, this chapter covers the ability of SQL Server to separate data into partitions. Although this doesn’t change the data format in the rows or on the pages, it does change the metadata that keeps track of what space is allocated to which objects.

Large object storage

SQL Server 2012 has two special formats for storing data that doesn’t fit on the regular 8 KB data page. These formats allow you to store rows that exceed the maximum row size of 8,060 bytes. As discussed in Chapter 6, this maximum row size value includes several bytes of overhead stored with the row on the physical pages, so the total size of all the table’s defined columns must be slightly less than this amount. In fact, the error message that you get if you try to create a table with more bytes than the allowable maximum is very specific. If you execute the following CREATE TABLE statement with column definitions that add up to exactly 8,060 bytes, you’ll get the error message shown here:

USE testdb;
GO
CREATE TABLE dbo.bigrows_fixed
(   a char(3000),
    b char(3000),
    c char(2000),
    d char(60) ) ;
Msg 1701, Level 16, State 1, Line 1
Creating or altering table 'bigrows' failed because the minimum row size would be 8067,
including 7 bytes of internal overhead. This exceeds the maximum allowable table row size of
8060 bytes.

In this message, you can see the number of overhead bytes (7) that SQL Server wants to store with the row itself. An additional 2 bytes is used for the row-offset information at the end of the page, but those bytes aren’t included in this total.

Restricted-length large object data (row-overflow data)

One way to exceed this size limit of 8,060 bytes is to use variable-length columns because for variable-length data, SQL Server 2005 and later versions can store the columns in special row-overflow pages, as long as all the fixed-length columns fit into the regular in-row size limit. So you need to look at a table with all variable-length columns. Notice that although my example uses all varchar columns, columns of other data types can also be stored on row-overflow data pages. These other data types include varbinary, nvarchar, and sqlvariant columns, as well as columns that use CLR user-defined data types. The following code creates a table with rows whose maximum defined length is much longer than 8,060 bytes:

USE testdb;
CREATE TABLE dbo.bigrows
  (a varchar(3000),
   b varchar(3000),
   c varchar(3000),
   d varchar(3000) );

In fact, if you ran this CREATE TABLE statement in SQL Server 7.0, you would get an error, and the table wouldn’t be created. In SQL Server 2000, the table was created, but you got a warning that inserts or updates might fail if the row size exceeds the maximum.

With SQL Server 2005 and later, not only could the preceding dbo.bigrows table be created, but you also could insert a row with column sizes that add up to more than 8,060 bytes with a simple INSERT:

INSERT INTO dbo.bigrows
     SELECT REPLICATE('e', 2100), REPLICATE('f', 2100),
      REPLICATE('g', 2100),  REPLICATE('h', 2100);

To determine whether SQL Server is storing any data in row-overflow data pages for a particular table, you can run the following allocation query from Chapter 5, “Logging and recovery”:

SELECT object_name(object_id) AS name,
    partition_id, partition_number AS pnum,  rows,
    allocation_unit_id AS au_id, type_desc as page_type_desc,
    total_pages AS pages
FROM sys.partitions p  JOIN sys.allocation_units a
   ON p.partition_id = a.container_id
WHERE object_id=object_id('dbo.bigrows');

This query should return output similar to that shown here:

name    partition_id      pnum rows au_id             page_type_desc     pages
----    ----------------- ---- ---- ----------------- ----------------   -----
bigrows 72057594039238656 1    1    72057594043957248 IN_ROW_DATA         2
bigrows 72057594039238656 1    1    72057594044022784 ROW_OVERFLOW_DATA   2

You can see that there are two pages for the one row of regular in-row data and two pages for the one row of row-overflow data. Alternatively, you can use the sys.dm_db_database_page_allocations function and see the four pages individually:

SELECT allocated_page_file_id as PageFID, allocated_page_page_id as PagePID,
       object_id as ObjectID, partition_id AS PartitionID,
       allocation_unit_type_desc as AU_type, page_type as PageType
FROM sys.dm_db_database_page_allocations
       (db_id('testdb'), object_id('bigrows'), null, null, 'DETAILED');

You should see the four rows, one for each page, looking similar to the following:

PageFID PagePID     ObjectID    PartitionID AU_type             PageType
------- ----------- ----------- ----------- ------------------- -----------
1       303         1653580929  1           IN_ROW_DATA         10
1       302         1653580929  1           IN_ROW_DATA         1
1       297         1653580929  1           ROW_OVERFLOW_DATA   10
1       296         1653580929  1           ROW_OVERFLOW_DATA   3

Of course, your actual ID values will be different, but the AU-type and PageType values should be the same, and you should have four rows returned indicating four pages belong to the bigrows table. Two pages are for the row-overflow data, and two are for the in-row data. As you saw in Chapter 7, the PageType values have the following meanings.

  • PageType = 1, Data page

  • PageType = 2, Index page

  • PageType = 3, LOB or row-overflow page, TEXT_MIXED

  • PageType = 4, LOB or row-overflow page, TEXT_DATA

  • PageType = 10, IAM page

You learn more about the different types of LOB pages in the next section, “Unrestricted-length large object data.”

You can see one data page and one IAM page for the in-row data, and one data page and one IAM page for the row-overflow data. With the results from sys.dm_db_database_page_allocations, you could then look at the page contents with DBCC PAGE. On the data page for the in-row data, you would see three of the four varchar column values, and the fourth column would be stored on the data page for the row-overflow data. If you run DBCC PAGE for the data page storing the in-row data (page 1:302 in the preceding output), notice that it isn’t necessarily the fourth column in the column order that is stored off the row. (I won’t show you the entire contents of the rows because the single row fills almost the entire page.) Look at the in-row data page using DBCC PAGE and notice the column with e, the column with g, and the column with h. The column with f has moved to the new row. In the place of that column, you can see the bytes shown here:

65020000 00010000 00c37f00 00340800 00280100 00010000 0067

Included are the last byte with e (ASCII code hexadecimal 65) and the first byte with g (ASCII code hexadecimal 67), and in between are 24 other bytes (boldfaced). Bytes 16 through 23 (the 17th through the 24th bytes) of those 24 bytes are treated as an 8-byte numeric value: 2801000001000000 (bold italic). You need to reverse the byte order and break it into a 2-byte hex value for the slot number, a 2-byte hex value for the file number, and a 4-byte hex value for the page number. So the slot number is 0x0000 for slot 0 because this overflowing column is the first (and only) data on the row-overflow page. You have 0x0001 (or 1) for the file number and 0x00000128 (or 296) for the page number. You saw these the same file and page numbers when using sys.dm_db_database_page_allocations.

Table 8-1 describes the first 16 bytes in the row.

Table 8-1 The first 16 bytes of a row-overflow pointer

Bytes

Hex value

Decimal value

Meaning

0

0x02

2

Type of special field: 1 = LOB2 = overflow

1–2

0x0000

0

Level in the B-tree (always 0 for overflow)

3

0x00

0

Unused

4–7

0x00000001

1

Sequence: a value used by optimistic concurrency control for cursors that increases every time a LOB or overflow column is updated

8–11

0x00007fc3

32707

Timestamp: a random value used by DBCC CHECKTABLE that remains unchanged during the lifetime of each LOB or overflow column

12–15

0x00000834

2100

Length

SQL Server stores variable-length columns on row-overflow pages only under certain conditions. The determining factor is the row length itself. How full the regular page is into which SQL Server is trying to insert the new row doesn’t matter; SQL Server constructs the row as usual and stores some of its columns on overflow pages only if the row itself needs more than 8,060 bytes.

Each column in the table is either completely on the row or completely off the row. This means that a 4,000-byte variable-length column can’t have half its bytes on the regular data page and half on a row-overflow page. If a row is less than 8,060 bytes and the page on which SQL Server is trying to insert the row has no room, regular page-splitting algorithms (described in Chapter 7) are applied.

One row can span many row-overflow pages if it contains many large variable-length columns. For example, you can create the table dbo.hugerows and insert a single row into it as follows:

CREATE TABLE dbo.hugerows
  (a varchar(3000),
   b varchar(8000),
   c varchar(8000),
   d varchar(8000));
INSERT INTO dbo.hugerows
     SELECT REPLICATE('a', 3000), REPLICATE('b', 8000),
         REPLICATE('c', 8000),  REPLICATE('d', 8000);

Substituting hugerows for bigrows for the allocation query shown earlier yields the following results:

name     partition_id       pnum rows au_id              page_type_desc     pages
-------- -----------------  ---- ---- -----------------  -----------------  -----
hugerows 72057594039304192  1    1    72057594044088320  IN_ROW_DATA        2
hugerows 72057594039304192  1    1    72057594044153856  ROW_OVERFLOW_DATA  4

The output shows four pages for the row-overflow information, one for the row-overflow IAM page, and three for the columns that didn’t fit in the regular row. The number of large variable-length columns that a table can have isn’t unlimited, although it is quite large. A table is limited to 1,024 columns, which can be exceeded when you are using sparse columns, as discussed later in this chapter. However, another limit is reached before that. When a column must be moved off a regular page onto a row-overflow page, SQL Server keeps a pointer to the row-overflow information as part of the original row, which you saw in the DBCC output earlier as 24 bytes, and the row still needs 2 bytes in the column-offset array for each variable-length column, whether or not the variable-length column is stored in the row. So 308 turns out to be the maximum number of overflowing columns you can have, and such a row needs 8,008 bytes just for the 26 overhead bytes for each overflowing column in the row.

In some cases, if a large variable-length column shrinks, it can be moved back to the regular row. However, for efficiency, if the decrease is just a few bytes, SQL Server doesn’t bother checking. Only when a column stored in a row-overflow page is reduced by more than 1,000 bytes does SQL Server even consider checking to see whether the column can now fit on the regular data page. You can observe this behavior if you previously created the dbo.bigrows table for the earlier example and inserted only the one row with 2,100 characters in each column.

The following update reduces the size of the first column by 500 bytes and reduces the row size to 7,900 bytes, which should all fit on one data page:

UPDATE bigrows
SET a = replicate('a', 1600);

However, if you rerun the allocation query, you’ll still see two row-overflow pages: one for the row-overflow data and one for the IAM page. Now reduce the size of the first column by more than 1,000 bytes and rerun the allocation query:

UPDATE bigrows
SET a = 'aaaaa';

You should see only three pages for the table now, because there is no longer a row-overflow data page. The IAM page for the row-overflow data pages hasn’t been removed, but you no longer have a data page for row-overflow data.

Keep in mind that row-overflow data storage applies only to columns of variable-length data, which are defined as no longer than the usual variable-length maximum of 8,000 bytes per column. Also, to store a variable-length column on a row-overflow page, you must meet the following conditions.

  • All the fixed-length columns, including overhead bytes, must add up to no more than 8,060 bytes. (The pointer to each row-overflow column adds 24 bytes of overhead to the row.)

  • The actual length of the variable-length column must be more than 24 bytes.

  • The column must not be part of the clustered index key.

If you have single columns that might need to store more than 8,000 bytes, you should use either LOB (text, image, or ntext) columns or the MAX data types.

Unrestricted-length large object data

If a table contains the deprecated LOB data types (text, ntext, or image types), by default the actual data isn’t stored on the regular data pages. Like row-overflow data, LOB data is stored in its own set of pages, and the allocation query shows you pages for LOB data as well as pages for regular in-row data and row-overflow data. For LOB columns, SQL Server stores a 16-byte pointer in the data row that indicates where the actual data can be found. Although the default behavior is to store all the LOB data off the data row, SQL Server allows you to change the storage mechanism by setting a table option to allow LOB data to be stored in the data row itself if it is small enough. Note that no database or server setting is available to control storing small LOB columns on the data pages; it’s managed as a table option.

The 16-byte pointer points to a page (or the first of a set of pages) where the data can be found. These pages are 8 KB in size, like any other page in SQL Server, and individual text, ntext, and image pages aren’t limited to storing data for only one occurrence of a text, ntext, or image column. A text, ntext, or image page can hold data from multiple columns and from multiple rows; the page can even have a mix of text, ntext, and image data. However, one text or image page can hold only text or image data from a single table. (Even more specifically, one text or image page can hold only text or image data from a single partition of a table, which should become clear when partitioning metadata is discussed at the end of this chapter.)

The collection of 8 KB pages that make up a LOB column aren’t necessarily located next to each other. The pages are logically organized in a B-tree structure, so operations starting in the middle of the LOB string are very efficient. The structure of the B-tree varies slightly depending on whether the amount of data is less than or more than 32 KB. (See Figure 8-1 for the general structure.) B-trees were discussed in detail when describing indexes in Chapter 7.

Figure 8-1

Figure 8-1 A text column pointing to a B-tree that contains the blocks of data.

If the amount of LOB data is less than 32 KB, the text pointer in the data row points to an 84-byte text root structure. This forms the root node of the B-tree structure. The root node points to the blocks of text, ntext, or image data. Although the data for LOB columns is arranged logically in a B-tree, both the root node and the individual blocks of data are spread physically throughout LOB pages for the table. They’re placed wherever space is available. The size of each block of data is determined by the size written by an application. Small blocks of data are combined to fill a page. If the amount of data is less than 64 bytes, it’s all stored in the root structure.

If the amount of data for one occurrence of a LOB column exceeds 32 KB, SQL Server starts building intermediate nodes between the data blocks and the root node. The root structure and the data blocks are interleaved throughout the text and image pages. The intermediate nodes, however, are stored in pages that aren’t shared between occurrences of text or image columns. Each page storing intermediate nodes contains only intermediate nodes for one text or image column in one data row.

SQL Server can store the LOB root and the actual LOB data on two different types of pages. One of these, referred to as TEXT_MIXED, allows LOB data from multiple rows to share the same pages. However, when your text data gets larger than about 40 KB, SQL Server starts devoting whole pages to a single LOB value. These pages are referred to as TEXT_DATA pages.

You can see this behavior by creating a table with a text column, inserting a value of less than 40 KB and then one greater than 40 KB, and finally examining information returned by sys.dm_db_database_page_allocations (see Listing 8-1).

Listing 8-1 Storing LOB data on two types of pages

IF OBJECT_ID('textdata') IS NOT NULL
    DROP TABLE textdata;
GO
CREATE TABLE textdata
 (bigcol text);
GO
INSERT INTO textdata
   SELECT REPLICATE(convert(varchar(MAX), 'a'), 38000);
GO
SELECT allocated_page_file_id as PageFID, allocated_page_page_id as PagePID,
       object_id as ObjectID, partition_id AS PartitionID,
       allocation_unit_type_desc as AU_type, page_type as PageType
FROM sys.dm_db_database_page_allocations(db_id('testdb'), object_id('textdata'),
                                          null, null, 'DETAILED');
GO
INSERT INTO textdata
   SELECT REPLICATE(convert(varchar(MAX), 'a'), 41000);
GO
SELECT allocated_page_file_id as PageFID, allocated_page_page_id as PagePID,
       object_id as ObjectID, partition_id AS PartitionID,
       allocation_unit_type_desc as AU_type, page_type as PageType
FROM sys.dm_db_database_page_allocations(db_id('testdb'), object_id('textdata'),
                                         null, null, 'DETAILED');
GO

The INSERT statements in Listing 8-1 convert a string value into the data type varchar(MAX) because this is the only way to generate a string value longer than 8,000 bytes. (The next section discusses varchar(MAX) in more detail.) The first time you select from sys.dm_db_database_page_allocations, you should have PageType values of 1, 3, and 10. The second time after data greater than 40 KB in size is inserted, you should also see PageType values of 4. PageType 3 indicates a TEXT_MIXED page, and PageType 4 indicates a TEXT_DATA page.

Storing LOB data in the data row

If you store all your LOB data type values outside your regular data pages, SQL Server needs to perform additional page reads every time you access that data, just as it does for row-overflow pages. In some cases, you might notice a performance improvement by allowing some of the LOB data to be stored in the data row. You can enable a table option called text in row for a particular table by setting the option to ‘ON’ (including the single quotation marks) or by specifying a maximum number of bytes to be stored in the data row. The following command enables up to 500 bytes of LOB data to be stored with the regular row data in a table called employee:

EXEC sp_tableoption employee, 'text in row', 500;

Notice that the value is in bytes, not characters. For ntext data, each character needs 2 bytes so that any ntext column is stored in the data row if it’s less than or equal to 250 characters. When you enable the text in row option, you never get just the 16-byte pointer for the LOB data in the row, as is the case when the option isn’t ‘ON’. If the data in the LOB field is more than the specified maximum, the row holds the root structure containing pointers to the separate chunks of LOB data. The minimum size of a root structure is 24 bytes, and the possible range of values that text in row can be set to is 24 to 7,000 bytes. (If you specify the option ‘ON’ instead of a specific number, SQL Server assumes the default value of 256 bytes.)

To disable the text in row option, you can set the value to either ‘OFF’ or 0. To determine whether a table has the text in row property enabled, you can inspect the sys.tables catalog view as follows:

SELECT name, text_in_row_limit
FROM sys.tables
WHERE name = 'employee';

This text_in_row_limit value indicates the maximum number of bytes allowed for storing LOB data in a data row. If a 0 is returned, the text in row option is disabled.

Now create a table very similar to the one that looks at row structures, but change the varchar(250) column to the text data type. You’ll use almost the same INSERT statement to insert one row into the table:

CREATE TABLE HasText
(
Col1 char(3)       NOT NULL,
Col2 varchar(5)    NOT NULL,
Col3 text          NOT NULL,
Col4 varchar(20)   NOT NULL
);
INSERT HasText VALUES
    ('AAA', 'BBB', REPLICATE('X', 250), 'CCC');

Now use the allocation query to find the basic information for this table and look at the sys.dm_db_database_page_allocations information for this table (see Listing 8-2).

Listing 8-2 Finding basic information for the HasText table

SELECT convert(char(7), object_name(object_id))  AS name,
     partition_id, partition_number AS pnum,  rows,
     allocation_unit_id AS au_id, convert(char(17), type_desc) as page_type_desc,
    total_pages AS pages
FROM sys.partitions p  JOIN sys.allocation_units a
   ON p.partition_id = a.container_id
WHERE object_id=object_id('dbo.HasText');
SELECT allocated_page_file_id as PageFID, allocated_page_page_id as PagePID,
       object_id as ObjectID, partition_id AS PartitionID, allocation_unit_type_desc as AU_Type,
       page_type as PageType
FROM sys.dm_db_database_page_allocations(db_id('testdb'),
       object_id('textdata'), null, null, 'DETAILED')
name    partition_id      pnum rows  au_id             page_type_desc  pages
------- ----------------- ---- ----- ----------------- --------------- -----
HasText 72057594039435264 1    1     72057594044350464 IN_ROW_DATA     2
HasText 72057594039435264 1    1     72057594044416000 LOB_DATA        2
PageFID  PagePID  ObjectID   PartitionID         AU_Type          PageType
-------  -------  --------   -----------------   --------------   --------
1        2197     133575514  72057594039435264   LOB data         3
1        2198     133575514  72057594039435264   LOB data         10
1        2199     133575514  72057594039435264   In-row data      1
1        2200     133575514  72057594039435264   In-row data      10

You can see two LOB pages (the LOB data page and the LOB IAM page) and two pages for the in-row data (again, the data page and the IAM page). The data page for the in-row data is 2199, and the LOB data is on page 2197. The following output shows the data section from running DBCC PAGE on page 2199. The row structure is very similar to the row structure shown in Chapter 6, in Figure 6-6, except for the text field itself. Bytes 21 to 36 are the 16-byte text pointer, and you can see the value 9508 starting at offset 29. When the bytes are reversed, it becomes 0x0895, or 2197 decimal, which is the page containing the text data, as you saw in the output in Listing 8-2.

Figure 8-2

Figure 8-2 A row containing a text pointer.

Now let’s enable text data in the row, for up to 500 bytes:

EXEC sp_tableoption HasText, 'text in row', 500;

Enabling this option doesn’t force the text data to be moved into the row. You have to update the text value to actually force the data movement:

UPDATE HasText
SET col3 = REPLICATE('Z', 250);

If you run DBCC PAGE on the original data page, notice that the text column of 250 z’s is now in the data row, and that the row is practically identical in structure to the row containing varchar data that you saw in Figure 6-6.

A final issue when working with LOB data and the text in row option is dealing with the situation in which text in row is enabled but the LOB is longer than the maximum configured length for some rows. If you change the maximum length for text in row to 50 for the HasText table you’ve been working with, this also forces the LOB data for all rows with more than 50 bytes of LOB data to be moved off the page immediately, just as when you disable the option completely:

EXEC sp_tableoption HasText, 'text in row', 50;

However, setting the limit to a smaller value is different from disabling the option in two ways. First, some of the rows might still have LOB data that is under the limit, and for those rows, the LOB data is stored completely in the data row. Second, if the LOB data doesn’t fit, the information stored in the data row itself isn’t simply the 16-byte pointer, as it would be if text in row were turned off. Instead, for LOB data that doesn’t fit in the defined size, the row contains a root structure for a B-tree that points to chunks of the LOB data. As long as the text in row option isn’t ‘OFF’ (or 0), SQL Server never stores the simple 16-byte LOB pointer in the row. It stores either the LOB data itself (if it fits) or the root structure for the LOB data B-tree.

A root structure is at least 24 bytes long (which is why 24 is the minimum size for the text in row limit), and the meaning of the bytes is similar to the meaning of the 24 bytes in the row-overflow pointer. The main difference is that no length is stored in bytes 12–15. Instead, bytes 12–23 constitute a link to a chunk of LOB data on a separate page. If multiple LOB chucks are accessed via the root, multiple sets of 12 bytes can be here, each pointing to LOB data on a separate page.

As indicated earlier, when you first enable text in row, no data movement occurs until the text data is actually updated. The same is true if the limit is increased—that is, even if the new limit is large enough to accommodate the LOB data that was stored outside the row, the LOB data isn’t moved onto the row automatically. You must update the actual LOB data first.

Keep in mind that even if the amount of LOB data is less than the limit, the data isn’t necessarily stored in the row. You’re still limited to a maximum row size of 8,060 bytes for a single row on a data page, so the amount of LOB data that can be stored in the actual data row might be reduced if the amount of non-LOB data is large. Also, if a variable-length column needs to grow, it might push LOB data off the page so as not to exceed the 8,060-byte limit. Growth of variable-length columns always has priority over storing LOB data in the row. If no variable-length char fields need to grow during an update operation, SQL Server checks for growth of in-row LOB data, in column offset order. If one LOB needs to grow, others might be pushed off the row.

Finally, you should be aware that SQL Server logs all movement of LOB data, which means that reducing the limit of or turning ‘OFF’ the text in row option can be a very time-consuming operation for a large table.

Although large data columns using the LOB data types can be stored and managed very efficiently, using them in your tables can be problematic. Data stored as text, ntext, or image can’t always be manipulated with the usual data-manipulation commands and, in many cases, you need to resort to using the operations readtext, writetext, and updatetext, which require dealing with byte offsets and data-length values. Prior to SQL Server 2005, you had to decide whether to limit your columns to a maximum of 8,000 bytes or to deal with your large data columns by using different operators than you used for your shorter columns. Starting with version 2005, SQL Server provides a solution that gives you the best of both worlds, as you’ll see in the next section.

Storing MAX-length data

SQL Server 2005 and later versions give you the option of defining a variable-length field with the MAX specifier. Although this functionality is frequently described by referring only to varchar(MAX), the MAX specifier can also be used with nvarchar and varbinary. You can indicate the MAX specifier instead of an actual size when you use one of these types to define a column, variable, or parameter. By using the MAX specifier, you leave it up to SQL Server to determine whether to store the value as a regular varchar, nvarchar, or varbinary value or as a LOB. In general, if the actual length is 8,000 bytes or less, the value is treated as though it were one of the regular variable-length data types, including possibly overflowing onto row-overflow pages. However, if the varchar(MAX) column does need to spill off the page, the extra pages required are considered LOB pages and show the IAM_chain_type LOB when examined using DBCC IND. If the actual length is greater than 8,000 bytes, SQL Server stores and treats the value exactly as though it were text, ntext, or image. Because variable-length columns with the MAX specifier are treated either as regular variable-length columns or as LOB columns, no special discussion of their storage is needed.

The size of values specified with MAX can reach the maximum size supported by LOB data, which is currently 2 GB. By using the MAX specifier, however, you are indicating that the maximum size should be the maximum the system supports. If you upgrade a table with a varchar(MAX) column to a future version of SQL Server, the MAX length becomes whatever the new maximum is in the new version.

Appending data into a LOB column

In the storage engine, each LOB column is broken into fragments of a maximum size of 8,040 bytes each. When you append data to a large object, SQL Server finds the append point and looks at the current fragment where the new data will be added. It calculates the size of the new fragment (including the newly appended data). If the size is more than 8,040 bytes, SQL Server allocates new large object pages until a fragment is left that is less than 8,040 bytes, and then it finds a page that has enough space for the remaining bytes.

When SQL Server allocates pages for LOB data, it has two allocation strategies:

  • For data that is less than 64 KB in size, it randomly allocates a page. This page comes from an extent that is part of the large object IAM, but the pages aren’t guaranteed to be continuous.

  • For data that is more than 64 KB in size, it uses an append-only page allocator that allocates one extent at a time and writes the pages continuously in the extent.

From a performance standpoint, writing fragments of 64 KB at a time is beneficial. Allocating 1 MB in advance might be beneficial if you know that the size will be 1 MB, but you also need to take into account the space required for the transaction log. If you a create a 1 MB fragment first with any random contents, SQL Server logs the 1 MB, and then all the changes are logged as well. When you perform large object data updates, no new pages need to be allocated, but the changes still need to be logged.

As long as the large object values are small, they can be in the data page. In this case, some preallocation might be a good idea so that the large object data doesn’t become too fragmented. A general recommendation might be that if the amount of data to be inserted into a large object column in a single operation is relatively small, you should insert a large object value of the final expected value, and then replace substrings of that initial value as needed. For larger sizes, try to append or insert in chunks of 8 * 8,040 bytes. This way, a whole extent is allocated each time, and 8,040 bytes are stored on each page.

If you do find that your large object data is becoming fragmented, you can use ALTER INDEX REORGANIZE to defragment that data. In fact, this option (WITH LOB_COMPACTION) is on by default, so you just need to make sure that you don’t set it to ‘OFF’.