Microsoft SQL Server 2012 Internals: Special Storage

  • 11/15/2013

FILESTREAM and FileTable data

Although the flexible methods that SQL Server uses to store large object data in the database give you many advantages over data stored in the file system, they also have many disadvantages. Some of the benefits of storing large objects in your database include the following.

  • Transactional consistency of your large object data can be guaranteed.

  • Your backup and restore operations include the large object data, allowing you integrated, point-in-time recovery of your large objects.

  • All data can be stored using a single storage and query environment.

Some of the disadvantages of storing large objects in your database include the following.

  • Large objects can require a very large number of buffers in cache.

  • The upper limit on the size of any large object value is 2 GB.

  • Updating large objects can cause extensive database fragmentation.

  • Database files can become extremely large.

  • Read or write streaming operations from varchar(MAX) and varbinary(MAX) columns are significantly slower than streaming from NTFS files.

SQL Server allows you to manage file system objects as though they were part of your database to provide the benefits of having large objects in the database while minimizing the disadvantages. The data stored in the file system can be FILESTREAM or FileTable data. As you start evaluating whether FILESTREAM or FileTable data is beneficial for your applications, consider both the benefits and the drawbacks. Some benefits of both FILESTREAM and FileTable data include the following.

  • The large object data is stored in the file system but rooted in the database as a 48-byte file pointer value in the column that contains the FILESTREAM data.

  • The large object data is accessible through both Transact-SQL (T-SQL) and the NTFS streaming APIs, which can provide great performance benefits.

  • The large object size is limited only by the NTFS volume size, not the old 2 GB limit for large object data stored within a database.

FILESTREAM data has the following additional benefits.

  • The large object data is kept transactionally consistent with structured data.

  • Databases containing FILESTREAM data can participate in the SQL Server 2012 AlwaysOn availability groups.

FileTable data has these additional benefits.

  • The data is available through any Wind32 application without any modification of the application.

  • FileTables allow you to support a hierarchy of directories and files.

Some drawbacks of using FILESTREAM or FileTable data include the following.

  • Database snapshots can’t include the FILESTREAM filegroups, so the FILESTREAM data is unavailable. A SELECT statement in a database snapshot that requests a FILESTREAM column generates an error.

  • SQL Server can’t encrypt FILESTREAM data natively.

Because the FileTable feature is built on top of the FILESTREAM technology, I’ll first tell you about FILESTREAM (which was introduced in SQL Server 2008) and then about what has been added in SQL Server 2012 to enable FileTables.

Enabling FILESTREAM data for SQL Server

The capability to access FILESTREAM data must be enabled both outside and inside your SQL Server instance, as mentioned in Chapter 1 when discussing configuration. Through the SQL Server Configuration Manager, you must enable T-SQL access to FILESTREAM data, and if that has been enabled, you can also enable file I/O streaming access. If file I/O streaming access is allowed, you can allow remote clients to have access to the streaming data if you want. When the SQL Server Configuration Manager is opened, make sure that you have selected SQL Server Services in the left pane. In the right pane, right-click the SQL instance that you want to configure and select Properties from the drop-down menu. The Properties sheet has six tabs, including one labeled FILESTREAM. You can see the details of the FILESTREAM tab of the SQL Server Properties sheet in Figure 8-3.

Figure 8-3

Figure 8-3 Configuring a SQL Server instance to allow FILESTREAM access.

After the server instance is configured, you need to use sp_configure to set your SQL Server instance to the level of FILESTREAM access that you require. Three values are possible.

  • 0 (the default) means that no FILESTREAM access is allowed.

  • 1 means that you can use T-SQL to access FILESTREAM data.

  • 2 means that you can use both T-SQL and the Win32 API for FILESTREAM access.

As with all configuration options, don’t forget to run the RECONFIGURE command after changing a setting:

EXEC sp_configure 'filestream access level', 2; RECONFIGURE;

Creating a FILESTREAM-enabled database

To store FILESTREAM data, a database must have at least one filegroup that was created to allow FILESTREAM data. When creating a database, a filegroup that allows FILESTREAM data is specified differently from a filegroup containing row data in several different ways.

  • The path specified for the FILESTREAM filegroup must exist only up to the last folder name. The last folder name must not exist but is created when SQL Server creates the database.

  • The size and filegrowth properties don’t apply to FILESTREAM filegroups.

  • If no FILESTREAM-containing filegroup is specified as DEFAULT, the first FILESTREAM-containing filegroup listed is the default. (Therefore, you have one default filegroup for row data and one default filegroup for FILESTREAM data.)

Look at the following code, which creates a database called MyFilestreamDB with two FILESTREAM-containing filegroups. The path c:\Data2 must exist, but it must not contain either the filestream1 or the filestream2 folders:

CREATE DATABASE  MyFilestreamDB  ON  PRIMARY
     (NAME = N'Rowdata1', FILENAME = N'c:\data\Rowdata1.mdf' , SIZE = 2304KB ,
      MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB ),
FILEGROUP FileStreamGroup1 CONTAINS FILESTREAM DEFAULT( NAME = FSData1,
    FILENAME = 'c:\Data2\FileStreamGroup1'),
FILEGROUP FileStreamGroup2 CONTAINS FILESTREAM ( NAME = FSData2,
    FILENAME = 'c:\Data2\FileStreamGroup2')
LOG ON
    (NAME = N'FSDBLOG', FILENAME = N'c:\data\FSDB_log.ldf' , SIZE = 1024KB ,
     MAXSIZE = 2048GB , FILEGROWTH = 10%);

When the preceding MyFilestreamDB database is created, SQL Server creates the two folders, FileStreamGroup1 and FileStreamGroup2, in the C:\Data2 directory. These folders are referred to as the FILESTREAM containers. Initially, each container contains an empty folder called $FSLOG and a header file called filestream.hdr. As tables are created to use FILESTREAM space in a container, a folder for each partition or each table containing FILESTREAM data is created in the container.

An existing database can be altered to have a FILESTREAM filegroup added, and then a subsequent ALTER DATABASE command can add a file to the FILESTREAM filegroup. Note that you can’t add FILESTREAM filegroups to the master, model, and tempdb databases.

Creating a table to hold FILESTREAM data

To specify that a column is to contain FILESTREAM data, it must be defined as type varbinary(MAX) with a FILESTREAM attribute. The database containing the table must have at least one filegroup defined for FILESTREAM. Your table creation statement can specify which filegroup its FILESTREAM data is stored in, and if none is specified, the default FILESTREAM filegroup is used. Finally, any table that has FILESTREAM columns must have a column of the uniqueidentifier data type with the ROWGUIDCOL attribute specified. This column must not allow NULL values and must be guaranteed to be unique by specifying either the UNIQUE or PRIMARY KEY single-column constraint. The ROWGUIDCOL column acts as a key that the FILESTREAM agent can use to locate the actual row in the table to check permissions, obtain the physical path to the file, and possibly lock the row if required.

Now look at the files that are created within the container. When created in the MyFilestreamDB database, the following table adds several folders to the FileStreamGroup1 container:

CREATE TABLE MyFilestreamDB.dbo.Records
(
        [Id] [uniqueidentifier] ROWGUIDCOL NOT NULL UNIQUE,
        [SerialNumber] INTEGER UNIQUE,
        [Chart_Primary] VARBINARY(MAX) FILESTREAM NULL,
        [Chart_Secondary] VARBINARY(MAX) FILESTREAM NULL)
FILESTREAM_ON FileStreamGroup1;

Because this table is created on FileStreamGroup1, the container located at C:\Data2\FileStreamGroup1 is used. One subfolder is created within FileStreamGroup1 for each table or partition created in the FileStreamGroup1 filegroup, and those filenames are GUIDs. Each file has a subfolder for each column within the table or partition which holds FILESTREAM data, and again, the names of those subfolders are GUIDs. Figure 8-4 shows the structure of the files on my disk right after the MyFilestreamDB.dbo.Records table is created. The FileStreamGroup2 folder has only the $FSLOG subfolder, and no subfolders for any tables. The FileStreamGroup1 folder has a GUID-named subfolder for the dbo.Records table and, within that, a GUID-named subfolder for each of the two FILESTREAM columns in the table. No files exist except for the original filestream.hdr file. Files aren’t added until FILESTREAM data is actually inserted into the table.

Figure 8-4

Figure 8-4 The operating system file structure after creating a table with two FILESTREAM data columns.

Manipulating FILESTREAM data

FILESTREAM data can be manipulated with either T-SQL or the Win32 API. When using T-SQL, you can process the data exactly as though it were varbinary(MAX). Using the Win32 API requires that you first obtain the file path and current transaction context. You can then open a WIN32 handle and use it to read and write the large object data. All the examples in this section use T-SQL; you can get the details of Win32 manipulation from SQL Server Books Online.

As you add data to the table, files are added to the subfolders for each column. INSERT operations that fail with a runtime error (for example, due to a uniqueness violation) still create a file for each FILESTREAM column in the row. Although the row is never accessible, it still uses file system space.

Inserting FILESTREAM data

You can insert data by using regular T-SQL INSERT statements. You must insert FILESTREAM data by using the varbinary(MAX) data type but can convert any string data in the INSERT statement. The following statement adds one row to the dbo.Records table, created earlier with two FILESTREAM columns. The first FILESTREAM column gets a 90,000-byte character string converted to varbinary(MAX), and the second FILESTREAM column gets an empty binary string.

USE MyFileStreamDB
INSERT INTO dbo.Records
    SELECT newid (), 24,
      CAST (REPLICATE (CONVERT(varchar(MAX), 'Base Data'), 10000)
                         AS varbinary(max)),
      0x;

First, the nine-character string Base Data to varchar(MAX) is converted because a regular string value can’t be more than 8,000 bytes. The REPLICATE function returns the same data type as its first parameter, so that first parameter should be unambiguously a large object. Replicating the 9-byte string 10,000 times results in a 90,000-byte string, which is then converted to varbinary(MAX). Notice that a value of 0x is an empty binary string, which isn’t the same as a NULL. Every row that has a non-NULL value in a FILESTREAM column has a file, even for zero-length values.

Figure 8-5 shows what your file system should look like after running the preceding code to create a database with two FILESTREAM containers and create a table with two FILESTREAM columns, and then inserting one row into that table. In the left pane, you can see the two FILESTREAM containers, FileStreamGroup1 and FileStreamGroup2.

Figure 8-5

Figure 8-5 The operating system file structure after inserting FILESTREAM data.

The FileStreamGroup1 container has a folder with a GUID name for the dbo.Records table that I created, and that folder container has two folders, with GUID names, for the two columns in that table. The right pane shows the file containing the actual data inserted into one of the columns.

Updating FILESTREAM data

Updates to FILESTREAM data are always performed as a DELETE followed by an INSERT, so you see a new row in the directory for the column(s) updated. Also, the T-SQL “chunked update,” specified with the .WRITE clause, isn’t supported. So any update to FILESTREAM data results into SQL Server creating a new copy of the FILESTREAM data file. I recommend that you use file-system streaming access for manipulation (both inserts and updates) of your FILESTREAM data.

When a FILESTREAM value is set to NULL, the FILESTREAM file associated with that value is deleted when the Garbage Collection thread runs. (Garbage collection is discussed later in this chapter.) The Garbage Collection thread also cleans up old versions of the FILESTREAM files after an UPDATE creates a new file.

Deleting FILESTREAM data

When a row is deleted through the use of a DELETE or a TRUNCATE TABLE statement, any FILESTREAM file associated with the row is deleted. However, deletion of the file isn’t synchronous with row deletion. The file is deleted by the FILESTREAM Garbage Collection thread. This is also true for DELETEs that are generated as part of an UPDATE, as mentioned in the preceding section paragraph.

Manipulating FILESTREAM data and transactions

FILESTREAM data manipulation is fully transactional. However, you need to be aware that when you are manipulating FILESTREAM data, not all isolation levels are supported. Also, some isolation levels are supported for T-SQL access but not for file-system access. Table 8-2 indicates which isolation levels are available in which access mode.

Table 8-2 Isolation levels supported with FILESTREAM data manipulation

Isolation level

T-SQL access

File-system access

Read uncommitted

Supported

Not supported

Read committed

Supported

Supported

Repeatable read

Supported

Not supported

Serializable

Supported

Not supported

Read committed snapshot

Supported

Supported

Snapshot

Supported

Supported

If two processes trying to access the same FILESTREAM datafile are in incompatible modes, the file-system APIs fail with an ERROR_SHARING_VIOLATION message rather than just block, as would happen when using T-SQL. As with all data access, readers and writers within the same transaction can never get a conflict on the same file but unlike non-FILESTREAM access, two write operations within the same transaction can end up conflicting with each other when accessing the same file, unless the file handle has been previously closed. You can read much more about transactions, isolation levels, and conflicts in Chapter 13, “Transactions and concurrency.”

Logging FILESTREAM changes

As mentioned previously, each FILESTREAM filegroup has a $FSLOG folder that keeps track of all FILESTREAM activity that touches that filegroup. The data in this folder is used when you perform transaction log backup and restore operations in the database (which include the FILESTREAM filegroup) and also during the recovery process.

The $FSLOG folder primarily keeps track of new information added to the FILESTREAM filegroup. A file gets added to the log folder to reflect each of the following.

  • A new table containing FILESTREAM data is created.

  • A FILESTREAM column is defined.

  • A new row is inserted containing non-NULL data in the FILESTREAM column.

  • A FILESTREAM value is updated.

  • A COMMIT occurs.

Here are some examples.

  • If you create a table containing two FILESTREAM columns, four files are added to the $FSLOG folder—one for the table, two for the columns, and one for the implied COMMIT.

  • If you insert a single row containing FILESTREAM data in an autocommit transaction, two files are added to the $FSLOG folder—one for the INSERT and one for the COMMIT.

  • If you insert five rows in an explicit transaction, six files are added to the $FSLOG folder.

Files aren’t added to the $FSLOG folder when data is deleted or when a table is truncated or dropped. However, the SQL Server transaction log keeps track of these operations, and a new metadata table contains information about the removed data.

Using garbage collection for FILESTREAM data

The FILESTREAM data can be viewed as serving as the live user data, as well as the log of changes to that data, and as row versions for snapshot operations (discussed in Chapter 13). SQL Server needs to make sure that the FILESTREAM data files aren’t removed if they might possibly be needed for any backup or recovery needs. In particular, for log backups, all new FILESTREAM content must be backed up because the transaction log doesn’t contain the actual FILESTREAM data, and only the FILESTREAM data has the redo information for the actual FILESTREAM contents. In general, if your database isn’t in the SIMPLE recovery mode, you need to back up the log twice before the Garbage Collector can remove unneeded data files from your FILESTREAM folders.

Consider this example: You can start with a clean slate by dropping and re-creating the MyFilestreamDB database. A DROP DATABASE statement immediately removes all the folders and files because now doing any subsequent log backups isn’t possible. The script in Listing 8-3 recreates the database and creates a table with just a single FILESTREAM column. Finally, the script inserts three rows into the table and backs up the database. If you inspect the FileStreamGroup1 container, you see that the folder for the columns contains three files for the three rows.

Listing 8-3 Dropping and recreating a database

USE master;
GO
DROP DATABASE MyFilestreamDB;
GO
CREATE DATABASE  MyFilestreamDB  ON  PRIMARY
     (NAME = N'Rowdata1', FILENAME = N'c:\data\Rowdata1.mdf' , SIZE = 2304KB ,
      MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB ),
FILEGROUP FileStreamGroup1 CONTAINS FILESTREAM DEFAULT( NAME = FSData1,
    FILENAME = 'c:\Data2\FileStreamGroup1'),
FILEGROUP FileStreamGroup2 CONTAINS FILESTREAM ( NAME = FSData2,
    FILENAME = 'c:\Data2\FileStreamGroup2')
 LOG ON
    (NAME = N'FSDBLOG', FILENAME = N'c:\data\FSDB_log.ldf' , SIZE = 1024KB ,
     MAXSIZE = 2048GB , FILEGROWTH = 10%);
GO
USE MyFilestreamDB;
GO
CREATE TABLE dbo.Records
(
        Id [uniqueidentifier] ROWGUIDCOL NOT NULL UNIQUE,
        SerialNumber INTEGER UNIQUE,
        Chart_Primary VARBINARY(MAX) FILESTREAM NULL
)
FILESTREAM_ON FileStreamGroup1;
GO
INSERT INTO dbo.Records
    VALUES (newid(), 1,
              CAST (REPLICATE (CONVERT(varchar(MAX), 'Base Data'),
                      10000) as varbinary(max))),
           (newid(), 2,
               CAST (REPLICATE (CONVERT(varchar(MAX), 'New Data'),
                       10000) as   varbinary(max))),
           (newid(), 3, 0x);
GO
BACKUP DATABASE MyFileStreamDB to disk = 'C:\backups\FBDB.bak';
GO

Now delete one of the rows, as follows:

DELETE dbo.Records
WHERE SerialNumber = 2;
GO

Now inspect the files on disk, and you still see three files.

Back up the log and run a checkpoint. Note that on a real system, enough changes would probably be made to your data that your database’s log would get full enough to trigger an automatic CHECKPOINT. However, during testing, when you aren’t putting much into the log at all, you have to force the CHECKPOINT:

BACKUP LOG  MyFileStreamDB to disk = 'C:\backups\FBDB_log.bak';
CHECKPOINT;

Now if you check the FILESTREAM data files, you still see three rows. Wait five seconds for garbage collection, and you’ll still see three rows. You need to back up the log and then force another CHECKPOINT:

BACKUP LOG  MyFileStreamDB to disk = 'C:\backups\FBDB_log.bak';
CHECKPOINT;

Now within a few seconds, you should see one of the files disappear. The reason you need to back up the log twice before the physical file is available for garbage collection is to make sure that the file space isn’t reused by other FILESTREAM operations while it still might be needed for restore purposes.

You can run some additional tests of your own. For example, if you try dropping the dbo.Records table, notice that you again have to perform two log backups and CHECKPOINTs before SQL Server removes the folders for the table and the column.

Exploring metadata with FILESTREAM data

Within your SQL Server tables, the storage required for FILESTREAM isn’t particularly complex. In the row itself, each FILESTREAM column contains a file pointer that is 48 bytes in size. Even if you look at a data page with the DBCC PAGE command, not much more information about the file is available. However, SQL Server does provide a new function to translate the file pointer to a path name. The function is actually a method applied to the column name in the table. So the following code returns a UNC name for the file containing the actual column’s data in the row inserted previously:

SELECT Chart_Primary, Chart_Primary.PathName()
FROM dbo.Records
WHERE SerialNumber = 3;
GO

The UNC value returned looks like this:

\\<server_name>\<share_name>\v1\<db_name>\<object_schema>\<table_name>\<column_name>\<GUID>

Keep in mind the following points about using the PathName function.

  • The function name is case-sensitive, even on a server that’s not case-sensitive, so it always must be entered as PathName.

  • The default share_name is the service name for your SQL Server instance (so for the default instance, it is MSSQLSERVER). By using the SQL Server Configuration Manager, you can right-click your SQL Server instance and choose Properties. The FILESTREAM tab of the SQL Server Properties sheet allows you to change the share_name to another value of your choosing.

  • The PathName function can take an optional parameter of 0, 1, or 2, with 0 being the default. The parameter controls only how the server_name value is returned; all other values in the UNC string are unaffected. Table 8-3 shows the meanings of the different values.

Table 8-3 Parameter values for the PathName function

Value

Description

0

Returns the server name converted to BIOS format; for example, \\SERVERNAME\MSSQLSERVER\v1\MyFilestream\dbo\Records\Chart_Primary\A73F19F7-38EA-4AB0-BB89-E6C545DBD3F9

1

Returns the server name without conversion; for example, \\ServerName\MSSQLSERVER\v1\MyFilestream\Dbo\Records\Chart_Primary\A73F19F7-38EA-4AB0-BB89-E6C545DBD3F9

2

Returns the complete server path; for example, \\ServerName.MyDomain.com\MSSQLSERVER\v1\MyFilestream\Dbo\Records\Chart_Primary\A73F19F7-38EA-4AB0-BB89-E6C545DBD3F9

Other metadata gives you information about your FILESTREAM data.

  • sys.database_files returns a row for each of your FILESTREAM files. These files have a type value of 2 and a type_desc value of FILESTREAM.

  • sys.filegroups returns a row for each of your FILESTREAM filegroups. These files have a type value of FD and a type_desc value of FILESTREAM_DATA_FILEGROUP.

  • sys.data_spaces returns one row for each data space, which is either a filegroup or a partition scheme. Filegroups holding FILESTREAM data are indicated by the type FD.

  • sys.tables has a value in the column for filestream_data_space_id, which is the data space ID for either the FILESTREAM filegroup or the partition scheme that the FILESTREAM data uses. Tables with no FILESTREAM data have NULL in this column.

  • sys.columns has a value of 1 in the is_filestream column for columns with the filestream attribute.

The older metadata, such as the system procedure sp_helpdb <database_name> or sp_help <object_name>, doesn’t show any information about FILESTREAM data.

Earlier, this chapter mentioned that rows or objects that are deleted don’t generate files in the $FSLOG folder, but data about the removed data is stored in a system table. No metadata view allows you to see this table; you can observe it only by using the dedicated administrator connection (DAC). You can look in a view called sys.internal_tables for an object with TOMBSTONE in its name. Then, by using the DAC, you can look at the data inside the TOMBSTONE table. If you rerun the preceding script but don’t back up the log, you can use the following code:

USE MyFilestreamDB;
GO
SELECT name FROM sys.internal_tables
WHERE name like '%tombstone%';
-- I see the table named: filestream_tombstone_2073058421
-- Reconnect using DAC, which puts us in the master database
USE MyFileStreamDB;
GO
SELECT * FROM sys.filestream_tombstone_2073058421;
GO

If this table is empty, the login SQL Server and the $FSLOG are in sync, and all unneeded files have been removed from the FILESTREAM containers on disk.

Creating a FileTable

FileTable storage, introduced in SQL Server 2012, allows you to create special user tables that have a predefined schema and to extend the capabilities of FILESTREAM data discussed earlier. The two most important extensions are that FileTables allow full support and compatibility with Win32 applications and support the hierarchical namespace of directories and files. Each row in a FileTable table represents an operating system file or directory in a hierarchical structure and contains attributes about the file or directory, such as created date, modified, date, and the name of the file or directory.

The first step in creating a FileTable is to make sure that the database supports FILESTREAM data with at least one FILESTREAM filegroup. Also, the database option to allow NON_TRANSACTED_ACCESS must be set to either FULL or READ_ONLY, and a directory name must be supplied for use by Windows applications as part of the share name when accessing the FileTable files. These options can be supplied as part of the initial CREATE DATABASE operation, or the database can be altered to include these options:

ALTER DATABASE MyFilestreamDB
SET FILESTREAM ( NON_TRANSACTED_ACCESS = FULL, DIRECTORY_NAME = N'FileTableData' );

By setting NON_TRANSACTED_ACCESS to something other than NONE, you are enabling FileTable storage within the database. SQL Server 2012 provides a new catalog view, called sys.database_filestream_options, to examine each database’s readiness for storing FileTable data. The following query selects from that view showing only rows where non-transacted access has been allowed. A value of 0 for NON_TRANSACTED_ACCESS means the feature wasn’t enabled, 1 means it has been enabled for READ_ONLY, and 2 means it has been enabled for FULL access.

SELECT DB_NAME(database_id) as DBNAME,  non_transacted_access_desc, directory_name
FROM sys.database_filestream_options
WHERE non_transacted_access > 0;

Enabling the NON_TRANSACTED_ACCESS to the MyFilestreamDB as shown in the preceding ALTER statement would produce the following results:

DB_NAME               non_transacted_access_desc      directory_name
---------------       --------------------------      ----------------
MyFilestreamDB        FULL                            FileTableData

After the database is properly configured for FileTable access, you can create a FileTable with a very simple CREATE TABLE statement. Because the schema is predefined, all you do is specify a name for your FileTable and a directory name in which all the files stored in this FileTable can be found in the operating system. Optionally, you can also specify a collation for the data in the files stored in the FileTable, but if none is specified, the default database collation is used.

CREATE TABLE Documents AS Filetable
      WITH (Filetable_Directory = 'DocumentsData');

At this point, you can look at the Windows share that is now available by allowing your database NON_TRANSACTED_ACCESS. You can see this by opening Windows Explorer on the server machine and navigating to either \\127.0.0.1 or \\<your machine name>. Along with whatever other shares have been set up on the machine, you should see a share for your SQL Server instance name, or mssqlserver if your instance is the default instance. You can open up the share and see a directory named FiletableData, and when you open the directory, you should see the table name DocumentsData. Figure 8-6 shows you what this structure looks like on my machine, for a SQL Server 2012 instance called denali. (In SQL Server 2008 and 2008 R2, you also saw share names for your SQL Server instances that have been configured to allow FILESTREAM access. However, attempting to access the share directly will result in an error.)

Figure 8-6

Figure 8-6 The FileTable share containing no files.

While exploring the operating system files, you can also revisit the Data2 folder (which was the default FILESTREAM container for the database), which you saw earlier in the FILESTREAM data discussion, and see that a GUID now exists for one more table containing FILESTREAM data and one FILESTREAM column in that table.

As mentioned, the FileTable data is completely available through the share to all Windows applications, including Windows Explorer. You can copy a file from anywhere on your system and paste it into the share. Figure 8-7 shows the share now containing a copy of one of my errorlog files.

Figure 8-7

Figure 8-7 The FileTable share after inserting one file.

When a SQL Server instance is enabled for FILESTREAM access, SQL Server installs a component called an NTFS filter driver. When a Windows application interacts with a FileTable share, the interaction is intercepted by the NTFS filter driver and is redirected to SQL Server to allow these changes that you make to the share, such as inserting a new file, to be reflected in the table inside SQL Server. From SQL Server Management Studio, I can also SELECT from my Documents table, and see that it now has one row in it, to reflect the file that was added to it through Windows.

Some of the columns are returned, but the wide output prevents showing all the columns:

Table 8-4 lists all the column names available in the FileTable.

Table 8-4 Predefined columns in a FileTable

File attribute name

Type

Description

stream_id

[uniqueidentifier] rowguidcol

A unique ID for the FILESTREAM data.

file_stream

varbinary(max)

filestream

Contains the FILESTREAM data.

Name

nvarchar(255)

The file or directory name.

path_locator

hierarchyid

The position of this node in the hierarchical FileNamespace. Primary key for the table.

parent_path_locator

hierarchyid

The hierarchyid of the containing directory. parent_path_locator is a persisted computed column.

file_type

nvarchar(255)

Represents the type of the file. This column can be used as the TYPE COLUMN when you create a full-text index. file_type is a persisted computed column based on the file extension.

cached_file_size

bigint

The size in bytes of the FILESTREAM data. cached_file_size is a persisted computed column.

creation_time

datetime2(4)

not null

The date and time that the file was created.

last_write_time

datetime2(4)

not null

The date and time that the file was last updated.

last_access_time

datetime2(4)

not null

The date and time that the file was last accessed.

is_directory

bit

not null

Indicates whether the row represents a directory. This value is calculated automatically, and can’t be set.

is_offline

bit

not null

Offline file attribute.

is_hidden

bit

not null

Hidden file attribute.

is_readonly

bit

not null

Read-only file attribute.

is_archive

bit

not null

Archive attribute.

is_system

bit

not null

System file attribute.

is_temporary

bit

not null

Temporary file attribute.

You can see that many of these column names correspond to attributes that you can see for all your files through Windows Explorer.

Considering performance for FILESTREAM data

Although a thorough discussion of performance tuning and troubleshooting is beyond the scope of this book, I want to provide you with some basic information about setting up your system to get high performance from FILESTREAM data. Paul Randal, one of the co-authors of this book, has written a white paper on FILESTREAM that you can access on the MSDN site at http://msdn.microsoft.com/en-us/library/cc949109.aspx. (This white paper is also available on this book’s companion website, http://www.SQLServerInternals.com/companion.) This section just briefly mentions some of the main points Paul makes regarding what you can do to get good performance. All these suggestions are explained in much more detail in the white paper.

  • Make sure that you’re storing the right-sized data in the right way. Jim Gray (et al) published a research paper several years ago titled “To BLOB or Not to BLOB: Large Object Storage in a Database or a Filesystem?” that gives recommendations for when to store data outside the database. To summarize the findings, large object data smaller than 256 KB should be stored in a database, and data that’s 1 MB or larger should be stored in the file system. For data between these two values, the answer depends on other factors, and you should test your application thoroughly. The key point here is that you won’t get good performance if you store lots of relatively small large objects using FILESTREAM.

  • Use an appropriate RAID level for the NTFS volume that hosts the FILESTREAM data container. For example, don’t use RAID 5 for a write-intensive workload.

  • Use an appropriate disk technology. SCSI is usually faster than SATA/IDE because SCSI drives usually have higher rotational speeds, which help them have lower latency and seek times. However, SCSI drives are also more expensive.

  • Whichever disk technology you choose, if it is SATA, ensure that it supports NCQ, and if SCSI, ensure that it supports CTQ. Both of these allow the drives to process multiple, interleaved I/Os concurrently.

  • Separate the data containers from each other, and separate the containers from other database data and log files. This avoids contention for the disk heads.

  • Defragment the NTFS volume, if needed, before setting up FILESTREAM, and defragment periodically to maintain good scan performance.

  • Turn off 8.3 name generation on the NTFS volume by using the command-line fsutil utility. This is an order-N algorithm that must check that the new name generated doesn’t collide with any existing names in the directory. Note, however, that this slows down insert and update performance a lot.

  • Use fsutil to turn off tracking of last access time.

  • Set the NTFS cluster size appropriately. For larger objects greater than 1 MB in size, use a cluster size of 64 KB to help reduce fragmentation.

  • A partial update of FILESTREAM data creates a new file. Batch lots of small updates into one large update to reduce churn.

  • When streaming the data back to the client, use a server message block (SMB) buffer size of approximately 60 KB or multiples thereof. This helps keep the buffers from getting overly fragmented, because Transmission Control Protocol/Internet Protocol (TCP/IP) buffers are 64 KB.

Taking these suggestions into consideration and performing thorough testing of your application can give you great performance when working with very large data objects.

Summarizing FILESTREAM and FileTable

When you configure a SQL Server instance to support FILESTREAM storage, you are allowing Windows files to be accessed and manipulated as data in any SQL Server table. When you configure a database with FILESTREAM filegroups to have non-transacted access, you are allowing FileTables to be created in the database. A FileTable builds on the FILESTREAM capability by providing a table whose rows not only contain the Windows file contents but also all the Windows file properties. These files are accessible through the FileTable share and can be manipulated through any Windows application.

Keep in mind that the FileTable structure is predefined and can’t be altered. If you want to add your own attributes along with the files in a FileTable, you can create another table with your user-defined attributes and a foreign key reference to the FileTable column path_locator, which is the primary key of the FileTable.

Detailed information about how to work with this FileTable data through a Windows application is beyond the scope of this book. The goal of this section was to present some of the storage internals of the FILESTREAM data and FileTables, and well as to give you an idea of how SQL Server works with these constructs.