Microsoft SQL Server 2008 Internals: Transactions and Concurrency

  • 3/11/2009

Internal Locking Architecture

Locks are not on-disk structures. You won’t find a lock field directly on a data page or a table header, and the metadata that keeps track of locks is never written to disk. Locks are internal memory structures—they consume part of the memory used for SQL Server. A lock is identified by lock resource, which is a description of the resource that is locked (a row, index key, page, or table). To keep track of the database, the type of lock, and the information describing the locked resource, each lock requires 64 bytes of memory on a 32-bit system and 128 bytes of memory on a 64-bit system. This 64-byte or 128-byte structure is called a lock block.

Each process holding a lock also must have a lock owner, which represents the relationship between a lock and the entity that is requesting or holding the lock. The lock owner requires 32 bytes of memory on a 32-bit system and 64 bytes of memory on a 64-bit system. This 32-byte or 64-byte structure is called a lock owner block. A single transaction can have multiple lock owner blocks; a scrollable cursor sometimes uses several. Also, one lock can have many lock owner blocks, as is the case with a shared lock. As mentioned, the lock owner represents a relationship between a lock and an entity, and the relationship can be granted, waiting, or in a state called waiting-to-convert.

The lock manager maintains a lock hash table. Lock resources, contained within a lock block, are hashed to determine a target hash slot in the hash table. All lock blocks that hash to the same slot are chained together from one entry in the hash table. Each lock block contains a 15-byte field that describes the locked resource. The lock block also contains pointers to lists of lock owner blocks. There is a separate list for lock owners in each of the three states. Figure 10-3 shows the general lock architecture.

Figure 10-3

Figure 10-3. SQL Server locking architecture

The number of slots in the hash table is based on the system’s physical memory, as shown in Table 10-7. There is an upper limit of 231 slots. All instances of SQL Server on the same machine have a hash table with the same number of slots. Each entry in the lock hash table is 16 bytes in size and consists of a pointer to a list of lock blocks and a spinlock to guarantee serialized access to the same slot.

Table 10-7. Number of Slots in the Internal Lock Hash Table

Physical Memory (MB)

Number of Slots

Memory Used

< 32

214 = 16384

128 KB

>= 32 and < 64

215 = 32768

256 KB

>= 64 and < 128

216 = 65536

512 KB

>= 128 and < 512

218 = 262144

2048 KB

>= 512 and < 1024

219 = 524288

4096 KB

>= 1024 and < 4096

221 = 2097152

16384 KB

>= 4096 and < 8192

222 = 4194304

32768 KB

>= 8192 and < 16384

223 = 8388608

65536 KB

>= 16384

225 = 33554432

262144 KB

The lock manager allocates in advance a number of lock blocks and lock owner blocks at server startup. On NUMA configurations, these lock and lock owner blocks are divided among all NUMA nodes. So when a lock request is made, local lock blocks are used. If the number of locks has been set by sp_configure, it allocates that configured number of lock blocks and the same number of lock owner blocks. If the number is not fixed (0 means auto-tune), it allocates 2,500 lock blocks for your SQL Server instance. It allocates twice as many (2 * # lock blocks) of the lock owner blocks. At their maximum, the static allocations can’t consume more than 25 percent of the committed buffer pool size.

When a request for a lock is made and no free lock blocks remain, the lock manager dynamically allocates new lock blocks instead of denying the lock request. The lock manager cooperates with the global memory manager to negotiate for server allocated memory. When necessary, the lock manager can free the dynamically allocated lock blocks. The lock manager is limited to 60 percent of the buffer manager’s committed target size allocation to lock blocks and lock owner blocks.

Lock Partitioning

For large systems, locks on frequently referenced objects can become a performance bottleneck. The process of acquiring and releasing locks can cause contention on the internal locking resources. Lock partitioning enhances locking performance by splitting a single lock resource into multiple lock resources. For systems with 16 or more CPUs, SQL Server automatically splits certain locks into multiple lock resources, one per CPU. This is called lock partitioning, and there is no way for a user to control this process. (Do not confuse lock partitioning with partition locks, which are discussed in the section entitled “Lock Escalation,” later in this chapter.) An informational message is sent to the error log whenever lock partitioning is active. The error message is “Lock partitioning is enabled. This is an informational message only. No user action is required.” Lock partitioning applies only to full object locks (for example, tables and views) in the following lock modes: S, U, X, and SCH-M. All other modes (NL, SCH_S, IS, IU, and IX) are acquired on a single CPU. SQL Server assigns a default lock partition to every transaction when the transaction starts. During the life of that transaction, all lock requests that are spread over all the partitions use the partition assigned to that transaction. By this method, access to lock resources of the same object by different transactions is distributed across different partitions.

The resource_lock_partition column in sys.dm_tran_locks indicates which lock partition a particular lock is on, so you can see multiple locks for the exact same resource with different resource_lock_partition values. For systems with fewer than 16 CPUs, for which lock partitioning is never used, the resource_lock_partition value is always 0.

For example, consider a transaction acquiring an IS lock in REPEATABLE READ isolation, so that the IS lock is held for the duration of the transaction. The IS lock is acquired on the transaction’s default partition—for example, partition 4. If another transaction tries to acquire an X lock on the same table, the X lock must be acquired on ALL partitions. SQL Server successfully acquires the X lock on partitions 0 to 3, but it blocks when attempting to acquire an X lock on partition 4. On partition IDs 5 to 15, which have not yet acquired the X lock for this table, other transactions can continue to acquire any locks that do not cause blocking.

With lock partitioning, SQL Server distributes the load of checking for locks across multiple spinlocks, and most accesses to any given spinlock are from the same CPU (and practically always from the same node), which means the spinlock should not spin often.

Lock Blocks

The lock block is the key structure in SQL Server’s locking architecture, shown earlier in Figure 10-3. A lock block contains the following information:

  • Lock resource information containing the lock resource name and details about the lock.

  • Pointers to connect the lock blocks to the lock hash table.

  • Pointers to lists of lock owner blocks for locks on this resource that have been granted. Four grant lists are maintained to minimize the amount of time it takes to find a granted lock.

  • A pointer to a list of lock owner blocks for locks on this resource that are waiting to be converted to another lock mode. This is called the convert list.

  • A pointer to a list of lock owner blocks for locks that have been requested on this resource but have not yet been granted. This is called the wait list.

The lock resource uniquely identifies the data being locked. Its structure is shown in Figure 10-4. Each “row” in the figure represents 4 bytes, or 32 bits.

Figure 10-4

Figure 10-4. The structure of a lock resource

The meanings of the fields shown in Figure 10-4 are described in Table 10-8. The value in the resource type byte is one of the locking resources described earlier in Table 10-5. The number in parentheses after the resource type is the code number for the resource type (which we see in the syslockinfo table a little later in the chapter). The meaning of the values in the three data fields varies depending on the type of resource being described. SR indicates a subresource (which I describe shortly).

Table 10-8. Fields in the Lock Resource Block

Resource Content

Resource Type

Data 1

Data 2

Data 3

Database (2)

SR

0

0

File (3)

File ID

0

0

Index (4)

Object ID

SR

Index ID

Table (5)

Object ID

SR

0

Page (6)

Page number

0

Key (7)

Partition ID

Hashed key

Extent (8)

Extent ID

0

RID (9)

RID

0

The following are some of the possible SR (SubResource) values. If the lock is on a Database resource, SR indicates one of the following:

  • Full database lock

  • Bulk operation lock

If the lock is on a Table resource, SR indicates one of the following:

  • Full table lock (default)

  • Update statistics lock

  • Compile lock

If the lock is on an Index resource, SR indicates one of the following:

  • Full index lock (default)

  • Index ID lock

  • Index name lock

Lock Owner Blocks

Each lock owned or waited for by a session is represented in a lock owner block. Lists of lock owner blocks form the grant, convert, and wait lists that hang off the lock blocks. Each lock owner block for a granted lock is linked with all other lock owner blocks for the same transaction or session so they can be freed as appropriate when the transaction or session ends.

syslockinfo Table

Although the recommended way of retrieving information about locks is through the sys.dm_tran_locks view, there is another metadata object called syslockinfo that provides internal information about locks. Prior to the introduction of the DMVs in SQL Server 2005, syslockinfo was the only internal metadata available for examining locking information.

In fact, the stored procedure sp_lock is still defined to retrieve information from syslockinfo instead of from sys.dm_tran_locks. I will not go into full detail about syslockinfo because almost all the information from that table is available, in a much more readable form, in the sys.dm_tran_locks view. However, syslockinfo is available in the master database for you to take a look at. One column, however, is of particular interest—the rsc_bin column, which contains a 16-byte description of a locked resource.

You can analyze the syslockinfo.rsc_bin field as the resource block. Let’s look at an example. I select a single row from the Person table in AdventureWorks2008 using the REPEATABLE READ isolation level, so my shared locks continue to be held for the duration of the transaction. I then look at the rsc_bin column in syslockinfo for key locks, page locks, and table locks:

USE AdventureWorks2008
GO
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
GO
BEGIN TRAN
SELECT * FROM Person.Person
WHERE BusinessEntityID = 249;
GO
SELECT rsc_bin, rsc_type
FROM master..syslockinfo
WHERE rsc_type IN (5,6,7);
GO

Here are the three rows in the result set:

rsc_bin                            rsc_type
---------------------------------- --------
0x805EFA59000000000000000007000500 5
0x19050000010000000000000007000600 6
0x710000000001F900CE79D52507000700 7

The last 2 bytes in rsc_bin are the resource mode, so after byte-swapping, you can see the same value as in the rsc_type column—for example, you byte-swap 0500 to 0005 to resource mode 5 (a table lock). The next 2 bytes at the end indicate the database ID, and for all three rows, the value after byte-swapping is 0007, which is the database ID of my AdventureWorks2008 database.

The rest of the bytes vary depending on the type of resource. For a table, the first 4 bytes represent the object ID. The preceding row for the object lock (rsc_type = 5) after byte swapping has a value of 59FA5E80, which is 1509580416 in decimal. I can translate this to an object name as follows:

SELECT object_name(1509580416)

This shows me the Person table.

For a PAGE (rsc_type = 6), the first 6 bytes are the page number followed by the file number. After byte-swapping, the file number is 0001, or 1 decimal, and the page number is 00000519, or 9889 in decimal. So the lock is on file 1, page 1305.

Finally, for a KEY (rsc_type = 7), the first 6 bytes represent the partition ID but the translation is a bit trickier. We need to add another 2 bytes of zeros to the value after byte-swapping, so we end up with 0100000000710000, which translates to 72057594045333504 in decimal. To see which object this partition belongs to, I can query the sys.partitions view:

SELECT object_name(object_id)
FROM sys.partitions
WHERE partition_ID = 72057594045333504;

Again, the result is that this partition is part of the Person table. The next 6 bytes of rsc_bin for the KEY resource are F900CE79D525. This is a character field, so no byte-swapping is needed. However, the value is not further decipherable. Key locks have a hash value generated for them, based on all the key columns of the index. Indexes can be quite long, so for almost any possible data type, SQL Server needs a consistent way to keep track of which keys are locked. The hashing function therefore generates a 6-byte hash string to represent the key. Although you can’t reverse-engineer this value and determine exactly which index row is locked, you can use it to look for matching entries, just like SQL Server does. If two rsc_bin values have the same 6-byte hash string, they are referring to the same lock resource.

In addition to detecting references to the same lock resource, you can determine which specific keys are locked by using the undocumented value %%lockres%%, which can return the hash string for any key. Selecting this value, along with data from the table, returns the lock resource for every row in the result set, based on the index used to retrieve the data. Consider the following example, which creates a clustered and nonclustered index on a tiny table and then selects the %%lockres%% value for each row, first using the clustered index and then using the nonclustered index:

CREATE TABLE lockres (c1 int, c2 int);
GO
INSERT INTO lockres VALUES (1,10);
INSERT INTO lockres VALUES (2,20);
INSERT INTO lockres VALUES (3,30);
GO
CREATE UNIQUE CLUSTERED INDEX lockres_ci ON lockres(c1);
CREATE UNIQUE NONCLUSTERED INDEX lockres_nci ON lockres(c2);
GO
SELECT %%lockres%% AS lock_resource, * FROM lockres WITH (INDEX = lockres_ci);
SELECT %%lockres%% AS lock_resource, * FROM lockres WITH (INDEX = lockres_nci);
GO

I get the following results. The first set of rows shows the lock resource for the clustered index keys, and the second set shows the lock resources for the nonclustered index:

lock_resource                    c1          c2
-------------------------------- ----------- -----------
(010086470766)                   1           10
(020068e8b274)                   2           20
(03000d8f0ecc)                   3           30
lock_resource                    c1          c2
-------------------------------- ----------- -----------
(0a0087c006b1)                   1           10
(14002be0c001)                   2           20
(1e004f007d6e)                   3           30

I can use this lock resource to find which row in a table matches a locked resource. For example, if sys.dm_tran_locks indicates that a row with the lock resource (010086470766) is holding a lock in the lockres table, I could find which row that resource corresponds to with the following query:

SELECT * FROM lockres
WHERE %%lockres%% = '(010086470766)'

Note that if the table is a heap and I look for the lock resource when scanning the table, the lock resource is the actual row ID (RID). The value returned looks just like the special value %%physloc%%, which I told you about in Chapter 5:

CREATE TABLE lockres_on_heap (c1 int, c2 int);
GO
INSERT INTO lockres_on_heap VALUES (1,10);
INSERT INTO lockres_on_heap VALUES (2,20);
INSERT INTO lockres_on_heap VALUES (3,30);
GO
SELECT %%lockres%% AS lock_resource, * FROM lockres_on_heap;

Here are my results:

lock_resource                    c1          c2
-------------------------------- ----------- ----
1:169:0                          1           10
1:169:1                          2           20
1:169:2                          3           30