Developing Cloud Applications with Windows Azure Storage: Blobs

  • 3/15/2013

Using page blobs

Page blobs add the features of random access and sparse population to the blob storage story, and they quintuple the maximum size from 200 GB to 1 terabyte. Page blobs were added to Windows Azure storage to enable development of the VHD virtual drive abstraction. They are also useful in fixed-length logging scenarios, where rolling overwrite of the oldest data in the log may be desired.

The sparse feature is nice because it allows you to allocate a storage amount up to 1 terabyte, but you are charged only for the pages that you place in the blob, no matter how much storage space you allocate. When writing to a page blob, you’re required to write your data in multiples of the 512-byte page size, starting on a page boundary. Reads are less strict: as a later sample in this section shows, you can download an arbitrary byte range.
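The alignment rule for writes can be sketched with a few lines of arithmetic. The class and method names below are hypothetical helpers, not part of the storage client library, and the 512-byte page size is taken from the c_BlobPageSize constant used in the sample in this section.

```csharp
using System;

// Hypothetical helpers (not part of the storage client library).
static class PageAlignment
{
    public const Int64 PageSize = 512;

    // True when the offset falls exactly on a page boundary.
    public static Boolean IsAligned(Int64 offset)
    {
        return offset >= 0 && offset % PageSize == 0;
    }

    // Rounds a byte count up to the next whole number of pages.
    public static Int64 RoundUpToPage(Int64 byteCount)
    {
        if (byteCount < 0) throw new ArgumentOutOfRangeException("byteCount");
        return (byteCount + PageSize - 1) / PageSize * PageSize;
    }

    // Copies a payload into a zero-padded buffer whose length is a page multiple.
    public static Byte[] PadToPage(Byte[] payload)
    {
        Byte[] buffer = new Byte[RoundUpToPage(payload.Length)];
        Array.Copy(payload, buffer, payload.Length);
        return buffer;
    }
}
```

With these helpers, a 5-byte payload pads to a 512-byte buffer, and offsets such as 0 and 1,024 pass the IsAligned check whereas an offset of 5 does not.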

In the following sample code, you create an array of 5 bytes (integers 1 through 5), and you write that data to page 0. Next, you create an array of 5 bytes (descending integers 5 through 1), and you write that data to page 2. There is no significance to the integers I selected for this example beyond demonstrating that you can read and write data in sparsely populated pages that begin on page boundaries.

const Int32 c_BlobPageSize = 512;
CloudBlobClient client = account.CreateCloudBlobClient();
CloudBlobContainer container = client.GetContainerReference("demo").EnsureExists();
CloudPageBlob pageBlob = container.GetPageBlobReference("MyPageBlob.txt");

// You must create a page blob specifying its size:
pageBlob.Create(10 * c_BlobPageSize);

Byte[] data = new Byte[1 * c_BlobPageSize];  // Must be multiple of page size

// Write some data to Page 0 (offset 0):
Array.Copy(new Byte[] { 1, 2, 3, 4, 5 }, data, 5);
pageBlob.WritePages(new MemoryStream(data), 0 * c_BlobPageSize); // Offset 0
// Write some data to Page 2 (offset 1024):
Array.Copy(new Byte[] { 5, 4, 3, 2, 1 }, data, 5);
pageBlob.WritePages(new MemoryStream(data), 2 * c_BlobPageSize); // Offset 1024

// Show committed pages:
foreach (PageRange pr in pageBlob.GetPageRanges())
    Console.WriteLine("Start={0,6:N0}, End={1,6:N0}", pr.StartOffset, pr.EndOffset);

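// Read the whole blob (with lots of 0's):
using (var stream = new MemoryStream()) {
    pageBlob.DownloadToStream(stream);
    data = stream.ToArray();  // Copies only the bytes written, unlike GetBuffer
    Console.WriteLine("Downloaded length={0:N0}", data.Length);
    Console.WriteLine("  Page 0 data: " + BitConverter.ToString(data, 0 * c_BlobPageSize, 10));
    Console.WriteLine("  Page 1 data: " + BitConverter.ToString(data, 1 * c_BlobPageSize, 10));
    Console.WriteLine("  Page 2 data: " + BitConverter.ToString(data, 2 * c_BlobPageSize, 10));
    Console.WriteLine("  Page 3 data: " + BitConverter.ToString(data, 3 * c_BlobPageSize, 10));
}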

At this point in execution, only two 512-byte pages are committed in a 10-page, sparsely populated blob. The first page starts at byte 0 and ends at byte 511, and the second starts at byte 1024 and ends at byte 1535. When you display the bytes in each page, you can see that pages 0 and 2 contain the bytes you uploaded, whereas pages 1 and 3 return zeros (pages 4–9, which the sample does not print, would look identical to pages 1 and 3). You are being charged only for the two pages you stored, but the blob behaves as if all 10 pages were populated.
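The arithmetic behind those two committed pages can be sketched as follows. CommittedRange is a hypothetical stand-in for the PageRange values returned by GetPageRanges, used here so that the example runs without a storage account; it assumes that both offsets are inclusive, as in the byte ranges just described.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the storage client's PageRange (inclusive offsets).
struct CommittedRange
{
    public Int64 StartOffset;
    public Int64 EndOffset;

    public CommittedRange(Int64 start, Int64 end)
    {
        StartOffset = start;
        EndOffset = end;
    }
}

static class SparseUsage
{
    // Sums the bytes covered by the committed ranges.
    // EndOffset is inclusive, hence the + 1.
    public static Int64 CommittedBytes(IEnumerable<CommittedRange> ranges)
    {
        Int64 total = 0;
        foreach (CommittedRange r in ranges)
            total += r.EndOffset - r.StartOffset + 1;
        return total;
    }
}
```

For the ranges 0 to 511 and 1024 to 1535, CommittedBytes returns 1,024, which is the populated portion of the 5,120 bytes allocated for the blob.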

From this output, you might be tempted to think that uploading a page of zeros would be treated as a nonexistent page. Unfortunately, that assumption is incorrect: pages of zeros are considered part of the blob’s official population, and you will be billed for their storage. A page must be cleared from the collection of pages in a blob in order for it to return to a nonexistent sparse state and for you to avoid being billed. You use the ClearPages method of the page blob to do this, as the code later in this section shows.
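One way to keep a blob sparse is to check a buffer before uploading it. The helper below is a hypothetical illustration, not part of the storage client library: it only reports whether a buffer contains nothing but zeros, leaving the decision about what to do next to the caller.

```csharp
using System;

// Hypothetical helper (not part of the storage client library).
static class PageContent
{
    // Returns true when every byte in the buffer is zero.
    public static Boolean IsAllZeros(Byte[] buffer)
    {
        foreach (Byte b in buffer)
            if (b != 0) return false;
        return true;
    }
}
```

A caller could use the result to choose between ClearPages for that range and WritePages with the buffer, so that an all-zero page is never committed in the first place.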

Over the network, the data is transmitted as HTTP PUT requests with the query string parameter comp=page, the HTTP header x-ms-range indicating the byte range, and x-ms-page-write indicating the type of operation being performed (Update in this example).

?comp=page&timeout=90 HTTP/1.1
x-ms-version: 2012-02-12
User-Agent: WA-Storage/2.0.0
x-ms-range: bytes=0-511
x-ms-page-write: Update
x-ms-date: Mon, 31 Dec 2012 07:45:32 GMT
Authorization: SharedKey azureinsiders:MLh8U/RxECpksvMuHdgzn2KDOQ6CSUHlku4NJPb7MJI=
Content-Length: 512

<unprintable data>
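The x-ms-range value in this request is an inclusive byte range, so its end is the start offset plus the length minus one. A minimal sketch of that calculation follows; RangeHeader is a hypothetical helper written for illustration, and the storage client library builds this header for you.

```csharp
using System;

// Hypothetical helper; the storage client library builds this header itself.
static class RangeHeader
{
    const Int64 PageSize = 512;

    // Formats an inclusive, page-aligned byte range such as "bytes=0-511".
    public static String Format(Int64 startOffset, Int64 length)
    {
        if (startOffset < 0 || startOffset % PageSize != 0 ||
            length <= 0 || length % PageSize != 0)
            throw new ArgumentException("Offset and length must be multiples of the page size.");
        return String.Format("bytes={0}-{1}", startOffset, startOffset + length - 1);
    }
}
```

Format(0, 512) yields bytes=0-511, matching the header shown in the request, and Format(1024, 512) yields bytes=1024-1535, the range that corresponds to the write to page 2.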

After writing the pages to blob storage, as shown in the preceding code, there are two 512-byte pages in a sparsely populated 5,120-byte page blob.

Start=0, End=511
Start=1024, End=1535
Downloaded length=5,120
  Page 0 data: 01-02-03-04-05-00-00-00-00-00
  Page 1 data: 00-00-00-00-00-00-00-00-00-00
  Page 2 data: 05-04-03-02-01-00-00-00-00-00
  Page 3 data: 00-00-00-00-00-00-00-00-00-00

You can continue modifying blobs by page. The following code reads a specific byte range, clears a page, and then lists the page ranges that remain committed.

// Read a specific range from the blob (offset 1024, 10 bytes):
using (var stream = new MemoryStream()) {
    pageBlob.DownloadRangeToStream(stream, 2 * c_BlobPageSize, 10);
    stream.Seek(0, SeekOrigin.Begin);
    data = new BinaryReader(stream).ReadBytes((Int32)stream.Length);
    Console.WriteLine("  Page 2 data: " + BitConverter.ToString(data, 0, 10));
}

// Clear a range of pages (offset 0, 512 bytes):
pageBlob.ClearPages(0, 512);

// Show committed pages:
foreach (PageRange pr in pageBlob.GetPageRanges())
    Console.WriteLine("Start={0,6:N0}, End={1,6:N0}", pr.StartOffset, pr.EndOffset);

After clearing the bytes in page 0 with the ClearPages method, only page 2 remains committed in this page blob. Calling GetPageRanges and iterating over the result confirms this: a single range remains, starting at byte 1,024 and ending at byte 1,535.