Developing Cloud Applications with Windows Azure Storage: Blobs

  • 3/15/2013

Using block blobs

As you learned earlier in this chapter, block blobs segment your data into chunks, or blocks. Each block can be up to 4 MB in size.

When you upload data using the block semantics, you must provide three things: a block ID, which is stored with your data; the stream that you are uploading your data from; and an MD5 hash of your data, which is used to verify the successful transfer but is not stored. Uploaded blocks are stored in an uncommitted state. After you upload all of the blocks and commit them by calling the PutBlockList method, the uncommitted blocks become committed in an atomic transaction. The final blob can be no larger than 200 GB after it is committed; an exception is thrown if this limit is exceeded. If you don’t commit uploaded blocks within seven days, Windows Azure storage deletes them.
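The MD5 hash travels as a base64 string in the Content-MD5 header. As a minimal sketch, assuming only the .NET Framework class library, the value could be computed like this. The class and method names are hypothetical and are not part of the storage client library.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper; the class and method names are illustrative only.
public static class ContentMd5Sketch
{
    // Returns the base64-encoded MD5 digest of a block's bytes, which is the
    // form carried by the Content-MD5 request header.
    public static string ComputeContentMd5(byte[] blockBytes)
    {
        using (MD5 md5 = MD5.Create())
        {
            return Convert.ToBase64String(md5.ComputeHash(blockBytes));
        }
    }
}
```

An MD5 digest is always 16 bytes, so the encoded value is always 24 characters long and ends in two padding characters.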

The block ID is an array of 64 bytes (or fewer) that is base64-encoded for transport over the HTTP protocol.
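The samples in this section call ToBlockId and FromBlockId helpers without showing them. The requests shown later carry the IDs MDAwMDA=, MDAwMDE=, and MDAwMDI=, which are the base64 forms of 00000, 00001, and 00002, so one plausible implementation is sketched below. The five-digit format is inferred from those traces, and the class name and method bodies are assumptions rather than the original helper code.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical reconstruction of the block ID helpers; inferred, not original.
public static class BlockIdSketch
{
    // Formats the index as a fixed-width decimal string ("00000", "00001", ...)
    // and base64-encodes its ASCII bytes.
    public static string ToBlockId(this int index)
    {
        string padded = index.ToString("D5", CultureInfo.InvariantCulture);
        return Convert.ToBase64String(Encoding.ASCII.GetBytes(padded));
    }

    // Reverses ToBlockId: base64-decodes the ID and parses the decimal index.
    public static int FromBlockId(this string blockId)
    {
        string padded = Encoding.ASCII.GetString(Convert.FromBase64String(blockId));
        return int.Parse(padded, CultureInfo.InvariantCulture);
    }
}
```

Padding the index to a fixed width keeps every ID in the blob the same length.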

Another useful characteristic of block blobs is that their blocks can be uploaded in parallel to increase throughput, provided you have unused CPU power and available network bandwidth.
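Because each block is uploaded independently, the split-and-upload step can be sketched as follows. This is a minimal illustration, not client library code: the class name, SplitIntoBlocks, UploadInParallel, and the uploadBlock delegate are all hypothetical, and the delegate stands in for whatever call actually sends a block.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical sketch; none of these names come from the client library.
public static class ParallelBlockUploadSketch
{
    // Splits data into consecutive blocks of at most blockSize bytes.
    public static List<byte[]> SplitIntoBlocks(byte[] data, int blockSize)
    {
        if (blockSize <= 0) throw new ArgumentOutOfRangeException("blockSize");
        var blocks = new List<byte[]>();
        for (int offset = 0; offset < data.Length; offset += blockSize)
        {
            int length = Math.Min(blockSize, data.Length - offset);
            var block = new byte[length];
            Buffer.BlockCopy(data, offset, block, 0, length);
            blocks.Add(block);
        }
        return blocks;
    }

    // Uploads the blocks concurrently. uploadBlock is a caller-supplied
    // delegate that receives the block index and its bytes; it must be safe
    // to call from several threads at once.
    public static void UploadInParallel(IList<byte[]> blocks, int maxParallelism,
        Action<int, byte[]> uploadBlock)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };
        Parallel.For(0, blocks.Count, options, i => uploadBlock(i, blocks[i]));
    }
}
```

The blocks may finish uploading in any order. That is acceptable here because the final order of the blob is determined by the block list you commit, not by the order in which the blocks arrive.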

In the following code, you create an array of three strings ("A ", "B ", and "C ", each a letter followed by a space) that represents three distinct two-byte blocks of data that you want to place in blob storage. You encode each string into a byte array using UTF-8 encoding, wrap it in a memory stream, and then, for every block, call the PutBlock method, passing your block ID, the stream containing your data, and an MD5 hash of the data being put into blob storage.

Hashes must be calculated and blocks must be apportioned before they can be stored, so you will start with the Windows Azure client library code for this example, and then look at the RESTful representation of this code immediately thereafter.

// Put 3 blocks to the blob verifying data integrity.
// Encode, ToBase64String, and ToBlockId are helper extension methods; they
// are not part of the .NET Framework or the storage client library.
String[] words = new[] { "A ", "B ", "C " };
var md5 = new MD5Cng();
for (Int32 word = 0; word < words.Length; word++) {
    Byte[] wordBytes = words[word].Encode(Encoding.UTF8);
    // Azure verifies data integrity during transport; failure=400 (Bad Request)
    String md5Hash = md5.ComputeHash(wordBytes).ToBase64String();
    blockBlob.PutBlock(word.ToBlockId(), new MemoryStream(wordBytes), md5Hash);
}

Execution of the preceding code causes three HTTP PUT requests to be made against blob storage, one for each of the three blocks containing the data A, B, and C. The comp=block query parameter identifies each request as a Put Block operation, and the blockid=<blockid> parameter carries the block's unique identifier, which is base64-encoded and then URL-encoded (the trailing = appears as %3D).

PUT http://azureinsiders.blob.core.windows.net/demo/MyBlockBlob.txt
    ?comp=block&blockid=MDAwMDA%3D&timeout=90 HTTP/1.1
x-ms-version: 2012-02-12
User-Agent: WA-Storage/2.0.0
Content-MD5: Z0O5/MV88bFp+072x6lV0g==
x-ms-date: Sun, 30 Dec 2012 03:02:12 GMT
Authorization: SharedKey azureinsiders:tM/FIZZnnzdb1fFIjgD+hb/wiHH0FyFvGN1JPx82TPo=
Host: azureinsiders.blob.core.windows.net
Content-Length: 2
Connection: Keep-Alive

A

HTTP/1.1 201 Created
Transfer-Encoding: chunked
Content-MD5: Z0O5/MV88bFp+072x6lV0g==
Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
x-ms-request-id: 34e2df15-088f-46e1-af4e-6c10ef390940
x-ms-version: 2012-02-12
Date: Sun, 30 Dec 2012 03:02:15 GMT

0

PUT http://azureinsiders.blob.core.windows.net/demo/MyBlockBlob.txt
    ?comp=block&blockid=MDAwMDE%3D&timeout=90 HTTP/1.1
x-ms-version: 2012-02-12
User-Agent: WA-Storage/2.0.0
Content-MD5: CUf4UWGwWRnZaUDz3hSFLg==
x-ms-date: Sun, 30 Dec 2012 03:02:14 GMT
Authorization: SharedKey azureinsiders:6edn0FFuqGuoe3qt9cmMtYD6OkLChpFFddBm3BEtx9k=
Host: azureinsiders.blob.core.windows.net
Content-Length: 2

B

HTTP/1.1 201 Created
Transfer-Encoding: chunked
Content-MD5: CUf4UWGwWRnZaUDz3hSFLg==
Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
x-ms-request-id: ca5790b6-d889-43d5-ab24-a21a21617fbf
x-ms-version: 2012-02-12
Date: Sun, 30 Dec 2012 03:02:16 GMT

0

PUT http://azureinsiders.blob.core.windows.net/demo/MyBlockBlob.txt
    ?comp=block&blockid=MDAwMDI%3D&timeout=90 HTTP/1.1
x-ms-version: 2012-02-12
User-Agent: WA-Storage/2.0.0
Content-MD5: Q60OVNgdC/J0zJ/wCXMiQw==
x-ms-date: Sun, 30 Dec 2012 03:02:15 GMT
Authorization: SharedKey azureinsiders:uc4Ttlza/fqoi5lYtXpYKqbhTLQoeojDOnvAGFf5OC8=
Host: azureinsiders.blob.core.windows.net
Content-Length: 2

C

HTTP/1.1 201 Created
Transfer-Encoding: chunked
Content-MD5: Q60OVNgdC/J0zJ/wCXMiQw==
Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
x-ms-request-id: 90adf2d6-7ea9-4b58-90a9-41d23e47de36
x-ms-version: 2012-02-12
Date: Sun, 30 Dec 2012 03:02:18 GMT

0
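The request URIs above follow a simple pattern: the blob's URI, comp=block, the URL-encoded block ID, and a timeout. A minimal sketch of assembling such a URI is shown here. The class and method names are hypothetical, and the sketch covers only the URI, not the headers or the SharedKey signature.

```csharp
using System;
using System.Globalization;

// Hypothetical helper that only assembles the Put Block request URI.
public static class PutBlockUriSketch
{
    // The block ID must already be base64-encoded; it is percent-encoded here
    // so that characters such as '=', '+', and '/' survive in the query string.
    public static string BuildPutBlockUri(string blobUri, string base64BlockId, int timeoutSeconds)
    {
        return blobUri
            + "?comp=block"
            + "&blockid=" + Uri.EscapeDataString(base64BlockId)
            + "&timeout=" + timeoutSeconds.ToString(CultureInfo.InvariantCulture);
    }
}
```

Passing the blob URI from the traces, the block ID MDAwMDA=, and a timeout of 90 yields the same URI as the first PUT request above.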

All three uploaded blocks of your block blob remain in an uncommitted state until you commit them by calling PutBlockList. You can verify this by downloading a list of uncommitted block IDs using the DownloadBlockList method and passing in a filter of Uncommitted. Two other values in the BlockListingFilter enumeration, All and Committed, allow you to download a list of all blocks or only those blocks that have been committed, respectively.

Console.WriteLine("Blob's uncommitted blocks:");
foreach (ListBlockItem lbi in blockBlob.DownloadBlockList(BlockListingFilter.Uncommitted))
    Console.WriteLine("   Name={0}, Length={1}, Committed={2}",
       lbi.Name.FromBlockId(), lbi.Length, lbi.Committed);
// Fails
try {
    blockBlob.DownloadText();
}
catch (StorageException ex) {
    Console.WriteLine(String.Format("Failure: Status={0}({0:D}), Msg={1}",
        (HttpStatusCode)ex.RequestInformation.HttpStatusCode,
         ex.RequestInformation.HttpStatusMessage));
}

Executing the preceding code demonstrates that the three uploaded blocks all have an uncommitted status and that any attempt to download the uncommitted blob results in failure.

Blob's uncommitted blocks:
   Name=0, Length=2, Committed=False
   Name=1, Length=2, Committed=False
   Name=2, Length=2, Committed=False
Failure: Status=NotFound(404), Msg=The specified blob does not exist.

You can request a list of all uncommitted blocks directly by sending an HTTP GET request to the blob’s URI with the query string parameters comp=blocklist and blocklisttype=Uncommitted, as shown here.

GET http://azureinsiders.blob.core.windows.net/demo/MyBlockBlob.txt
    ?comp=blocklist&blocklisttype=Uncommitted&timeout=90 HTTP/1.1
x-ms-version: 2012-02-12
User-Agent: WA-Storage/2.0.0
x-ms-date: Sun, 30 Dec 2012 04:28:08 GMT
Authorization: SharedKey azureinsiders:zKdrwpdnKaQX8UKeq3COblqvMc3BDBhmmmDdWSuS4Wo=
Host: azureinsiders.blob.core.windows.net
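The response to this request is an XML document that lists each block's name and size. The response body is not reproduced in this section, so the sample below is hand-written from the documented shape of a Get Block List response and should be treated as an assumption, as should the class and method names. The sketch shows one way to read such a body.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical parser for a Get Block List response body.
public static class BlockListResponseSketch
{
    // Hand-written sample (an assumption, not a captured response).
    public const string SampleResponse =
        "<BlockList>" +
        "<CommittedBlocks />" +
        "<UncommittedBlocks>" +
        "<Block><Name>MDAwMDA=</Name><Size>2</Size></Block>" +
        "<Block><Name>MDAwMDE=</Name><Size>2</Size></Block>" +
        "<Block><Name>MDAwMDI=</Name><Size>2</Size></Block>" +
        "</UncommittedBlocks>" +
        "</BlockList>";

    // Returns (block ID, size) pairs for the blocks inside the given section,
    // which is either "CommittedBlocks" or "UncommittedBlocks".
    public static List<KeyValuePair<string, long>> ReadBlocks(string responseXml, string section)
    {
        XDocument doc = XDocument.Parse(responseXml);
        return doc.Root
            .Elements(section)
            .Elements("Block")
            .Select(b => new KeyValuePair<string, long>(
                (string)b.Element("Name"),
                (long)b.Element("Size")))
            .ToList();
    }
}
```

The block names in the body are still base64-encoded, so they must be decoded before they can be compared with the original block indexes.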

You are not limited to placing blocks in storage one at a time, or even in the same order that you have them arranged. In fact, in some applications, it may even be desirable to upload the same block multiple times at different positions within the blob, or to completely change the order in which the blocks exist. Imagine scenarios in which blocks of data are reorganized based on sorting some element of their content.

// Commit the blocks in order (and block 0 twice):
blockBlob.PutBlockList(new[] { 0.ToBlockId(), 0.ToBlockId(), 1.ToBlockId(), 2.ToBlockId() });
// Succeeds
try {
    Console.WriteLine(blockBlob.DownloadText());
}
catch (StorageException ex) {
    Console.WriteLine(String.Format("Failure: Status={0}({0:D}), Msg={1}",
         (HttpStatusCode)ex.RequestInformation.HttpStatusCode,
         ex.RequestInformation.HttpStatusMessage));
}

Console.WriteLine("Blob's committed blocks:");
foreach (ListBlockItem lbi in blockBlob.DownloadBlockList())
    Console.WriteLine("   Name={0}, Length={1}, Committed={2}",
       lbi.Name.FromBlockId(), lbi.Length, lbi.Committed);

Executing the preceding code commits the blocks and produces the following results, confirming that the blocks are now all committed and that block A was committed twice.

A A B C
Blob's committed blocks:
   Name=0, Length=2, Committed=True
   Name=0, Length=2, Committed=True
   Name=1, Length=2, Committed=True
   Name=2, Length=2, Committed=True

As you might anticipate from the pattern that is emerging, you can request a list of all committed blocks directly by sending an HTTP GET request to the blob’s URI with the query string parameters comp=blocklist and blocklisttype=Committed, as follows.

GET http://azureinsiders.blob.core.windows.net/demo/MyBlockBlob.txt
    ?comp=blocklist&blocklisttype=Committed&timeout=90 HTTP/1.1
x-ms-version: 2012-02-12
User-Agent: WA-Storage/2.0.0
x-ms-date: Sun, 30 Dec 2012 21:52:11 GMT
Authorization: SharedKey azureinsiders:AMEqUfAuSg6oub4/zz+aE5LB3S6qbXxSNLip0oCNOxs=
Host: azureinsiders.blob.core.windows.net

Blocks might represent discrete segments of data that are organized like scenes in a movie, where you want to delete some scenes and change the order of others. The block blob API supports this kind of functionality. You can delete blocks just by excluding their BlockIDs from the BlockList body of your request, and you can reorder your blocks by changing their order in the list. This is shown in the following HTTP request, which deletes block 0 and saves block 2 before block 1. It may be a little hard to see this directly, because the BlockIDs are base64-encoded in the BlockList, but you can easily modify the sample code shown a little later to see how the body of the message is changed by the block order.

PUT http://azureinsiders.blob.core.windows.net/demo/MyBlockBlob.txt
    ?comp=blocklist&timeout=90 HTTP/1.1
x-ms-version: 2012-02-12
User-Agent: WA-Storage/2.0.0
x-ms-blob-content-type: application/octet-stream
Content-MD5: uO2OSdbs3agLOJthlv1b4w==
x-ms-date: Sun, 30 Dec 2012 22:03:32 GMT
Authorization: SharedKey azureinsiders:IcJMoA2vWMPVfWK0f2NzzpqwR5hi1JOuPB8poQef8D4=
Host: azureinsiders.blob.core.windows.net
Content-Length: 114
Connection: Keep-Alive

<?xml version="1.0" encoding="utf-8"?>
<BlockList>
    <Latest>MDAwMDI=</Latest>
    <Latest>MDAwMDE=</Latest>
</BlockList>
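The body of this request is small enough to build by hand. As a minimal sketch, assuming LINQ to XML and hypothetical class and method names, the BlockList document could be produced from an ordered sequence of base64 block IDs as follows. Like the request above, every entry uses the Latest element.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical builder for the Put Block List request body.
public static class BlockListBodySketch
{
    // Emits one <Latest> element per block ID, in the order given. The order
    // of the IDs is the order of the blocks in the committed blob.
    public static string BuildBlockListBody(IEnumerable<string> base64BlockIds)
    {
        var blockList = new XElement("BlockList",
            base64BlockIds.Select(id => new XElement("Latest", id)));
        return "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
            + blockList.ToString(SaveOptions.DisableFormatting);
    }
}
```

Reordering or omitting IDs in the input sequence is all it takes to reorder or delete blocks in the committed blob.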

The following code snippet shows how you can delete blocks by excluding their BlockIDs when you call PutBlockList using the Windows Azure client library. Here you delete both copies of block 0 and save block 2 before block 1.

// You can change the block order & remove a block:
blockBlob.PutBlockList(new[] { 2.ToBlockId(), 1.ToBlockId() });
// Succeeds
try {
    blockBlob.DownloadText();
}
catch (StorageException ex) {
    Console.WriteLine(String.Format("Failure: Status={0}({0:D}), Msg={1}",
        (HttpStatusCode)ex.RequestInformation.HttpStatusCode,
         ex.RequestInformation.HttpStatusMessage));
}

After executing the preceding code, you can verify that block A was deleted and that block C appears before block B by executing an HTTP GET against the blob’s URI, as shown here.

GET http://azureinsiders.blob.core.windows.net/demo/MyBlockBlob.txt?timeout=90 HTTP/1.1
x-ms-version: 2012-02-12
User-Agent: WA-Storage/2.0.0
x-ms-date: Mon, 31 Dec 2012 07:04:54 GMT
Authorization: SharedKey azureinsiders:NhZ/aPp3HEtB6tMyT1NMj4BD4LRvySi8YV5/1BfQAwk=
Host: azureinsiders.blob.core.windows.net

The full blob as committed is returned as the response to this request. You can see the C B content in the body of the message response.

HTTP/1.1 200 OK
Content-Length: 4
Content-Type: application/octet-stream
Last-Modified: Mon, 31 Dec 2012 07:04:53 GMT
Accept-Ranges: bytes
ETag: "0x8CFB53C446E71CD"
Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
x-ms-request-id: 3523b2a0-1069-4e01-8f7a-8210f4e60612
x-ms-version: 2012-02-12
x-ms-lease-status: unlocked
x-ms-lease-state: available
x-ms-blob-type: BlockBlob
Date: Mon, 31 Dec 2012 07:04:54 GMT

C B
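The commit behavior demonstrated in this section can be summarized with a toy in-memory model. It is only a sketch of the semantics described here, not the storage service or the client library, and the BlockBlobModel class and its members are hypothetical. Each ID in a commit is resolved against the uncommitted blocks first and then the committed ones, which is an assumption about how the Latest entries used above behave.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// A toy in-memory model of block commit semantics; not the storage service.
public sealed class BlockBlobModel
{
    private readonly Dictionary<string, byte[]> uncommitted = new Dictionary<string, byte[]>();
    private Dictionary<string, byte[]> committed = new Dictionary<string, byte[]>();
    private List<string> committedOrder; // null until the first commit

    // Stages a block; it is not part of the blob content until committed.
    public void PutBlock(string blockId, byte[] data)
    {
        uncommitted[blockId] = data;
    }

    // Commits the listed blocks in the given order. The new state is built
    // separately and swapped in at the end, so a failure leaves the blob
    // unchanged. Blocks that are not listed are dropped.
    public void PutBlockList(IEnumerable<string> blockIds)
    {
        var newCommitted = new Dictionary<string, byte[]>();
        var newOrder = new List<string>();
        foreach (string id in blockIds)
        {
            byte[] data;
            if (!uncommitted.TryGetValue(id, out data) && !committed.TryGetValue(id, out data))
                throw new InvalidOperationException("Unknown block ID: " + id);
            newCommitted[id] = data;
            newOrder.Add(id);
        }
        committed = newCommitted;
        committedOrder = newOrder;
        uncommitted.Clear();
    }

    // Concatenates the committed blocks in order, decoded as UTF-8 text.
    public string DownloadText()
    {
        if (committedOrder == null)
            throw new InvalidOperationException("The blob has no committed content.");
        return string.Concat(committedOrder.Select(id => Encoding.UTF8.GetString(committed[id])));
    }
}
```

Staging the blocks "A ", "B ", and "C " under IDs 0, 1, and 2 and committing the list 0, 0, 1, 2 makes DownloadText return "A A B C "; committing 2, 1 afterward makes it return "C B ", which mirrors the walkthrough in this section.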

When you upload a blob that is greater than 32 MB, the UploadXxx operations automatically break your upload up into 4-MB blocks, upload each block with PutBlock, and then commit all blocks with the PutBlockList method. The block size can be changed by modifying the WriteBlockSizeInBytes property of your client proxy (for example, your instance of CloudBlobClient), as shown in the following commented code block.

// The client library can automatically upload large blobs in blocks:
// 1. Define size of "large blob" (range=1MB-64MB, default=32MB)
client.SingleBlobUploadThresholdInBytes = 32 * 1024 * 1024;

// 2. Set individual block size (range=1MB-4MB, default=4MB)
client.WriteBlockSizeInBytes = 4 * 1024 * 1024;

// 3. Set # of blocks to simultaneously upload (range=1-64, default=# CPUs)
client.ParallelOperationThreadCount = Environment.ProcessorCount;
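To see how these settings interact, it can help to work out how many blocks a given upload would produce. The following sketch applies the rule described above: blobs larger than the threshold are split into blocks, and smaller ones are sent in a single request. The class and method names are hypothetical, and the calculation is an illustration rather than the client library's actual logic.

```csharp
using System;

// Hypothetical helper that estimates how an upload would be split.
public static class UploadPlanSketch
{
    // Returns the number of blocks needed for a blob of the given size, or 0
    // when the blob is small enough to be sent in a single request.
    public static long CountBlocks(long blobSizeInBytes,
        long singleUploadThresholdInBytes, long blockSizeInBytes)
    {
        if (blockSizeInBytes <= 0) throw new ArgumentOutOfRangeException("blockSizeInBytes");
        if (blobSizeInBytes <= singleUploadThresholdInBytes) return 0;
        // Ceiling division: the final block may be shorter than the rest.
        return (blobSizeInBytes + blockSizeInBytes - 1) / blockSizeInBytes;
    }
}
```

With the default values shown above, a 100-MB blob would be split into 25 blocks of 4 MB each, and a 33-MB blob into 9 blocks, the last of which holds only 1 MB.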