Developing Cloud Applications with Windows Azure Storage: Blobs

  • 3/15/2013

Continuation tokens and blobs

Data that can be presented in a tabular format often needs a pagination mechanism, because people generally cannot digest more than a page of information at one time. Because you are dealing with potentially massive data stores in the cloud, you also want to ensure that the data being requested is appropriate for the query and not a mistake; computing and transmitting billions of rows of data over the wire can take a lot of resources. Windows Azure must also take its own scalability into account: a large query can block or significantly impede many smaller queries, which may have to wait or compete for resources. Continuation tokens, as introduced in Chapter 4, allow Windows Azure storage to return a smaller subset of your data. In exchange, you must ask for the subsequent pages yourself: each response carries a continuation token that you pass back on your next call to retrieve the next page of your query. Windows Azure storage refers to these pages of data as segments.

Blobs are not tabular data, so you do not have to anticipate continuation tokens when retrieving the content of a blob; however, some API operations do return tabular data, such as retrieving the list of blobs stored in a container. Because a container can hold an unlimited number of blobs, you should always anticipate that you may receive a continuation token back from any request that you make by calling one of the xxxSegmented retrieval methods (for example, ListBlobsSegmented).
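The request-and-continue protocol described above can be tried without a storage account. The following is a minimal sketch, not the storage client library: Segment<T>, SegmentDemo, ListSegmented, and ReadAll are hypothetical names, and the token is simply the next offset written as a string, whereas a real continuation token should be treated as an opaque value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a result segment: one page of items plus the
// token needed to request the next page (null when nothing is left).
public sealed class Segment<T> {
    public IList<T> Results { get; private set; }
    public String ContinuationToken { get; private set; }
    public Segment(IList<T> results, String continuationToken) {
        Results = results;
        ContinuationToken = continuationToken;
    }
}

public static class SegmentDemo {
    // Simulated service call: returns at most maxResults items, starting at
    // the offset encoded in the token (a null token means "from the start").
    public static Segment<Int32> ListSegmented(
            IList<Int32> store, Int32 maxResults, String token) {
        Int32 start = (token == null) ? 0 : Int32.Parse(token);
        List<Int32> page = store.Skip(start).Take(maxResults).ToList();
        Int32 next = start + page.Count;
        return new Segment<Int32>(
            page, (next < store.Count) ? next.ToString() : null);
    }

    // Client-side loop: keep requesting segments until the token is null.
    public static IEnumerable<Int32> ReadAll(
            IList<Int32> store, Int32 maxResults) {
        String token = null;
        do {
            Segment<Int32> segment = ListSegmented(store, maxResults, token);
            foreach (Int32 item in segment.Results) yield return item;
            token = segment.ContinuationToken;
        } while (token != null);
    }

    public static void Main() {
        List<Int32> store = Enumerable.Range(0, 12).ToList();
        // 12 items read 5 at a time arrive as segments of 5, 5, and 2.
        Console.WriteLine(String.Join(",", ReadAll(store, 5)));
    }
}
```

The caller never computes how many segments exist; it only echoes back whatever token the previous response carried, which is the same contract the xxxSegmented methods impose.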

Console.Clear();
CloudBlobClient client = account.CreateCloudBlobClient();

// ListContainers/BlobsSegmented return up to 5,000 items
const Int32 desiredItems = 5000 + 1000;  // Try to read more than the max

// Create a bunch of blobs (this takes a very long time):
CloudBlobContainer container = client.GetContainerReference("manyblobs");
container.EnsureExists();
for (Int32 i = 0; i < desiredItems; i++)
    container.GetBlockBlobReference(String.Format("{0:00000}", i)).UploadText("");

// Enumerate containers & blobs in segments
for (ContainerResultSegment crs = null; crs.HasMore(); ) {
    crs = client.ListContainersSegmented(
      null, ContainerListingDetails.None, desiredItems, crs.SafeContinuationToken());
    foreach (var c in crs.Results) {
        Console.WriteLine("Container: " + c.Uri);
        for (BlobResultSegment brs = null; brs.HasMore(); ) {
            brs = c.ListBlobsSegmented(null, true, BlobListingDetails.None,
                desiredItems, brs.SafeContinuationToken(), null, null);
            foreach (var b in brs.Results) Console.WriteLine("   Blob: " + b.Uri);
        }
    }
}

In the preceding code, the ContinuationToken property of each result segment holds either a continuation token or null when there are no more results to read. The loops rely on extension methods included in the Wintellect Azure Power Library to simplify the programming model. SafeContinuationToken, shown next, returns null when no segment has been read yet, so the first request starts at the beginning of the listing; HasMore supplies the Boolean indicator that controls exiting the for-loop when no more result segments are available.

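public static BlobContinuationToken SafeContinuationToken(
          this ContainerResultSegment crs) {
   return (crs == null) ? null : crs.ContinuationToken;
}

// Assumed shape of the HasMore companion: a null segment means nothing was requested yet.
public static Boolean HasMore(this ContainerResultSegment crs) {
   return (crs == null) || (crs.ContinuationToken != null);
}

Blob request options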

As used in the preceding code examples, instances of BlobRequestOptions may be passed as arguments to blob operations to augment the behavior of a request. The BlobRequestOptions class is an assortment of loosely related properties, each of which applies to different kinds of operations. Only Timeout and RetryPolicy apply to every storage data access request. Table 5-6 describes the properties of BlobRequestOptions.

Table 5-6 BlobRequestOptions properties and their impact on requests

BlobRequestOptions properties

Impact on requests

Timeout

Used for all blob operations. Specifies the time span allowed for a given operation to complete before a timeout error occurs; the error is then handled according to the RetryPolicy property. See Chapter 4 for more information.

RetryPolicy

Used for all blob operations. Controls the retry policy used when accessing data. See Chapter 4 for more information.

AccessCondition

Used only when performing conditional operations. Controls the conditions for selecting data based on an ETag value or a last-modified time (for example, If-Match, If-None-Match, If-Modified-Since, and If-Unmodified-Since).

CopySourceAccessCondition

Used only when performing conditional copy operations on blobs. Controls the conditions for selecting the source data based on an ETag value or a last-modified time (for example, If-Match, If-None-Match, If-Modified-Since, and If-Unmodified-Since).

DeleteSnapshotsOption

Used only when performing delete operations on blobs. IncludeSnapshots deletes the blob and all of its snapshots; DeleteSnapshotsOnly deletes the snapshots only (leaving the blob); None deletes the blob only, and the request fails if the blob has snapshots.

BlobListingDetails

Used only when performing blob list operations. Controls the data that is included in the list. Options include the following:

  • All lists all available committed blobs, uncommitted blobs, and snapshots, and returns all metadata for those blobs.

  • Metadata retrieves blob metadata for each blob returned in the listing.

  • None lists only committed blobs and does not return blob metadata.

  • Snapshots lists committed blobs and blob snapshots.

  • UncommittedBlobs lists committed and uncommitted blobs.

UseFlatBlobListing

Used only when performing blob list operations.