Developing Cloud Applications with Windows Azure Storage: Blobs

  • 3/15/2013
Paul Mehner offers a thorough understanding of how to securely perform data operations against the blob storage service.

In this chapter:

  • Blob basics

  • Blob containers

  • Blob addressing

  • Business use cases

  • Blob storage structure

  • Navigating blob container hierarchies

  • Storage Client library blob types

  • Container and blob naming rules

  • Performing create, read, update, and delete blob operations

  • Shared Access Signatures and shared access policies

  • Blob attributes and metadata

  • Conditional operations

  • Blob leases

  • Using block blobs

  • Using page blobs

  • Blob snapshots

  • Continuation tokens and blobs

  • Conclusion

In this chapter, you learn about Windows Azure blob storage. First you examine the characteristics of this kind of data storage, including the kinds of real-world data and storage scenarios that lend themselves well to blob storage. You then learn about the organizational structure of this storage type, including the naming conventions and other rules that must be followed. This chapter discusses how to perform common create, read, update, and delete (CRUD) operations on blobs and their containers. To deepen your understanding, you tackle the advanced and valuable but often overlooked features of blobs, such as metadata, snapshots, and granular security access, which allow CRUD operations to be performed only by authorized parties. Finally, you learn how to write applications for robustness and resiliency in the cloud.

Blob basics

BLOB is an acronym for Binary Large Object, but the uppercase convention is generally ignored in favor of the more colloquial lowercase blob, which I use throughout the book. A blob holds arbitrarily structured data that it has no knowledge of. To the blob, its contents are just a sequence of bytes that may be read or written either sequentially or in randomly accessed chunks (called blocks or pages). The data may have a structure and may even adhere to a schema, but the blob itself knows nothing about that structure. Blobs are often used to store documents such as Microsoft Word, Microsoft Excel, and XML documents; pictures; audio clips; videos; and backups of file systems. Files that might be stored on your computer’s hard drive, or content that you might publish on a website, can alternatively be stored in blob storage.

In addition to its data, a blob also stores its own name, a small amount of metadata (8 KB at the time of this writing), and an MD5 hash that can be used to validate the blob’s integrity.
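The integrity check mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the storage service's API: the function names `content_md5` and `is_intact` are hypothetical, and the Base64 encoding of the digest is an assumption based on how MD5 values are commonly carried in HTTP headers.

```python
import base64
import hashlib


def content_md5(data: bytes) -> str:
    """Return the Base64-encoded MD5 digest of data (encoding is an assumption)."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def is_intact(data: bytes, stored_md5: str) -> bool:
    """Compare a freshly computed hash with the one stored alongside the blob."""
    return content_md5(data) == stored_md5


payload = b"hello blob"
stored = content_md5(payload)              # hash recorded when the blob was written
print(is_intact(payload, stored))          # bytes unchanged since the write
print(is_intact(payload + b"!", stored))   # bytes altered after the write
```

A reader of the blob would recompute the hash over the downloaded bytes and reject the data if the two values differ.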

The cloud fabric manages the dynamic scaling of your data to meet demand. If a particular set of blobs is receiving a high volume of traffic, the cloud fabric will move those blobs to their own storage node. In a more extreme circumstance, an individual blob could potentially be on its own storage node. An individual blob cannot float around on its own anywhere it pleases, however; it must be stored in a structure called a blob container, which you will learn about later in this chapter. Windows Azure storage provides two distinct types of blob: the block blob and the page blob. You’ll examine the block blob first.

Block blobs

Block blobs are useful in sequential access scenarios when storage and consumption of the data can begin at the first byte and end at the last. These blobs can be uploaded in chunks referred to as blocks. This characteristic makes them well suited for applications requiring recovery from transmission failures, because transmission can simply be resumed from the last successfully transmitted block. Blocks in a blob may also be uploaded in parallel to increase throughput. An individual block blob can be any size up to 200 gigabytes (GB). When on-demand access to arbitrary locations within a blob is required, a better option may be the page blob, which is covered next.
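The resume-after-failure behavior described above can be sketched with an in-memory stand-in for the storage service. Everything here is hypothetical and for illustration only: `FakeBlockStore`, `put_block`, `commit`, and the 4-byte block size are assumptions, not the actual service or Storage Client library API.

```python
import base64

BLOCK_SIZE = 4  # bytes; deliberately tiny, for illustration only


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Yield (block_id, chunk) pairs; IDs are Base64 strings of equal length."""
    for index in range(0, len(data), block_size):
        raw_id = f"{index // block_size:06d}".encode("ascii")
        yield base64.b64encode(raw_id).decode("ascii"), data[index:index + block_size]


class FakeBlockStore:
    """In-memory stand-in for a storage service (hypothetical, not a real API)."""

    def __init__(self, fail_on_call=None):
        self.staged = {}        # block_id -> bytes received so far
        self.committed = b""    # blob content after the block list is committed
        self.calls = 0
        self.fail_on_call = fail_on_call

    def put_block(self, block_id, chunk):
        self.calls += 1
        if self.calls == self.fail_on_call:
            raise ConnectionError("simulated transmission failure")
        self.staged[block_id] = chunk

    def commit(self, block_ids):
        self.committed = b"".join(self.staged[b] for b in block_ids)


def upload(store, data):
    """Send only blocks that are not yet staged, then commit the ordered list."""
    blocks = list(split_into_blocks(data))
    for block_id, chunk in blocks:
        if block_id not in store.staged:  # skip blocks sent before a failure
            store.put_block(block_id, chunk)
    store.commit([block_id for block_id, _ in blocks])


data = b"sequential blob payload"
store = FakeBlockStore(fail_on_call=3)
try:
    upload(store, data)
except ConnectionError:
    upload(store, data)  # resume: the first two blocks are not re-sent
print(store.committed == data, store.calls)
```

Because each block is identified independently, the same structure would also allow the `put_block` calls to be issued in parallel, with the commit step fixing the final order.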

Page blobs

Page blobs are useful when storage and consumption of the data may occur in any order. When on-demand access to arbitrary locations within a blob is required, the page blob is often the best option. An individual page blob can be any size up to 1 terabyte (TB). Page blobs may also be sparsely populated, which is useful when implementing certain kinds of data structures and algorithms. Microsoft uses the sparsely populated page blob as the basis for drive storage, which is a virtual VHD (for those paying careful attention, that would be a Virtually Virtual Hard Drive!). Microsoft charges only for the pages that are occupied, so if you had a 1-TB blob with only 2 GB of population, you would pay only for the 2 GB of actual storage space used. This cost does not include fees for data egress out of the datacenter or transaction fees, neither of which is affected by a page blob’s ability to be sparsely populated.
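To make the sparse-population idea concrete, here is a minimal in-memory sketch of a page-addressed blob that tracks only the pages that have been written. The class `SparsePageBlob`, its methods, the 512-byte page size, and the page-alignment requirement are all assumptions made for illustration; the sketch models the concept, not the service's API or its billing rules.

```python
PAGE_SIZE = 512  # assumed page granularity, in bytes


class SparsePageBlob:
    """In-memory sketch of a sparsely populated, randomly accessible blob."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pages = {}  # page index -> bytes; absent pages read as zeros

    def write(self, offset: int, data: bytes) -> None:
        """Write page-aligned data at any offset, in any order."""
        if offset % PAGE_SIZE or len(data) % PAGE_SIZE:
            raise ValueError("writes must be page-aligned (assumption)")
        if offset + len(data) > self.capacity:
            raise ValueError("write exceeds blob capacity")
        for i in range(0, len(data), PAGE_SIZE):
            self.pages[(offset + i) // PAGE_SIZE] = data[i:i + PAGE_SIZE]

    def read_page(self, index: int) -> bytes:
        """Return one page; pages never written come back as zeros."""
        return self.pages.get(index, bytes(PAGE_SIZE))

    def occupied_bytes(self) -> int:
        """Bytes actually stored, as opposed to the declared capacity."""
        return len(self.pages) * PAGE_SIZE


blob = SparsePageBlob(capacity=1024 ** 4)        # 1 TB of addressable space
blob.write(0, b"\x01" * PAGE_SIZE)               # one page at the start
blob.write(500 * 1024 ** 3, b"\x02" * 1024)      # two pages far into the blob
print(blob.capacity, blob.occupied_bytes())
```

The gap between `capacity` and `occupied_bytes()` mirrors the point above: only the occupied pages count toward storage used.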