Design and Implement an Azure Storage Strategy

  • 3/11/2015


This section contains the solutions to the thought experiments and answers to the objective review questions in this chapter.

Objective 4.1: Thought experiment

  1. You would consider structuring the blob hierarchy so that one of the portions of the path represented the language or region.
  2. You would consider creating a CDN on a publicly available container to cache those files locally around the world.
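The path convention in answer 1 can be sketched as follows. This is a minimal illustration, assuming a hypothetical account name, container, and layout that are not from the text:

```python
# Sketch: a region-aware blob path convention. The account name, container,
# and layout below are illustrative assumptions.
def media_blob_url(account: str, container: str, region: str, filename: str) -> str:
    # "Directories" in blob storage are just path segments in the blob name,
    # so putting the region first groups one region's assets together.
    blob_name = f"{region}/{filename}"
    return f"https://{account}.blob.core.windows.net/{container}/{blob_name}"

url = media_blob_url("contosomedia", "public-assets", "en-us", "images/logo.png")
# e.g. https://contosomedia.blob.core.windows.net/public-assets/en-us/images/logo.png
```

A CDN endpoint mapped to the public container would then cache each regional path close to its users.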

Objective 4.1: Objective review

  1. Correct answers: A and D

    1. Correct: Only blobs have writable system properties.
    2. Incorrect: Blob user-defined metadata is accessed as a key value pair.
    3. Incorrect: System metadata can influence how the blob is stored and accessed in Azure Storage.
    4. Correct: Containers also have system properties and user-defined metadata.
  2. Correct answers: B and D

    1. Incorrect: Page blobs are not faster for streaming files, but they are very good for random I/O workloads like VHDs.
    2. Correct: Block blobs allow files to be uploaded and assembled later. Blocks can be resubmitted individually.
    3. Incorrect: Page blobs are for hard disks, not files for streaming.
    4. Correct: Block blobs have a maximum size of 200 GB. Page blobs can be up to 1 TB.
  3. Correct answers: A, B, and C

    1. Correct: SSL encrypts all data between client and server and prevents network sniffing.
    2. Correct: If the keys are hidden, they can’t be compromised and used to gain access to Table storage.
    3. Correct: Client-side code can easily be seen in the browser. Keep sensitive information stored where few people can access it.
    4. Incorrect: Public containers are not secured.
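The block blob behavior in question 2 (files uploaded as individually resubmittable blocks and assembled later) can be sketched like this. The chunking helper and block ID format are illustrative assumptions, not SDK code:

```python
import base64

def split_into_blocks(data: bytes, block_size: int = 4 * 1024 * 1024):
    """Split a payload into (block_id, chunk) pairs, as block blob uploads do.

    Block IDs must be Base64-encoded and equal length within a blob; a failed
    chunk can be re-sent individually before the final block list commits the blob.
    """
    blocks = []
    for i in range(0, len(data), block_size):
        block_id = base64.b64encode(f"block-{i // block_size:08d}".encode()).decode()
        blocks.append((block_id, data[i:i + block_size]))
    return blocks

blocks = split_into_blocks(b"x" * (9 * 1024 * 1024))  # 9 MB -> 3 blocks
```

In a real upload, each chunk would be sent with Put Block and the blob committed with Put Block List.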

Objective 4.2: Thought experiment

  • Machine ID seems like a logical candidate for PartitionKey.
  • The shot count time stamp, ordered descending, is a logical candidate for RowKey.
  • There might be two tables, one for the machine metadata and one for the shots. You could also make an argument for consolidating both pieces of data into one table for speed in querying.

Objective 4.2: Objective review

  1. Correct answer: A

    1. Correct: Transactional replication is used in Microsoft SQL Server. Table storage doesn’t have anything like that.
    2. Incorrect: Zone redundant storage is valid.
    3. Incorrect: Read access geo-redundant storage is valid.
    4. Incorrect: Geo-redundant storage is valid.
  2. Correct answers: C and D

    1. Incorrect: They should not necessarily be unique, although they can be for rare use cases.
    2. Incorrect: You should only use the same partition key if you have a very small entity set.
    3. Correct: Batches can only have operations that exist in the same partition, with the same partition key.
    4. Correct: Even partition sizes will give your application predictable performance because one partition server won’t be unevenly loaded with more entities than the others.
  3. Correct answers: A, B, and C

    1. Correct: All operations have to be in the same partition.
    2. Correct: Total batch size can’t be greater than 4 MB.
    3. Correct: Maximum operation count is 100.
    4. Incorrect: There is no minimum operation count for a batch.
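The batch rules from question 3 (one partition, at most 100 operations, at most 4 MB) can be checked client-side before submission. A sketch under assumed entity shapes, with a rough size estimate rather than the real serialized payload size:

```python
# Sketch: validating an entity batch against Table storage batch rules.
MAX_BATCH_OPS = 100
MAX_BATCH_BYTES = 4 * 1024 * 1024

def validate_batch(entities):
    if not entities:
        raise ValueError("batch is empty")
    if len(entities) > MAX_BATCH_OPS:
        raise ValueError("batch exceeds 100 operations")
    partition_keys = {e["PartitionKey"] for e in entities}
    if len(partition_keys) != 1:
        raise ValueError("all operations in a batch must share one PartitionKey")
    # Rough estimate; the real 4-MB limit applies to the serialized request.
    size = sum(len(str(e).encode("utf-8")) for e in entities)
    if size > MAX_BATCH_BYTES:
        raise ValueError("batch payload exceeds 4 MB")
    return True

ok = validate_batch([{"PartitionKey": "machine-42", "RowKey": str(i)} for i in range(3)])
```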

Objective 4.3: Thought experiment

  1. Typically, the application will store any relevant data and content to be used to produce reports in durable storage. When a report is requested, the request is written to a queue to trigger processing. The queue message must include enough information so that the compute instance listening to the queue can gather the information required for the specific user’s report. In some cases, the message may point to a report request stored in a database record. In other cases, the queue message holds enough data to look up all the information required for the report. The work to generate a PDF should be performed on a compute instance that does not compete with mainline application resources, such as the core application web applications and services. This will allow the system to scale PDF generation separately from the main application as needed.
  2. The most likely candidate for the compute instance for PDF generation is a cloud service worker role because it provides a built-in mechanism to deploy a Windows Service equivalent in a PaaS environment, but also provides some level of VM customization with startup tasks—possibly necessary for whatever PDF generation tool you may select. If no special software requirements are necessary for producing PDFs, you could also use a WebJob trigger to process queued messages. A VM can also be used, likely with a Windows Service deployed that processes the queue.
  3. It will be important to take note of the average memory and disk space used while processing a single message to generate the PDF report. If you monitor the compute instance statistics and slowly begin to scale the number of concurrent messages processed on a single instance, you’ll be able to see how much a single instance can handle for configuring auto-scale properties.
  4. When you have an idea of the number of concurrent messages that can be processed on a single compute instance, you can identify the number of queued items that should trigger scaling the number of instances. For VMs and cloud services, you can automate this with auto-scale by metrics. For websites and WebJobs, you do not have the option to auto-scale by metric.
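The queue message described in answer 1 might look like the following. The field names and schema are illustrative assumptions, not a prescribed format:

```python
import json

def build_report_message(user_id: str, report_type: str, request_id: str) -> str:
    message = {
        "requestId": request_id,   # lets the worker look up full details if needed
        "userId": user_id,
        "reportType": report_type,
    }
    body = json.dumps(message)
    # Storage queue messages are limited to 64 KB, so keep payloads small
    # and store large inputs in durable storage instead.
    assert len(body.encode("utf-8")) <= 64 * 1024
    return body

msg = build_report_message("user-123", "monthly-sales", "req-001")
```

The worker role dequeues this message, gathers the report inputs it references, and generates the PDF without touching the main application's compute resources.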

Objective 4.3: Objective review

  1. Correct answer: B

    1. Incorrect: Storage queue messages have a size limit of 64 KB. It is true, however, that a smaller message size can increase throughput since the storage service can support more requests per second when those requests hold a smaller amount of data.
    2. Correct: Storage queues can only store up to 64 KB per message.
    3. Incorrect: Storage queue messages expire after seven days, unlike Service Bus Queue messages, which are persisted until explicitly read and removed.
    4. Incorrect: The message identifier should be considered opaque to the client, although it is returned from the AddMessage() method. When retrieving messages from the queue for processing, the message identifier is provided so that you can use it to subsequently delete the message.
  2. Correct answers: C and D

    1. Incorrect: A single compute instance can process as many messages as its resources allow for. For example, if processing a message is memory intensive, the number of parallel messages that can be processed will depend on the amount of memory to be consumed for each message that is processed.
    2. Incorrect: As with the previous option, a single compute instance has no fixed message limit; the number of parallel messages it can process depends only on its available resources, such as memory.
    3. Correct: The queue client can request up to 32 messages in a single batch and then process them sequentially or in parallel. Each request from the queue client can request another 32 messages.
    4. Correct: The queue client can request a single message or request up to 32 messages in a batch for processing.
    5. Incorrect: Messages are not deleted when the message is read. Messages must be explicitly deleted.
  3. Correct answers: A, C, and D

    1. Correct: By creating multiple queues for the application, broken down by logical heuristics that make sense to the application for distributing messages, you can increase scalability by reducing the pressure on a single queue for message processing and throughput.
    2. Incorrect: Websites do not support auto-scale by metric at this time.
    3. Correct: VMs can be scaled based on the number of items in a specified queue.
    4. Correct: Cloud services can be scaled based on the number of items in a specified queue.
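The retrieve/process/delete pattern from question 2 can be sketched with an in-memory stand-in for the queue. This is a simplified illustration, assuming the real client calls (GetMessages, DeleteMessage) are replaced by local list operations:

```python
from collections import deque

BATCH_SIZE = 32  # a storage queue client can fetch at most 32 messages per request

def drain(queue: deque):
    processed = []
    while queue:
        # Fetch up to 32 messages; in the real service they become
        # invisible for a timeout period rather than deleted.
        batch = [queue.popleft() for _ in range(min(BATCH_SIZE, len(queue)))]
        for message in batch:
            processed.append(message.upper())  # stand-in for real work
            # Only after successful processing would the real client
            # explicitly delete the message from the queue.
    return processed

results = drain(deque(f"msg-{i}" for i in range(40)))  # 40 messages -> two fetches
```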

Objective 4.4: Thought experiment

  1. Users who authenticate to the application should be able to request the report, but since the reports are stored in blobs, it is convenient to be able to share the link directly to the blob. You could have the web application present a page that generates an SAS URI for a report on demand. The user could then copy that link and share it in email with others, even if they don’t have access to the web application.
  2. The duration that the link should be valid depends on the typical workflow for your customers who access these reports. For example, if it is acceptable to expect the user who authenticated to download the report right away, or to send the link to someone who will do so right away, limit the SAS token to 30 minutes so that if the email with the link is found at a later time by an unauthorized user, it will be expired. If the link should be shared with someone who may need more time to access the report, but you want to enforce that links can be revoked when some other action has taken place in the application, use a stored access policy with an initial duration that will be acceptable for this workflow. You can then allow users to extend the validity of the SAS token through the web application, or you can programmatically revoke access if you note suspicious activity on the report links through storage logs.
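The expiring link described above works by signing the permissions, expiry, and resource with a secret key. The following is a simplified illustration of that principle only; it is NOT the actual Azure SAS string-to-sign format, and in practice you would let the storage SDK generate SAS tokens:

```python
import base64, hashlib, hmac
from datetime import datetime, timedelta, timezone

def signed_report_link(base_url: str, key: bytes, valid_minutes: int = 30) -> str:
    expiry = datetime.now(timezone.utc) + timedelta(minutes=valid_minutes)
    expiry_str = expiry.strftime("%Y-%m-%dT%H:%M:%SZ")
    # Simplified string-to-sign: permissions, expiry, resource.
    to_sign = f"r\n{expiry_str}\n{base_url}"
    sig = base64.b64encode(hmac.new(key, to_sign.encode(), hashlib.sha256).digest()).decode()
    return f"{base_url}?se={expiry_str}&sp=r&sig={sig}"

link = signed_report_link("https://contoso.blob.core.windows.net/reports/q1.pdf", b"secret")
```

Because the expiry is part of the signed data, tampering with the `se` parameter invalidates the signature; a stored access policy moves the expiry server-side so it can be changed or revoked later.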

Objective 4.4: Objective review

  1. Correct answers: A, C, D, and E

    1. Correct: You can generate an SAS token that grants read access to blobs. You can also grant access to modify a blob’s contents.
    2. Incorrect: You cannot grant access to create new containers using SAS. This operation requires the storage access key.
    3. Correct: You can grant access to an existing queue and allow add, update, and delete operations using SAS tokens.
    4. Correct: You can grant access to an existing table and allow add, update, and delete operations on entities within that table.
    5. Correct: You can grant access to query the entities of an existing table using SAS tokens.
  2. Correct answers: A, B, and C

    1. Correct: You can change both the start and expiration dates of an SAS token that is attached to a stored access policy.
    2. Correct: You can revoke all access by an SAS token that is attached to a stored access policy.
    3. Correct: You can revoke specific operations by an SAS token that is attached to a stored access policy. For example, you can remove support for delete operations that were originally granted.
    4. Incorrect: You can use the same stored access policy for multiple resources (such as multiple blobs, for example) but this is done at the time of producing the SAS token and associating the stored access policy to the token. You cannot add resources at the policy level.
  3. Correct answers: B and D

    1. Incorrect: CORS is not generally recommended but is a necessary evil for certain types of browser applications to allow for efficient access to storage resources. Try to avoid the use of CORS by using an alternate design if possible.
    2. Correct: If blobs are protected resources that require authentication, you should avoid using the storage account key to access them, in particular if this means sharing it with a browser. Instead, generate an SAS token that will be included in any links for requesting the resource and limit the duration the token is valid either with a short duration or a stored access policy you can revoke when the user session ends.
    3. Incorrect: CORS is now supported for all storage services, including blobs, queues, and tables.
    4. Correct: CORS is not enabled for a new storage account. You have to explicitly enable this feature.
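Conceptually, a CORS rule on a storage service tells the browser which cross-origin requests to permit. The sketch below mirrors the fields you configure (allowed origins and methods), but the matching logic is a simplified illustration, not the service's implementation:

```python
# Sketch: how a configured CORS rule evaluates a browser preflight request.
def preflight_allowed(rule: dict, origin: str, method: str) -> bool:
    origins = rule["allowed_origins"]
    origin_ok = "*" in origins or origin in origins
    method_ok = method.upper() in {m.upper() for m in rule["allowed_methods"]}
    return origin_ok and method_ok

rule = {"allowed_origins": ["https://app.contoso.com"], "allowed_methods": ["GET", "HEAD"]}
allowed = preflight_allowed(rule, "https://app.contoso.com", "GET")   # permitted
blocked = preflight_allowed(rule, "https://evil.example.com", "GET")  # rejected
```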

Objective 4.5: Thought experiment

  1. You should be looking at using monitoring (through the management portal) and configuring alerts based on latency and availability.
  2. You should enable and review the Storage Analytics logs. You can look for usage patterns based on the type of activity seen or errors logged. You can also look for specific types of logs related to a specific event that occurred.

Objective 4.5: Objective review

  1. Correct answer: A

    1. Correct: Capacity metrics include total storage in bytes, the container count, and the object count for blob storage only.
    2. Incorrect: You can only set minute metrics programmatically or by using Windows PowerShell cmdlets.
    3. Incorrect: By default, retention is not specified; therefore, metrics are retained indefinitely. You should set the retention policy to match your compliance requirements and archive metrics older than one year.
    4. Incorrect: If you disable metrics, all metrics previously collected will be retained until the retention period expires.
  2. Correct answers: B and D

    1. Incorrect: Logs are stored in a $logs container in Blob storage for your storage account, but the log capacity is not included in your storage account quota. A separate 20-terabyte allocation is made for storage logs.
    2. Correct: Logs can have duplicate entries within a one-hour period; however, you can identify a log entry uniquely with a RequestId and operation number.
    3. Incorrect: The log container cannot be deleted once in use, but the logs within that container can be deleted by authorized callers.
    4. Correct: You can log all or individual operations to all storage services.
  3. Correct answers: C and D

    1. Incorrect: A log entry is created for all successful authenticated and anonymous requests.
    2. Incorrect: For authenticated calls, all known failed requests are logged, and for anonymous calls, only failed Get requests for error code 304 are logged.
    3. Correct: Server errors generate a log entry.
    4. Correct: All requests to storage resources using SAS tokens are logged.
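Answer 2 of question 2 mentions that duplicate log entries can be identified by the RequestId and operation number. A sketch of that de-duplication, assuming an illustrative dictionary-based entry shape rather than the raw log line format:

```python
# Sketch: de-duplicating Storage Analytics log entries by
# (RequestId, operation number).
def dedupe_log_entries(entries):
    seen = set()
    unique = []
    for entry in entries:
        key = (entry["request_id"], entry["op_number"])
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique

entries = [
    {"request_id": "abc", "op_number": 0, "operation": "GetBlob"},
    {"request_id": "abc", "op_number": 0, "operation": "GetBlob"},  # duplicate
    {"request_id": "abc", "op_number": 1, "operation": "PutBlob"},
]
unique = dedupe_log_entries(entries)  # the duplicate is dropped
```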

Objective 4.6: Thought experiment

  1. You might consider giving each developer his or her own copy of the database with SQL Database. Then create a central one for merging changes.
  2. If developers write database objects and then don’t access them again, you might need more than a 14-day backup retention policy. This might lead to a higher edition of SQL Database being used for reasons different than raw performance. You might also consider manually exporting the database if a developer says he or she will be doing something particularly risky.
  3. If the developers don’t need access to a database the same size as production, they might get away with the basic level of SQL Database. If they do need development databases that are just like production, then choose the level of SQL Database that corresponds with the right size. Developers don’t usually put a high load on their servers, so you can ignore the hardware metrics when selecting the appropriate level.

Objective 4.6: Objective review

  1. Correct answer: D

    1. Incorrect: The secondary database must have the same name as the primary.
    2. Incorrect: They must be on separate servers.
    3. Incorrect: They need to be on the same subscription.
    4. Correct: The secondary server cannot be a lower performance tier than the primary.
  2. Correct answers: B, C, and D

    1. Incorrect: CPU Processor Count is not a valid metric.
    2. Correct: CPU Percentage is a valid metric.
    3. Correct: Physical Data Reads Percentage is a valid metric.
    4. Correct: Log Writes Percentage is a valid metric.
  3. Correct answers: A, B, and C

    1. Correct: Connection resiliency, because you could fail over to a replica.
    2. Correct: Transaction resiliency so you can resubmit a transaction in the event of a failover.
    3. Correct: Query auditing so you can baseline your current query times and know when to scale up the instance.
    4. Incorrect: You can handle backup and restore operations from the Azure management portal. There’s no reason to write custom code for this.
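The connection and transaction resiliency in answers 1 and 2 amounts to retrying a unit of work when a transient failure (such as a failover) interrupts it. A minimal sketch; the exception type and backoff policy are illustrative assumptions:

```python
import time

class TransientError(Exception):
    """Stand-in for a transient connection or transaction failure."""

def run_with_retries(work, attempts: int = 3, base_delay: float = 0.0):
    for attempt in range(attempts):
        try:
            return work()
        except TransientError:
            if attempt == attempts - 1:
                raise  # give up after the final attempt
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("connection lost during failover")
    return "committed"

result = run_with_retries(flaky)  # succeeds on the third attempt
```

In a real application, the retried unit would be the whole transaction, so it can be resubmitted cleanly after a failover completes.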