Develop for Azure storage
- 8/5/2024
All applications work with information or data. Applications create, transform, model, or operate with that information. Regardless of the type or volume of data that your application uses, eventually you need to save it persistently so that it can be used later.
Storing data is not a simple task, and designing storage systems for that purpose is even more complicated. Perhaps your application must deal with terabytes or even petabytes of information, or you might work with an application that needs to be accessed from different countries, and you need to minimize the time required to access it. Cost efficiency is also a requirement in any project. In general, many requirements make designing and maintaining storage systems difficult.
Microsoft Azure offers different storage solutions in the cloud to satisfy your application storage requirements. Azure offers solutions for making your storage cost-effective and for minimizing latency.
Skills covered in this chapter:
Skill 2.1: Develop solutions that use Cosmos DB storage
Skill 2.2: Develop solutions that use Blob Storage
Skill 2.1: Develop solutions that use Cosmos DB storage
Cosmos DB is a premium storage service that Azure provides for satisfying your need for a globally distributed, low-latency, highly responsive, and always-online database service. Cosmos DB has been designed with scalability and throughput in mind. One of the most significant differences between Cosmos DB and other storage services offered by Azure is how easily you can scale your Cosmos DB solution across the globe by merely clicking a button and adding a new region to your database.
Another essential feature that you should consider when evaluating this type of storage service is how you can access this service from your code and how hard it would be to migrate your existing code to a Cosmos DB–based storage solution. The good news is that Cosmos DB offers different APIs for accessing the service. The best API for you depends on the type of data that you want to store in your Cosmos DB database. You can store your data using key-value, column-family, document, or graph approaches. Each of the different APIs that Cosmos DB offers allows you to store your data with different schemas. Currently, you can access Cosmos DB using the NoSQL, MongoDB, Cassandra, Gremlin, Table, and PostgreSQL APIs.
Perform operations on containers and items by using the SDK
When working with Cosmos DB, you have several layers in the hierarchy of entities managed by the Cosmos DB account. The first layer is the Azure Cosmos DB account, where you choose the API you want to use to access your data. Remember that this API has implications for how the data is stored in the databases.
The second layer in the hierarchy is the database. You can create as many databases as you need in your Cosmos DB account. Databases are a way of grouping containers; you can think of databases like namespaces. At this level, you can configure the throughput associated with the containers included in the database.
When planning how to store the information that your application needs to work, you must consider the structure you need to use for storing that information. You may find that some parts of your application need to store information using a key-value structure. In contrast, others may need a more flexible, schema-less structure in which you save the information into documents. One fundamental characteristic of your application might be that you need to store the relationship between entities and use a graph structure for storing your data.
Cosmos DB offers a variety of APIs for storing and accessing your data, depending on the requirements of your application:
NoSQL This is the core and default API for accessing your data in your Cosmos DB account. This core API allows you to query JSON objects using SQL syntax, which means you don’t need to learn another query language. Under the hood, the NoSQL API (formerly known as the SQL API) uses the JavaScript programming model for expression evaluation, function invocations, and its type system. You use this API when you need to use a data structure based on documents.
Table You can think of the Table API as the evolution of the Azure Table Storage service. This API benefits from the high-performance, low-latency, and high-scalability features of Cosmos DB. You can migrate from your current Azure Table Storage service with no code modification in your application. Another critical difference between Table API for Cosmos DB and Azure Table Storage is that you can define your own indexes in your tables. In the same way you do with the Table Storage service, the Table API allows you to store information in your Cosmos DB account using a key-value data structure.
Cassandra Cosmos DB implements the wire protocol for the Apache Cassandra database into the options for storing and accessing data in the Cosmos DB database. This allows you to forget about operations and performance-management tasks related to managing Cassandra databases. In most situations, you can migrate your application from your current Cassandra database to Cosmos DB using the Cassandra API by merely changing the connection string. Azure Cosmos DB Cassandra API is compatible with the CQLv4 wire protocol. Cassandra is a column-based database that stores information using a key-value approach.
MongoDB You can access your Cosmos DB account by using the MongoDB API. This NoSQL database allows you to store the information for your application in a document-based structure. Cosmos DB implements the wire protocol compatible with MongoDB 3.2. This means that any MongoDB 3.2 client driver that implements and understands this protocol definition can connect seamlessly with your Cosmos DB database using the MongoDB API.
PostgreSQL This service is built on top of native PostgreSQL, which means you can use your code directly with Azure Cosmos DB for PostgreSQL without any substantial modification. This is a managed service, so Microsoft takes care of all the details regarding performance, availability, geo-replication, and all the features that the Cosmos DB service offers.
Gremlin Based on Gremlin, the Apache TinkerPop graph traversal language, this API allows you to store information in Cosmos DB using a graph structure. This means that instead of storing only entities, you store:
Vertices You can think of a vertex as an entity in other information structures. In a typical graph structure, a vertex could be a person, a device, or an event.
Edges These are the relationships between vertices. A person can know another person, a person might own a type of device, or a person might attend an event.
Properties These are attributes you can assign to a vertex or an edge.
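To make these three building blocks concrete, the following minimal sketch models the examples above as plain in-memory C# collections. It does not use the Gremlin API or any Cosmos DB SDK; the vertex ids, the knows, owns, and attends labels, and the NeighborsOf helper are illustrative assumptions, not part of any Azure library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Vertices: entities, each with an id, a label, and a bag of properties.
var vertices = new Dictionary<string, (string Label, Dictionary<string, object> Properties)>
{
    ["p1"] = ("person", new Dictionary<string, object> { ["name"] = "Santiago" }),
    ["p2"] = ("person", new Dictionary<string, object> { ["name"] = "Agatha" }),
    ["d1"] = ("device", new Dictionary<string, object> { ["os"] = "Android" }),
    ["e1"] = ("event", new Dictionary<string, object> { ["title"] = "Azure meetup" }),
};

// Edges: relationships between vertices; an edge can carry properties too.
var edges = new List<(string From, string Label, string To, Dictionary<string, object> Properties)>
{
    ("p1", "knows", "p2", new Dictionary<string, object> { ["since"] = 2019 }),
    ("p1", "owns", "d1", new Dictionary<string, object>()),
    ("p2", "attends", "e1", new Dictionary<string, object>()),
};

// Hypothetical helper: follow the outgoing edges with a given label from one vertex.
List<string> NeighborsOf(string vertexId, string edgeLabel) =>
    edges.Where(e => e.From == vertexId && e.Label == edgeLabel)
         .Select(e => e.To)
         .ToList();

foreach (string id in NeighborsOf("p1", "knows"))
{
    Console.WriteLine($"Santiago knows {vertices[id].Properties["name"]}");
}
```

The sketch uses top-level statements, so it can replace the Program.cs of an empty console project. A real graph database answers the same kind of question with a traversal query instead of a LINQ filter.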
Beware that you cannot mix these APIs in a single Cosmos DB account. You must define the API you want to use for accessing your Cosmos DB account when creating the account. Once you have created the account, you won’t be able to change the API to access it.
Azure offers SDKs for working with the different APIs you can use to connect to Cosmos DB. Supported languages are .NET, Java, Node.js, and Python. Depending on the API you want to use for working with Cosmos DB, you can also use other languages and frameworks, such as Go, Spring Data, or Spark 3.
A container in an Azure Cosmos DB account is the unit of scalability for throughput and storage. When you create a new container, you must set the partition key to establish how the items that will be stored in the container are distributed across the different logical and physical partitions. Containers live inside a database, and the concept of a database maps to different elements depending on the API you choose:
NoSQL API Database.
Cassandra API Keyspace.
PostgreSQL API Database.
MongoDB API Database.
Gremlin API Database.
Table API This concept does not apply to Table API, although under the hood, when you create your first Table, Cosmos DB creates a default database for you.
Before going further with containers, we should review how data is stored in a container and how choosing the right partition key is crucial for performance.
When you save data to your Cosmos DB account, independently of the API you choose to use for accessing your data, Azure places the data in different servers to accommodate the performance and throughput you require from a premium storage service like Cosmos DB. The storage services use partitions to distribute the data. Cosmos DB slices your data into smaller pieces called partitions placed on the storage server. There are two different types of partitions when working with Cosmos DB:
Logical You can divide a Cosmos DB container into smaller pieces based on your criteria. Each of these smaller pieces is a logical partition. All items stored in a logical partition share the same partition key.
Physical These partitions are a group of replicas of your data that are physically stored on the servers. Azure automatically manages this group of replicas or replica sets. A physical partition can contain one or more logical partitions.
By default, any logical partition has a limit of 20 GB for storing data. This limit cannot be configured or modified. When configuring a new collection, you must decide whether you want your collection to be stored in a single logical partition and keep it under the limit of 20 GB or allow it to grow over that limit and span across multiple logical partitions. If you need your container to split over several partitions, Cosmos DB needs some way to know how to distribute your data across the different logical partitions. This is where the partition key comes into play. Keep in mind that this partition key is immutable, which means you cannot change the property that you want to use as the partition key once you have selected it.
Another important attribute that you must consider when choosing a partition key is that a logical partition is the limit of the scope for transactions. This means that all the operations inside the scope of the logical partition are executed using transactions with snapshot isolation. You don’t need to worry about creating or deleting logical partitions. The system automatically creates new logical partitions as needed and deletes any partition that becomes empty after deleting all data.
Choosing the correct partition key is critical for achieving the best performance because Azure creates a logical partition for each distinct value of your partition key. Listing 2-1 shows an example of a JSON document.
LISTING 2-1 Example JSON document
{
  "id": "1",
  "firstName": "Santiago",
  "lastName": "Fernández",
  "city": "Sevilla",
  "country": "Spain"
}
The city or country properties could be the right choice for the partition key, depending on your data. You might find in your data that some documents have the same value for the country property, so they are stored together in the same logical partition. Using the id property as the partition key means that you end up with a single document in each logical partition. This configuration can be beneficial when your application usually performs read workloads and uses parallelization techniques for getting the data.
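The effect of each candidate key can be sketched in memory: grouping a handful of documents by a property shows how many logical partitions that property would produce. This sketch runs locally and does not call Cosmos DB; the sample documents other than the one from Listing 2-1 and the CountLogicalPartitions helper are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample documents; only the first one comes from Listing 2-1.
var documents = new List<Dictionary<string, string>>
{
    new Dictionary<string, string> { ["id"] = "1", ["city"] = "Sevilla", ["country"] = "Spain" },
    new Dictionary<string, string> { ["id"] = "2", ["city"] = "Madrid", ["country"] = "Spain" },
    new Dictionary<string, string> { ["id"] = "3", ["city"] = "Seattle", ["country"] = "United States" },
    new Dictionary<string, string> { ["id"] = "4", ["city"] = "Osaka", ["country"] = "Japan" },
};

// One logical partition exists per distinct value of the partition key property.
int CountLogicalPartitions(string partitionKeyProperty) =>
    documents.Select(d => d[partitionKeyProperty]).Distinct().Count();

foreach (string candidate in new[] { "country", "city", "id" })
{
    Console.WriteLine($"/{candidate}: {CountLogicalPartitions(candidate)} logical partition(s)");
}
```

With this data, country groups the two Spanish documents into one logical partition, while id places every document in its own partition.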
On the other hand, if you select a partition key with just a few possible values, you can end up with “hot” partitions. A “hot” partition is a partition that receives most of the requests when working with your data. The main implication for these “hot” partitions is that they usually reach the throughput limit for the partition, which means you must provision more throughput. Another potential drawback is that you can reach the limit of 20 GB for a single logical partition. Because a logical partition is the scope for efficient multi-document transactions, selecting a partition key with a few possible values allows you to execute transactions on many documents inside the same partition.
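One rough way to spot a potential “hot” partition before choosing a key is to measure how requests would spread across the partition key values. The sketch below does this for a hypothetical request log held in memory; the log contents and the RequestShare helper are assumptions for illustration and are not based on real Cosmos DB metrics.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical request log: the partition key value targeted by each request.
string[] requestedKeys =
{
    "Spain", "Spain", "Spain", "Spain", "Spain", "Spain", "Spain", "Spain",
    "Japan", "United States",
};

// Fraction of all requests that lands on each logical partition.
Dictionary<string, double> RequestShare(IEnumerable<string> keys)
{
    List<string> all = keys.ToList();
    return all.GroupBy(k => k)
              .ToDictionary(g => g.Key, g => (double)g.Count() / all.Count);
}

foreach (var (key, share) in RequestShare(requestedKeys).OrderByDescending(p => p.Value))
{
    Console.WriteLine($"{key}: {share:P0} of requests");
}
```

Here eight out of ten requests target the Spain partition, which is the kind of skew that leads to throttling on one partition while the others stay idle.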
Use the following guidelines when selecting your partition key:
The storage limit for a single logical partition is 20 GB. Select another partition key if you foresee that your data will require more space for each partition value. If your partition reaches the size limit of 20 GB, you must re-architect your solution and choose another partition key. In that situation, you can create a support ticket to request a temporary increase in the partition size. This is a temporary solution to help you re-architect your solution.
The requests to a single logical partition cannot exceed the throughput limit for that partition. If your requests reach that limit, they are throttled to avoid exceeding it. If you reach this limit frequently, you should select another partition key because there is a good chance that you have a “hot” partition. The minimum throughput differs between databases and containers. The database’s minimum throughput is 100 request units per second (RU/s). The minimum throughput for containers is 400 RU/s.
Choose partition keys with a wide range of values and access patterns that can evenly distribute requests across logical partitions. This allows you to achieve the right balance between executing cross-document transactions and scalability. Timestamp-based partition keys are usually a poor choice because requests tend to concentrate on the most recent values.
Review your workload requirements. The partition key that you choose should allow your application to perform well on reading and writing workloads.
The parameters that you usually use on your requests and filtering queries are good candidates for a partition key.
There could be situations for which none of the properties of your items are appropriate for the partition key. In those situations, you can create synthetic partition keys. A synthetic partition key is a key composed of two or more concatenated properties. For the document shown in Listing 2-1, you could create a new property named partitionKey containing a string that concatenates the values of city and country. For the example document, the value of partitionKey would be Sevilla-Spain. The same rules you consider for a regular partition key apply to synthetic partition keys.
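Building the synthetic key is plain string concatenation done by your application before it saves the item. A minimal sketch, assuming a hyphen as the separator (as in the Sevilla-Spain value above) and a hypothetical helper named BuildSyntheticPartitionKey:

```csharp
using System;

// Concatenate two properties into one synthetic partition key value.
// The "-" separator and the helper name are illustrative choices.
string BuildSyntheticPartitionKey(string city, string country) => $"{city}-{country}";

string partitionKey = BuildSyntheticPartitionKey("Sevilla", "Spain");
Console.WriteLine(partitionKey); // prints "Sevilla-Spain"
```

You would store the result in the partitionKey property of the document and create the container with /partitionKey as its partition key path.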
Now that you have a clearer idea of how partitions work and the importance of choosing the right partition key, you can create containers. When you create a new container, you choose one of the following two throughput modes:
Dedicated All the throughput is provisioned for a container. In this mode, Azure makes a reservation of resources for the container that is backed by service-level agreements (SLAs).
Shared The throughput is shared between all the containers configured in the database, excluding those containers that have been configured as dedicated throughput mode. The shared throughput is configured at the database level.
You cannot switch to a different mode once you have created a container of one mode. If you need to change the mode assigned to a container, you must create a new container with the needed mode and copy all data from the old container to the new one.
Containers are schema-agnostic. This means you can store in the same container documents or entities with different properties as long as they share the same partition key. For example, you could store the information about a device in a container, and all the incidents and repair orders related to that device, also in the same container. The only limitation is that all these entities must share the same partition key.
When you create a Cosmos DB container, you can configure a set of properties. These properties affect different aspects of the container or how the items are stored or managed. The following list details the properties of a container that can be configured. Keep in mind that not all properties are available for all APIs:
ID This is the name of the container.
IndexingPolicy When you add an item to a container, all the item properties are automatically indexed by default. It doesn’t matter whether all the items in the collection share the same schema or each item has its own schema. This property allows you to configure how to index the items in the container. You can configure different types of indexes and include or exclude some properties from the indexes.
TimeToLive (TTL) You can configure your container to automatically delete items after a period of time. TimeToLive is expressed in seconds. For example, if you implement a cache system using Cosmos DB, this TTL could be the period of time the items are stored in the cache. Once the TTL is reached, the item is automatically deleted from your container. You can configure the TTL value at the container or item level. If you configure the TTL at the container level, all items in the container have the same TTL, except if you configure a TTL for a specific item. A value of -1 in the TTL means that the item does not expire. If you set a TTL value on an item whose container does not have a TTL value configured, the item-level TTL has no effect.
ChangeFeedPolicy You can read the changes made to an item in a container. The change feed provides you with the original and modified values of an item. Because the changes are persisted, you can process the changes asynchronously. You can use this feature for triggering notifications or calling APIs when a new item is inserted or an existing item is modified.
UniqueKeyPolicy You can configure which item’s property is used as the unique key. Using unique keys, you ensure that you cannot insert two items with the same value for the same property. Keep in mind that the uniqueness is scoped to the logical partition. For example, if your item has the properties email, firstname, lastname, and company, and you define email as the unique key and company as the partition key, you cannot insert two items with the same email and company values. You can also create compound unique keys, such as email and firstname. Once you have created a unique key, you cannot change it. You can only define the unique key during the creation process of the container.
AnalyticalTimeToLive Sets the time that an item will be kept in an analytical store container. The item is deleted from the container after the time specified in this property is reached. An analytical store is a special type of column store that is schematized to optimize for analytical query performance.
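The interaction between the container-level and item-level TimeToLive values described in the list above can be summarized as a small decision function. This is a local sketch of those rules, not SDK code; the EffectiveTtlSeconds and Describe helpers are hypothetical names, and null stands for "no TTL configured".

```csharp
using System;

// Returns the number of seconds after which an item expires, or null if it never expires.
// containerTtl: null = TTL not configured on the container, -1 = items do not expire by default.
// itemTtl: null = the item has no TTL of its own, -1 = this item never expires.
int? EffectiveTtlSeconds(int? containerTtl, int? itemTtl)
{
    if (containerTtl is null) return null;      // item-level TTL has no effect
    int ttl = itemTtl ?? containerTtl.Value;    // item-level value overrides the container default
    return ttl == -1 ? (int?)null : ttl;
}

string Describe(int? ttl) => ttl is null ? "never expires" : $"expires after {ttl} seconds";

Console.WriteLine(Describe(EffectiveTtlSeconds(3600, null))); // container default applies
Console.WriteLine(Describe(EffectiveTtlSeconds(3600, 60)));   // item value overrides the default
Console.WriteLine(Describe(EffectiveTtlSeconds(3600, -1)));   // item opts out of expiration
Console.WriteLine(Describe(EffectiveTtlSeconds(null, 60)));   // ignored: container has no TTL
```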
Apart from the properties that you reviewed in the previous list, there is a group of properties that are automatically generated and managed by the system. You can read these system-generated properties, but you cannot modify them. These properties are: _rid, _etag, _ts, and _self.
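Returning to the UniqueKeyPolicy property for a moment, the partition-scoped uniqueness rule can be pictured as a set of (partition key, unique key) pairs. The following sketch simulates that check in memory with company as the partition key and email as the unique key; it is an illustration of the rule, not how Cosmos DB implements it, and the TryInsert helper and sample values are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// (partition key, unique key) pairs that are already stored.
var storedKeys = new HashSet<(string Company, string Email)>();

// Returns true if the item is accepted, false if it would violate the unique key.
bool TryInsert(string company, string email) => storedKeys.Add((company, email));

Console.WriteLine(TryInsert("Contoso", "ana@example.com"));  // True
Console.WriteLine(TryInsert("Fabrikam", "ana@example.com")); // True: different logical partition
Console.WriteLine(TryInsert("Contoso", "ana@example.com"));  // False: duplicate in the same partition
```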
Before starting with the examples, you must create a Cosmos DB account to store your data. The following procedure shows how to create a Cosmos DB free account with the NoSQL API. You can use this same procedure for creating accounts with the other APIs reviewed in this skill:
Sign in to the Azure portal (http://portal.azure.com).
In the top left of the Azure portal, click the menu icon represented by three horizontal bars, and then select Create A Resource.
On the Create A Resource panel, under the Categories column, select Databases. On the Popular Azure Services column, click the Create link under Azure Cosmos DB.
On the Create An Azure Cosmos DB Account blade, click the Create button in the Azure Cosmos DB For NoSQL section, as shown in Figure 2-1.
FIGURE 2-1 Selecting a Cosmos DB API
On the Create Azure Cosmos DB Account blade, in the Resource Group dropdown menu, click the Create New link below the dropdown menu. Type a name for the new Resource Group in the pop-up dialog box. Alternatively, you can select an existing Resource Group from the dropdown menu.
In the Instance Details section, type an Account Name.
On the Location dropdown menu, select the region most appropriate for you. If you are using App Services or virtual machines, select the region in which you deployed those services.
In the Capacity mode selection control, keep Provisioned throughput selected.
Ensure that the Apply Free Tier Discount switch is set to Apply.
Click the Next: Global Distribution button at the bottom of the blade.
Leave Geo-Redundancy, Multi-Region Write, and Availability Zones disabled.
Leave all other options in the other tabs with their default values.
In the bottom left of the Create An Azure Cosmos DB Account blade, click the Review + Create button.
In the bottom left of the Review + Create tab, click the Create button to start deploying your Cosmos DB account.
Once you have created an Azure Cosmos DB account, you can use the following procedure to create a new collection in your Cosmos DB account. This procedure might be slightly different depending on the API you choose for your Cosmos DB account. In this procedure, you use a Cosmos DB account configured with the NoSQL API:
Sign in to the Azure portal (http://portal.azure.com).
In the search box at the top of the Azure portal, type the name of your Cosmos DB account and then select your account name.
On your Cosmos DB account blade, select Data Explorer.
On the Data Explorer blade, click the New Container icon in the top left of the blade.
On the New Container panel, shown in Figure 2-2, provide a name for the new database. If you want to add a container to an existing database, you can select the database by clicking the Use Existing radio button.
Ensure that the Share Throughput Across Containers checkbox is selected. You are configuring this container as a shared throughput container using this option. If you want to create a dedicated throughput container, uncheck this option.
Leave the Database Throughput (Autoscale) value set to Autoscale. This is the value for the database throughput if the previous option is checked. Otherwise, this value represents the dedicated throughput reserved for the container.
Leave the Database Max RU/s value set to 1000. This is the maximum value of Request Units per second configured for your container. The capacity is scaled between the minimum 10 percent of the configured value and the maximum value. This option appears only if you select the Autoscale option for the Database Throughput setting.
In the Container ID text box, type a name for the container.
Keep the Indexing setting as Automatic.
Type a partition key in the Partition Key text box. The partition key must start with the slash character.
If you want to create a unique key for this container, click the Add Unique Key button.
Click the OK button at the bottom of the panel.
FIGURE 2-2 Creating a new collection
The figure shows the New Container panel. The Estimated monthly cost (USD) note reads: “This cost is an estimate and may vary based on the regions where your account is deployed and potential discounts applied to your account: $8.76 - $87.60 (1 region, 100 - 1000 RU/s, $0.00012/RU).” The Container ID setting shows a blank text box into which you can type a new Container ID. The Indexing option offers the options Automatic and Off. The Automatic option is selected. The last setting, Partition Key, does not show any options. An Add Unique Key button appears at the bottom of the dialog box.
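The autoscale range and the cost estimate quoted in the New Container panel ($8.76 - $87.60 for 100 - 1000 RU/s at $0.00012/RU) can be reproduced with simple arithmetic. The sketch below assumes that the quoted rate is per RU/s per hour and that a month has 730 hours; both are assumptions chosen because they reproduce the panel's figures, and actual prices depend on your region and discounts.

```csharp
using System;

// Autoscale scales between 10 percent of the configured maximum and the maximum itself.
(int MinRu, int MaxRu) AutoscaleRange(int configuredMaxRu) => (configuredMaxRu / 10, configuredMaxRu);

// Rate taken from the panel's estimate; hours per month is an assumption.
const decimal PricePerRuHour = 0.00012m;
const int HoursPerMonth = 730;

decimal MonthlyCost(int ruPerSecond) => ruPerSecond * PricePerRuHour * HoursPerMonth;

var (minRu, maxRu) = AutoscaleRange(1000);
Console.WriteLine($"Range: {minRu} - {maxRu} RU/s");
Console.WriteLine($"Estimated cost: {MonthlyCost(minRu):F2} - {MonthlyCost(maxRu):F2} USD per month");
```

With a maximum of 1000 RU/s, the lower bound is 100 RU/s, and the two monthly figures come out as 8.76 and 87.60 USD.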
Once you have configured your container, you can create items on it. As mentioned in this section, you can use different languages, such as .NET, Node.js, Java, Python, or Go.
The following example shows how to create a console application using .NET Core. The first example uses the Cosmos DB NoSQL API for creating, updating, and deleting some elements in the Cosmos DB account:
Open Visual Studio Code and create a directory for storing the example project.
Open the Terminal, switch to the project’s directory, and type the following command:
dotnet new console
Install the NuGet package that you use to interact with your Cosmos DB account through the NoSQL API. Type the following command in the Terminal:
dotnet add package Microsoft.Azure.Cosmos
Change the content of the Program.cs file using the content provided in Listing 2-2. You need to change the namespace according to your project’s name.
Sign in to the Azure portal (http://portal.azure.com).
In the search box at the top of the Azure portal, type the name of your Cosmos DB account and then click the name of the account.
On your Cosmos DB Account blade, in the Settings section, select Keys.
On the Keys panel, copy the URI and Primary Key values from the Read-Write Keys tab. You need to assign these values to the EndpointUri and Key constants in the code shown in Listing 2-2.
LISTING 2-2 Cosmos DB NoSQL API example
//C# .NET 6.0 LTS. Program.cs
using System;
using System.Linq;
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using ch2_1_1_NoSQL.Model;

namespace ch2_1_1_NoSQL
{
    class Program
    {
        private const string EndpointUri = "<PUT YOUR ENDPOINT URL HERE>";
        private const string Key = "<PUT YOUR COSMOS DB KEY HERE>";
        private CosmosClient client;
        private Database database;
        private Container container;

        //Main is async so that exceptions thrown by StartDemo reach the catch
        //blocks below instead of arriving wrapped in an AggregateException.
        static async Task Main(string[] args)
        {
            try
            {
                Program demo = new Program();
                await demo.StartDemo();
            }
            catch (CosmosException ce)
            {
                Exception baseException = ce.GetBaseException();
                System.Console.WriteLine($"{ce.StatusCode} error occurred: {ce.Message}, Message: {baseException.Message}");
            }
            catch (Exception ex)
            {
                Exception baseException = ex.GetBaseException();
                System.Console.WriteLine($"Error occurred: {ex.Message}, Message: {baseException.Message}");
            }
        }

        private async Task StartDemo()
        {
            Console.WriteLine("Starting Cosmos DB NoSQL API Demo!");

            //Create a new demo database
            string databaseName = "demoDB_" + Guid.NewGuid().ToString().Substring(0, 5);
            this.SendMessageToConsoleAndWait($"Creating database {databaseName}...");
            this.client = new CosmosClient(EndpointUri, Key);
            this.database = await this.client.CreateDatabaseIfNotExistsAsync(databaseName);

            //Create a new demo collection inside the demo database.
            //This creates a collection with a reserved throughput. You can customize
            //the options using a ContainerProperties object
            //This operation has pricing implications.
            string containerName = "collection_" + Guid.NewGuid().ToString().Substring(0, 5);
            this.SendMessageToConsoleAndWait($"Creating collection demo {containerName}...");
            this.container = await this.database.CreateContainerIfNotExistsAsync(containerName, "/LastName");

            //Create some documents in the collection
            Person person1 = new Person
            {
                Id = "Person.1",
                FirstName = "Santiago",
                LastName = "Fernandez",
                Devices = new Device[]
                {
                    new Device { OperatingSystem = "iOS", CameraMegaPixels = 7, Ram = 16, Usage = "Personal" },
                    new Device { OperatingSystem = "Android", CameraMegaPixels = 12, Ram = 64, Usage = "Work" }
                },
                Gender = "Male",
                Address = new Address
                {
                    City = "Seville",
                    Country = "Spain",
                    PostalCode = "28973",
                    Street = "Diagonal",
                    State = "Andalucia"
                },
                IsRegistered = true
            };
            await this.CreateDocumentIfNotExistsAsync(databaseName, containerName, person1);

            Person person2 = new Person
            {
                Id = "Person.2",
                FirstName = "Agatha",
                LastName = "Smith",
                Devices = new Device[]
                {
                    new Device { OperatingSystem = "iOS", CameraMegaPixels = 12, Ram = 32, Usage = "Work" },
                    new Device { OperatingSystem = "Windows", CameraMegaPixels = 12, Ram = 64, Usage = "Personal" }
                },
                Gender = "Female",
                Address = new Address
                {
                    City = "Laguna Beach",
                    Country = "United States",
                    PostalCode = "12345",
                    Street = "Main",
                    State = "CA"
                },
                IsRegistered = true
            };
            await this.CreateDocumentIfNotExistsAsync(databaseName, containerName, person2);

            //Make some queries to the collection
            this.SendMessageToConsoleAndWait($"Getting documents from the collection {containerName}...");

            //Find documents using LINQ
            IQueryable<Person> queryablePeople = this.container.GetItemLinqQueryable<Person>(true)
                .Where(p => p.Gender == "Male");
            System.Console.WriteLine("Running LINQ query for finding men...");
            foreach (Person foundPerson in queryablePeople)
            {
                System.Console.WriteLine($"\tPerson: {foundPerson}");
            }

            //Find documents using SQL
            var sqlQuery = "SELECT * FROM Person WHERE Person.Gender = 'Female'";
            QueryDefinition queryDefinition = new QueryDefinition(sqlQuery);
            FeedIterator<Person> peopleResultSetIterator = this.container.GetItemQueryIterator<Person>(queryDefinition);
            System.Console.WriteLine("Running SQL query for finding women...");
            while (peopleResultSetIterator.HasMoreResults)
            {
                FeedResponse<Person> currentResultSet = await peopleResultSetIterator.ReadNextAsync();
                foreach (Person foundPerson in currentResultSet)
                {
                    System.Console.WriteLine($"\tPerson: {foundPerson}");
                }
            }
            Console.WriteLine("Press any key to continue...");
            Console.ReadKey();

            //Update documents in a collection
            this.SendMessageToConsoleAndWait($"Updating documents in the collection {containerName}...");
            person2.FirstName = "Mathew";
            person2.Gender = "Male";
            await this.container.UpsertItemAsync(person2);
            this.SendMessageToConsoleAndWait($"Document modified {person2}");

            //Delete a single document from the collection
            this.SendMessageToConsoleAndWait($"Deleting documents from the collection {containerName}...");
            PartitionKey partitionKey = new PartitionKey(person1.LastName);
            await this.container.DeleteItemAsync<Person>(person1.Id, partitionKey);
            this.SendMessageToConsoleAndWait($"Document deleted {person1}");

            //Delete created demo database and all its children elements
            this.SendMessageToConsoleAndWait("Cleaning-up your Cosmos DB account...");
            await this.database.DeleteAsync();
        }

        private void SendMessageToConsoleAndWait(string message)
        {
            Console.WriteLine(message);
            Console.WriteLine("Press any key to continue...");
            Console.ReadKey();
        }

        private async Task CreateDocumentIfNotExistsAsync(string database, string collection, Person person)
        {
            try
            {
                //Reading an item that does not exist throws a CosmosException with status 404
                await this?.container.ReadItemAsync<Person>(person.Id, new PartitionKey(person.LastName));
                this.SendMessageToConsoleAndWait($"Document {person.Id} already exists in collection {collection}");
            }
            catch (CosmosException dce)
            {
                if (dce.StatusCode == HttpStatusCode.NotFound)
                {
                    await this?.container.CreateItemAsync<Person>(person, new PartitionKey(person.LastName));
                    this.SendMessageToConsoleAndWait($"Created new document {person.Id} in collection {collection}");
                }
            }
        }
    }
}
When you work with the NoSQL API, the Azure Cosmos DB SDK provides you with the appropriate classes for working with the different elements of the account. In the Listing 2-2 example, you must create a CosmosClient object before accessing your Azure Cosmos DB account. The Azure Cosmos DB SDK also provides the classes Database and Container for working with these elements. When you need to create a Database or a Container, you can use CreateDatabaseIfNotExistsAsync or CreateContainerIfNotExistsAsync. These IfNotExists methods automatically check whether the Container or Database exists in your Cosmos DB account; if it doesn’t exist, the method creates it. When you create a new container in your database, notice that in this example, you provide the partition key path (/LastName) using the appropriate method overload.
However, when you need to create a new document in the database, you don’t have this type of IfNotExists method available. In this situation, you have two options:
Use the method UpsertItemAsync, which creates a new document if the document doesn’t exist or updates an existing document.
Implement your own version of the IfNotExists method: check whether the document already exists in the container and, if it doesn’t, create it, as shown in the following fragment from Listing 2-2.
try
{
    await this?.container.ReadItemAsync<Person>(person.Id, new PartitionKey(person.LastName));
    this.SendMessageToConsoleAndWait($"Document {person.Id} already exists in collection {collection}");
}
catch (CosmosException dce)
{
    if (dce.StatusCode == HttpStatusCode.NotFound)
    {
        await this?.container.CreateItemAsync<Person>(person, new PartitionKey(person.LastName));
        this.SendMessageToConsoleAndWait($"Created new document {person.Id} in collection {collection}");
    }
}
When you create the document using the CreateItemAsync method, notice that you can provide the value for the partition key by using the following code snippet new PartitionKey(person.LastName). If you don’t provide the value for the partition key, the correct value is inferred from the document that you are trying to insert into the database.
You need to do this verification because you get a CosmosException with StatusCode 409 (Conflict) if you try to create a document with the same Id as an already existing document in the collection. Similarly, you get a CosmosException with StatusCode 404 (Not Found) if you try to delete a document that doesn’t exist in the container using the DeleteItemAsync method or if you try to replace a document that doesn’t exist in the container using the ReplaceItemAsync method. Notice that these two methods also accept a partition key parameter.
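The status codes described above can be handled with exception filters. The following is a minimal sketch, assuming the version 3 Microsoft.Azure.Cosmos SDK used in Listing 2-2; the ItemOperations class and its TryCreateAsync and TryDeleteAsync helpers are hypothetical names introduced only for illustration, and they are not part of the SDK.

```csharp
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

// Hypothetical helpers (not part of the SDK) that translate the
// 409 and 404 CosmosException cases into a simple boolean result.
public static class ItemOperations
{
    // Returns false instead of throwing when a document with the same id
    // already exists in the same logical partition (409 Conflict).
    public static async Task<bool> TryCreateAsync<T>(Container container, T item, PartitionKey partitionKey)
    {
        try
        {
            await container.CreateItemAsync(item, partitionKey);
            return true;
        }
        catch (CosmosException ex) when (ex.StatusCode == HttpStatusCode.Conflict)
        {
            return false;
        }
    }

    // Returns false instead of throwing when the document to delete
    // doesn't exist in the container (404 Not Found).
    public static async Task<bool> TryDeleteAsync<T>(Container container, string id, PartitionKey partitionKey)
    {
        try
        {
            await container.DeleteItemAsync<T>(id, partitionKey);
            return true;
        }
        catch (CosmosException ex) when (ex.StatusCode == HttpStatusCode.NotFound)
        {
            return false;
        }
    }
}
```

Any other status code still surfaces as a CosmosException, so unexpected failures such as throttling or authorization errors are not silently swallowed.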
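When you create a document, you need to provide an Id property of type string to your document. This property needs to identify your document inside the collection uniquely. Some tools and older SDK versions add this property to the document for you, using a GUID string, if you don’t provide it. With the version 3 .NET SDK used in Listing 2-2, however, a document without an id is typically rejected with a 400 (Bad Request) error, so you should set the property explicitly, for example to a GUID string.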
As you can see in the example code in Listing 2-2, you can query your documents using LINQ or SQL statements. In this example, I have used a simple SQL query that filters the documents by the person’s gender. However, you can construct more complex statements, such as a query that returns all people who live in a specific country, using the WHERE Person.Address.Country = 'Spain' expression, or people who have an Android device, using the WHERE ARRAY_CONTAINS(Person.Devices, { 'OperatingSystem': 'Android' }, true) expression.
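The two filters mentioned above could be written with the SDK as follows. This is a sketch under stated assumptions: it uses the version 3 Microsoft.Azure.Cosmos SDK, the document shape from Listings 2-3 to 2-5, and the alias p for the container; the PersonQueries class and its method names are hypothetical and introduced only for this example. The country value is passed as a query parameter instead of being concatenated into the query text.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

// Hypothetical helper class; only QueryDefinition, Container,
// FeedIterator<T>, and FeedResponse<T> come from the SDK.
public static class PersonQueries
{
    // People who live in the given country; @country is a query parameter.
    public static QueryDefinition ByCountry(string country) =>
        new QueryDefinition("SELECT * FROM Person p WHERE p.Address.Country = @country")
            .WithParameter("@country", country);

    // People who own at least one Android device; the third argument (true)
    // asks ARRAY_CONTAINS to accept a partial match on the array elements.
    public static QueryDefinition WithAndroidDevice() =>
        new QueryDefinition(
            "SELECT * FROM Person p WHERE ARRAY_CONTAINS(p.Devices, {\"OperatingSystem\": \"Android\"}, true)");

    // Drains every page of results into a single list.
    public static async Task<List<T>> RunAsync<T>(Container container, QueryDefinition query)
    {
        var results = new List<T>();
        FeedIterator<T> iterator = container.GetItemQueryIterator<T>(query);
        while (iterator.HasMoreResults)
        {
            FeedResponse<T> page = await iterator.ReadNextAsync();
            results.AddRange(page);
        }
        return results;
    }
}
```

With these helpers, a call such as await PersonQueries.RunAsync<Person>(container, PersonQueries.ByCountry("Spain")) would return the matching documents, assuming container points to the container created in Listing 2-2.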
Once you have modified the Program.cs file, you need to create some additional classes that you use in the main program for managing documents. You can find these new classes in Listings 2-3 to 2-5.
In the Visual Studio Code window, create a new folder named Model in the project folder.
Create a new C# class file in the Model folder and name it Person.cs.
Replace the content of the Person.cs file with the content of Listing 2-3. Change the namespace as needed for your project.
Create a new C# class file in the Model folder and name it Device.cs.
Replace the content of the Device.cs file with the content of Listing 2-4. Change the namespace as needed for your project.
Create a new C# class file in the Model folder and name it Address.cs.
Replace the content of the Address.cs file with the content of Listing 2-5. Change the namespace as needed for your project.
At this point, you can run the project by pressing F5 in the Visual Studio Code window. Check to see how your code is creating and modifying the different databases, document collections, and documents in your Cosmos DB account. You can review the changes in your Cosmos DB account using the Data Explorer tool in your Cosmos DB account in the Azure portal.
LISTING 2-3 Cosmos DB NoSQL API example: Person.cs
//C# .NET 6.0 LTS.
using Newtonsoft.Json;

namespace ch2_1_1_NoSQL.Model
{
    public class Person
    {
        [JsonProperty(PropertyName="id")]
        public string Id { get; set; }
        public string FirstName { get; set; }
        public string LastName { get; set; }
        public Device[] Devices { get; set; }
        public Address Address { get; set; }
        public string Gender { get; set; }
        public bool IsRegistered { get; set; }

        public override string ToString()
        {
            return JsonConvert.SerializeObject(this);
        }
    }
}
LISTING 2-4 Cosmos DB NoSQL API example: Device.cs
//C# .NET 6.0 LTS.
namespace ch2_1_1_NoSQL.Model
{
    public class Device
    {
        public int Ram { get; set; }
        public string OperatingSystem { get; set; }
        public int CameraMegaPixels { get; set; }
        public string Usage { get; set; }
    }
}
LISTING 2-5 Cosmos DB NoSQL API example: Address.cs
//C# .NET 6.0 LTS.
namespace ch2_1_1_NoSQL.Model
{
    public class Address
    {
        public string City { get; set; }
        public string State { get; set; }
        public string PostalCode { get; set; }
        public string Country { get; set; }
        public string Street { get; set; }
    }
}
At this point, you can press F5 in your Visual Studio Code window to execute the code. The code stops on each step for you to view the operation’s result directly on the Azure portal. Use the following steps to view the modifications in your Cosmos DB account:
Sign in to the Azure portal (http://portal.azure.com).
In the search box at the top of the Azure portal, type the name of your Cosmos DB account and then click the account name.
On your Cosmos DB Account blade, select Data Explorer.
On the Data Explorer blade, on the left side of the panel, under the label SQL API, you should be able to see the list of databases created in your Cosmos DB account.
Set the appropriate consistency level for operations
One of the main benefits that Cosmos DB offers is the ability to have your data distributed globally with low latency when accessing the data. This means that you can configure Cosmos DB for replicating your data between any of the available Azure regions while achieving minimal latency when your application accesses the data from the nearest region. If you need to replicate your data to an additional region, you only need to add that region to the list of regions where your data should be available.
This replication across the different regions has a drawback: the consistency of your data. To avoid corruption, your data must be consistent among all copies of your database. Fortunately, the Cosmos DB replication protocol offers five consistency levels. Ranging from stronger consistency to better performance, these levels let you select how the replication protocol behaves when copying your data between all the replicas that are configured across the globe. These consistency levels are region agnostic: the guarantees hold regardless of which region served the read or write operation or how many regions are associated with your Cosmos DB account, even if you configured a single region for your account. You configure the default consistency level at the Cosmos DB account level, and it applies to all databases, collections, and documents stored inside the same account. You can choose from the consistency levels shown in Figure 2-3. Use the following procedure to select the consistency level:
Sign in to the Azure portal (http://portal.azure.com).
In the search box at the top of the Azure portal, type the name of your Cosmos DB account and then click the account name.
Select Default Consistency in the Settings section on your Cosmos DB account blade.
On the Default Consistency blade, select the desired consistency level. Your choices are Strong, Bounded Staleness, Session, Consistent Prefix, and Eventual.
Click the Save icon at the top left of the Default Consistency blade.
FIGURE 2-3 Selecting the consistency level
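Besides the account-wide default that you set in the portal, the .NET SDK lets a client or an individual request ask for a different level. The following is a minimal sketch, assuming the version 3 Microsoft.Azure.Cosmos SDK; the ClientFactory class and its method names are hypothetical. As far as the SDK documentation describes, a client or request can only relax the account default to a weaker level, not strengthen it.

```csharp
using Microsoft.Azure.Cosmos;

// Hypothetical factory; CosmosClientOptions, ItemRequestOptions, and
// ConsistencyLevel are the SDK types that carry the setting.
public static class ClientFactory
{
    // Options that make every request from the client use Eventual consistency.
    public static CosmosClientOptions RelaxedReadsOptions() =>
        new CosmosClientOptions { ConsistencyLevel = ConsistencyLevel.Eventual };

    // Creates a client with the relaxed options; endpoint and key are placeholders
    // for the values of your own Cosmos DB account.
    public static CosmosClient Create(string endpoint, string key) =>
        new CosmosClient(endpoint, key, RelaxedReadsOptions());

    // Options for a single point operation, such as ReadItemAsync.
    public static ItemRequestOptions PerRequest(ConsistencyLevel level) =>
        new ItemRequestOptions { ConsistencyLevel = level };
}
```

Setting the level per request is useful when only a few read paths in your application can tolerate weaker guarantees.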
When configuring the consistency level, you must choose one of the following five options:
Strong The read operations are guaranteed to return the most recently committed version of an element; that is, the user always reads the latest committed write. This consistency level is the only one that offers a linearizability guarantee. This guarantee comes at a price. It has higher latency because of the time needed to confirm write operations, and the availability can be affected during failures.
Bounded Staleness The reads are guaranteed to be consistent within a preconfigured lag. This lag can be expressed as a number of the most recent versions (K) or as a time interval (T). This means that reads see your writes in the order you made them, but they can lag behind by at most K versions of the written data or T seconds since you wrote the data to the database. For read operations that happen within a region that accepts writes, the consistency level is identical to the Strong consistency level. This level is also known as “time-delayed linearizability guarantee.”
Session Scoped to a client session, this consistency level offers the best balance between a strong consistency level and the performance provided by the eventual consistency level. It best fits applications in which write operations occur in the context of a user session.
Consistent Prefix This level guarantees that you always read data in the same order that you wrote the data, but there’s no guarantee that you can read all the data. This means that if you write “A, B, C” you can read “A”, “A, B”, or “A, B, C” but never “A, C” or “B, A, C.”
Eventual There is no guarantee for the order in which you read the data. In the absence of further write operations, the replicas eventually converge. This consistency level offers better performance at the cost of added programming complexity. Use this consistency level if the order of the data is not essential for your application.
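The Consistent Prefix guarantee from the list above can be stated as a small check: whatever a reader observes must be a leading portion of the write sequence. The following self-contained sketch only illustrates that rule; it doesn't call Cosmos DB, and the PrefixCheck class and IsPrefix method are hypothetical names.

```csharp
using System.Collections.Generic;

// Illustrative only: models the Consistent Prefix rule on in-memory lists.
public static class PrefixCheck
{
    // Returns true when 'read' is exactly the first read.Count elements
    // of 'written', in the same order, with no gaps.
    public static bool IsPrefix<T>(IReadOnlyList<T> written, IReadOnlyList<T> read)
    {
        if (read.Count > written.Count)
        {
            return false;
        }

        var comparer = EqualityComparer<T>.Default;
        for (int i = 0; i < read.Count; i++)
        {
            if (!comparer.Equals(written[i], read[i]))
            {
                return false;
            }
        }
        return true;
    }
}
```

For the writes “A, B, C”, the check accepts the reads “A”, “A, B”, and “A, B, C”, and rejects “A, C” (a gap) and “B, A, C” (reordered), which matches the examples given for this level.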
The best consistency level choice depends on your application and the API you use to store data. As you can see in the different consistency levels, your application’s requirements regarding data read consistency versus availability, latency, and throughput are critical factors you must consider when selecting a level.
You should consider the following points when you use NoSQL or Table API for your Cosmos DB account:
The recommended option for most applications is the session consistency level.
If you are considering the strong consistency level, we recommend that you use the bounded staleness consistency level because it provides a linearizability guarantee with a configurable delay.
If you are considering the eventual consistency level, we recommend that you use the consistent prefix consistency level because it provides comparable levels of availability and latency with the advantage of a guaranteed read order.
Carefully evaluate the strong and eventual consistency levels because they are the most extreme options. In most situations, other consistency levels can provide a better balance between performance, latency, and data consistency.
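Because session consistency is the usual recommendation, it can help to see how a session is carried in code. A single CosmosClient instance tracks its own session automatically; the sketch below covers the less common case in which a write and a later read go through different client instances, so the session token has to be passed along explicitly. It assumes the version 3 Microsoft.Azure.Cosmos SDK, and the SessionReads class and its method names are hypothetical.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

// Hypothetical helpers; Headers.Session and ItemRequestOptions.SessionToken
// are the SDK members that expose and accept the session token.
public static class SessionReads
{
    // Writes an item and returns the session token attached to the response.
    public static async Task<string> WriteAndGetSessionTokenAsync<T>(
        Container container, T item, PartitionKey partitionKey)
    {
        ItemResponse<T> response = await container.UpsertItemAsync(item, partitionKey);
        return response.Headers.Session;
    }

    // Reads an item in the context of a previously captured session,
    // so the read observes the writes made in that session.
    public static async Task<T> ReadWithSessionAsync<T>(
        Container container, string id, PartitionKey partitionKey, string sessionToken)
    {
        var options = new ItemRequestOptions { SessionToken = sessionToken };
        ItemResponse<T> response = await container.ReadItemAsync<T>(id, partitionKey, options);
        return response.Resource;
    }
}
```

A web application could, for example, return the token to the caller after a write and send it back with the next read, although how the token travels between components is a design choice outside the scope of this sketch.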
When you use Cassandra or MongoDB APIs, Cosmos DB maps the consistency levels offered by Cassandra and MongoDB to the consistency levels offered by Cosmos DB. Cosmos DB does this because neither Cassandra nor MongoDB offers a well-defined consistency level. Instead, Cassandra provides write or read consistency levels that map to the Cosmos DB consistency level in the following ways:
Cassandra write consistency level This level maps to the default Cosmos DB account consistency level.
Cassandra read consistency level Cosmos DB dynamically maps the consistency level specified by the Cassandra driver client to one of the Cosmos DB consistency levels.
On the other hand, MongoDB allows you to configure the following consistency levels: Write Concern, Read Concern, and Master Region. Similar to the mapping of Cassandra consistency levels, Cosmos DB consistency levels map to MongoDB consistency levels in the following ways:
MongoDB write concern consistency level This level maps to the default Cosmos DB account consistency level.
MongoDB read concern consistency level Cosmos DB dynamically maps the consistency level specified by the MongoDB driver client to one of the Cosmos DB consistency levels.
Configuring a master region You can configure a region as the MongoDB “master” by configuring the region as the first writable region.