Security patterns

Availability pattern

Availability is often taken for granted. You expect your solution to be there when needed and to provide its services to every user. Unfortunately, availability does not come automatically, and you have to work hard to guarantee that your solution is up and running.

In the cloud, the situation is, if anything, worse. When you develop a complex system that integrates many services, you rely on all of them being available. The unavailability of any one service may cause your solution to be partially or entirely unavailable. Because each of these services is managed at least in part by a third party, you lack control over maintenance activities. Therefore, you have to design your solution to be more resilient than you would an on-premises solution.

Let’s see what you can do to improve the availability of your solution.

Design for denial of service

Intent and motivation

Denial-of-service (DoS) attacks are common occurrences. They happen when someone causes your solution to fail by bombarding it with more requests than it can handle or by sending carefully crafted messages that make it crash. Either way, these attacks make your service unavailable. A variant of the DoS attack is the distributed DoS (DDoS) attack, in which the attack is generated from multiple points, sometimes on the order of thousands or tens of thousands.

DoS and DDoS are some of the easiest attacks to execute. Some organizations even provide DDoS attacks as a service. You tell them who the target is, you pay them, and they do the rest! Very convenient and powerful.

You can address this problem in various ways, but the easiest is to simply add more resources to your application. One of the characteristics of the cloud is elasticity, which means you can allocate resources dynamically, as you need them. However, although this is easy and fast, it can be expensive and unsustainable in the long run. This is where this pattern becomes useful.

Description

All public cloud platforms, including Azure, offer a base level of protection from DoS attacks. Azure also offers Azure DDoS Protection Standard (https://azsec.tech/s1h), designed to protect public IP addresses from potentially massive DDoS attacks.

The documentation for this service uses some specific wording that you should be aware of:

“Azure DDoS Protection Standard, combined with application design best practices, provides enhanced DDoS mitigation.”

Note that it says enhanced, not complete. In other words, as with any other anti-DDoS system, Azure DDoS Protection Standard cannot be your only line of defense against DDoS attacks. Such systems complement a more comprehensive strategy, which starts with the design of your solution. For this reason, you should not design a system without thinking about how your architecture would respond to a DDoS attack.

Consider this real-life example: a customer with nearly 100 public IP addresses protected by the Azure DDoS Protection Standard service suffered severe service degradation. The cause was a DDoS attack, but it went undetected because the attack was “low and slow.” That is, it fell below the triggering threshold for Azure DDoS Protection Standard, so none of the service’s mitigation policies were triggered. As a result, the combined traffic flow across all IP addresses overloaded the back-end network virtual appliances (NVAs), which forced them to drop part of the traffic. The security anti-pattern in this design is taking data from multiple untrusted sources and concentrating that traffic on a single internal endpoint. In this case, all that data was concentrated on a single VM running the NVA, which is where the impact of the attack became evident. The customer quickly enabled the NVA’s built-in auto-scaling capabilities to bring more network bandwidth and compute online. In this scenario, scaling within a range of one to three NVAs was enough to mitigate the attack.
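The arithmetic behind a “low and slow” attack can be sketched in a few lines of Python. Every number below is a hypothetical value chosen for illustration (the per-address trigger, the NVA capacity, and the traffic rates are assumptions, not figures from the customer case or from Azure documentation). The point is only that traffic can stay below a per-address trigger on every address while the aggregate still exceeds what one back-end instance can carry.

```python
import math

# All values are hypothetical and serve only to illustrate the arithmetic.
PUBLIC_IPS = 100              # "nearly 100" public IP addresses in the example
PER_IP_TRIGGER_MBPS = 50.0    # assumed per-address detection trigger
NVA_CAPACITY_MBPS = 2_000.0   # assumed capacity of a single NVA instance
MAX_NVAS = 3                  # upper bound of the auto-scale range


def per_ip_alerts(per_ip_mbps: list[float], trigger: float) -> int:
    """Count the addresses whose own traffic reaches the trigger."""
    return sum(1 for rate in per_ip_mbps if rate >= trigger)


def nvas_needed(total_mbps: float, capacity: float) -> int:
    """Smallest number of NVA instances able to carry the aggregate load."""
    return max(1, math.ceil(total_mbps / capacity))


# Each address receives traffic that stays below its own trigger.
attack = [40.0] * PUBLIC_IPS
total = sum(attack)
alerts = per_ip_alerts(attack, PER_IP_TRIGGER_MBPS)
needed = nvas_needed(total, NVA_CAPACITY_MBPS)

print(f"addresses over trigger: {alerts}")
print(f"aggregate load: {total:.0f} Mbps vs {NVA_CAPACITY_MBPS:.0f} Mbps per NVA")
print(f"NVA instances needed: {needed} (auto-scale limit {MAX_NVAS})")
```

With these assumed values no single address raises an alert, yet the combined load is twice the capacity of one NVA, which is why scaling out the concentration point relieved the pressure.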

One final note: many Azure services can throttle requests. For example, Azure Key Vault allows 4,000 secret transactions (for example, reading an SQL connection string) per 10 seconds. If your code exceeds this threshold, further requests are throttled and return a 429 (“Too Many Requests”) response. To avoid this, cache data where possible. In the Key Vault example, you could cache the connection string in memory for 15 minutes and read from Key Vault only four times per hour, which is far below the threshold. You can find a list of Azure subscription and service limits, quotas, and constraints at https://azsec.tech/9it.

Examples

  • Many services can be configured with networking rules or private endpoint connections. One such service is Azure Storage. If you opt for networking rules, the service itself is still exposed and can receive requests; you simply have a high-speed mechanism that rejects requests that do not come from an acceptable IP address. While fast, this mechanism can still be overwhelmed or bypassed, for example, by an IP spoofing attack. Therefore, it is typically better to use private endpoints.

  • Consider the availability requirements of your solution. If you have strict availability requirements that do not allow for partial unavailability, it’s best to design the solution accordingly. For example, you could use a content delivery network (CDN) to serve the static content and a distributed and redundant architecture to provide the service to the relevant geographies.
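The first example above points out that a networking rule rejects unwanted requests quickly but does not stop them from arriving. The sketch below is a simplified model of such an allow-list check, written with Python’s standard `ipaddress` module. It is not how Azure Storage implements its firewall; the function names, the counter, and the address ranges (taken from the blocks reserved for documentation) are assumptions made for illustration.

```python
import ipaddress

# Hypothetical allow-list; 203.0.113.0/24 is a range reserved for documentation.
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]

evaluated_requests = 0  # every request costs at least one rule evaluation


def is_allowed(source_ip: str) -> bool:
    """Return True when the source address falls inside an allowed network."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in ALLOWED_NETWORKS)


def handle_request(source_ip: str) -> str:
    """Model of an exposed endpoint: the check runs for every request."""
    global evaluated_requests
    evaluated_requests += 1
    return "accepted" if is_allowed(source_ip) else "rejected"


results = [
    handle_request("203.0.113.10"),   # inside the allow-list
    handle_request("198.51.100.7"),   # outside the allow-list
    handle_request("198.51.100.8"),   # outside the allow-list
]

print(results)
print(f"requests evaluated by the endpoint: {evaluated_requests}")
```

Even the rejected requests are counted, because the exposed endpoint has to look at each one before it can refuse it. A private endpoint avoids this by keeping the service off the public network in the first place.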

Related security principles

  • Attack surface reduction

  • Defense in depth

  • Single point of failure

  • Weakest link

Related patterns

  • Isolate from the internet

  • Isolate with an identity perimeter