Azure App Service

Disaster recovery

If a region-wide failure of the Azure infrastructure takes App Service and all hosted apps offline, you need to bring App Service back online in another available region and restore your data in that region. You can achieve this in several ways, including the following:

  • Multi-site architecture: You can set up App Service so that the same apps are published across multiple Azure regions and the database is geo-replicated. You can also deploy any interdependent components in a multi-site design or configure them for high availability (HA). This type of design is also called an active-active datacenter design.

  • Standby site: You can preconfigure a standby region with all required apps and interconnected services and replicate the required data to that site. The standby site can then be brought online in the event of a disaster. This type of design is also called an active-passive with hot standby design.

  • Cold recovery: In a cold recovery, a failover site is identified, and all required services and data are restored and brought online on that site only after a disaster occurs, either manually or automatically. This type of design is also called an active-passive with cold standby design.
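The practical difference between these designs is which regions serve traffic at a given moment. The following Python sketch illustrates that difference under stated assumptions: the Region class, the priority field, and the region names are hypothetical and only model the idea of priority-based failover; they are not part of any Azure SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical model of a deployment region (not an Azure SDK type)."""
    name: str
    priority: int   # lower number = preferred region (assumed convention)
    healthy: bool   # result of whatever health probe you run


def active_active_targets(regions):
    """Multi-site design: every healthy region serves traffic at once."""
    return [r.name for r in regions if r.healthy]


def active_passive_target(regions):
    """Standby or cold design: only the preferred healthy region serves traffic.

    Returns None when no region is healthy.
    """
    healthy = [r for r in regions if r.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda r: r.priority).name


if __name__ == "__main__":
    # Simulate a failure of the primary region.
    regions = [
        Region("primary-region", priority=1, healthy=False),
        Region("secondary-region", priority=2, healthy=True),
    ]
    print(active_active_targets(regions))
    print(active_passive_target(regions))
```

In a hot standby design the secondary region is already running when it is selected; in a cold recovery the same selection would first trigger the restore procedure for that region.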

There are numerous factors to consider when developing a disaster-recovery strategy, especially for cold-recovery scenarios. Be sure to consider the following points in your planning:

  • Identify all components interconnected with App Service and decide on the best strategy to make each one either region-independent or restorable in another region when required. These include (but are not limited to) the following:

    • Deployment slot configurations

    • TLS/SSL certificates

    • Azure Key Vault configuration, including secrets, certificates, and managed identities

    • Integrations with load balancers, Azure Traffic Manager, WAFs, or Azure Firewall

    • Integrations using Hybrid Connections, site-to-site VPNs, or ExpressRoute

    • Integrations with third-party services that must be rebuilt

  • Refer to the product documentation or configuration of each service to validate data replication or availability in the desired failover region in the event of a disaster.

  • Identify and document the steps to restore each service (and the configuration required to reintegrate it, if needed), and the order in which each service should be restored.

  • Test the restoration procedure, if possible, on a regular basis.

  • Identify the tests to run after restoration to validate that the restore succeeded.

  • Regularly review the procedures and processes in place so that any changes in the environment, in product features, or in Microsoft’s terms of service are accounted for before a restoration operation becomes necessary.
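One way to document the order in which services should be restored is to record what each service depends on and derive the order from that record. The sketch below uses Python's standard graphlib module (Python 3.9 or later) to do this. The service names and the dependency map are hypothetical examples; replace them with the components identified in your own environment.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each key depends on the services in its set.
# These names are illustrative only, not a prescribed Azure restore sequence.
DEPENDENCIES = {
    "key_vault": set(),
    "database": set(),
    "tls_certificates": {"key_vault"},
    "app_service": {"key_vault", "database", "tls_certificates"},
    "deployment_slots": {"app_service"},
    "traffic_manager": {"app_service"},
}


def restore_order(dependencies):
    """Return services ordered so each one follows everything it depends on.

    Raises graphlib.CycleError if the map contains a circular dependency.
    """
    return list(TopologicalSorter(dependencies).static_order())


def is_valid_order(order, dependencies):
    """Check that every dependency appears before the service that needs it."""
    position = {name: index for index, name in enumerate(order)}
    return all(
        position[dep] < position[service]
        for service, deps in dependencies.items()
        for dep in deps
    )


if __name__ == "__main__":
    order = restore_order(DEPENDENCIES)
    for step, service in enumerate(order, start=1):
        print(f"{step}. restore {service}")
    print("order is valid:", is_valid_order(order, DEPENDENCIES))
```

Keeping the dependency map under version control alongside the restore runbook makes it easier to spot when a change in the environment alters the restore order.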