Azure Application Gateway

Best practices

The following are some recommended practices for deploying and managing your application gateway environment. These practices can help you make the most of your investment in this service:

  • Monitor and plan instance count to avoid resource crunch: Azure Application Gateway v2 supports auto-scaling, but if you are using Azure Application Gateway v1, you will have to scale manually. The v1 SKU supports scaling up to 32 instances. To identify the right instance count, monitor CPU utilization for the application gateway for at least a month and identify the peak CPU usage. Then add a buffer of 15% to 20% to handle unexpected spikes and growth. Finally, select your instance count based on this sizing.

  • Upgrade to the v2 SKU as soon as possible: As mentioned, Azure Application Gateway v1 supports manual scaling only, which is not the most efficient or cost-effective way to manage your gateway instances. In addition to auto-scaling, the v2 SKU offers other benefits, including better SSL/TLS offload performance, faster deployment and update times, and zone redundancy. Upgrade to the v2 SKU as soon as you can to benefit from these (and other) features.

  • Set the maximum instance count in the v2 SKU: Because the v2 SKU supports auto-scaling, and because charges are based on how many units are used, consider your budget when setting the maximum instance count. The v2 SKU supports a maximum of 125 instances, so without a lower cap, unplanned spikes can activate more instances than you have budgeted for.

  • Size the gateway subnet for future growth: Size the subnet in which you plan to host application gateway instances to take into account future scalability requirements. Changes to the subnet configuration are not currently supported and require a redeployment of the service, resulting in possible disruption.

  • Set the minimum instance count for the v2 SKU: When auto-scaling, bringing additional instances online to accept traffic takes some time, usually between six and seven minutes. Any unexpected spikes during this period can result in traffic drops or higher response latency. Monitor CPU usage for at least a month to identify the minimum number of instances required, and maintain a buffer of 15% to 20% to allow for unexpected spikes.

  • Monitoring and alerting: Set up alerts on gateway metrics such as CPU usage, instance scaling, and network utilization so you are notified of any anomalies that might lead to outages. Examples of alert conditions include average CPU usage reaching 75% to 80% for a sustained period of time, too many failed requests, the gateway not responding, logs containing numerous 4xx or 5xx errors (indicating response issues), and too many unhealthy back-end hosts.

  • Set up geo-filtering to block unwanted countries/regions: The v2 SKU supports geo-filtering, which enables you to allow or block traffic from specific countries or regions. It is a good practice to use this feature to restrict which locales can connect to your web applications, which reduces your attack surface.

  • Set up bot protection to prevent attacks: The v2 SKU has a feature to block traffic from known bot networks. Enable this feature to intercept known malicious traffic before it reaches your web applications.

  • Set up diagnostics logging and long-term retention: Collect firewall, performance, and access logs for your application gateway instances and save them in Azure storage, Log Analytics, or an event stream. These logs can help you identify potential issues and take proactive action. Set retention policies based on how far back you need to compare historical data and on your organization's compliance requirements.

  • Set up the latest TLS policy version for extra security: Use the latest TLS policy version (AppGwSslPolicy20170401S at the time of writing) to enforce TLS 1.2 and stronger ciphers.
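
The sizing and alerting guidance in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an official sizing formula: the function names, the target_cpu parameter (an assumed per-instance utilization ceiling), and the min_consecutive parameter are hypothetical and do not come from the service. The CPU samples are assumed to be percentages you have already exported from your monitoring data. The 32 and 125 instance limits and the 20% buffer are taken from the practices above.

```python
import math

V1_MAX_INSTANCES = 32    # v1 SKU scaling limit, from the text
V2_MAX_INSTANCES = 125   # v2 SKU scaling limit, from the text


def recommended_instances(current_instances, cpu_samples, buffer=0.20,
                          target_cpu=75.0, sku_max=V2_MAX_INSTANCES):
    """Estimate an instance count from peak CPU plus a safety buffer.

    target_cpu is an assumed per-instance utilization ceiling; it is not
    a value from the text, so tune it for your own workload.
    """
    if not cpu_samples:
        raise ValueError("need at least one CPU sample")
    # Peak CPU percentage seen in the monitoring window (ideally a month).
    peak = max(cpu_samples)
    # Work done at peak, expressed as a number of fully busy instances.
    busy = current_instances * peak / 100.0
    # Add the 15% to 20% headroom for unexpected spikes and growth.
    busy_with_buffer = busy * (1.0 + buffer)
    # Spread that work so each instance stays at or below target_cpu.
    needed = math.ceil(busy_with_buffer / (target_cpu / 100.0))
    # Never go below one instance or above the SKU limit.
    return max(1, min(needed, sku_max))


def sustained_breach(cpu_samples, threshold=75.0, min_consecutive=3):
    """Return True if CPU stays at or above threshold for several samples in a row."""
    run = 0
    for value in cpu_samples:
        run = run + 1 if value >= threshold else 0
        if run >= min_consecutive:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical samples: four instances that peaked at 60% CPU.
    month_of_cpu = [35.0, 48.0, 60.0, 52.0]
    print(recommended_instances(4, month_of_cpu))
    # Hypothetical samples: three consecutive readings above 75%.
    print(sustained_breach([70.0, 78.0, 81.0, 79.0, 60.0]))
```

The same calculation can inform both the minimum instance count (so that baseline traffic is covered while new instances come online) and the maximum instance count (so that the cap reflects your budget and not only the SKU limit). Passing sku_max=V1_MAX_INSTANCES applies the v1 limit instead.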