Manage high availability and disaster recovery

  • 11/7/2017

In this sample chapter from Exam Ref 70-764 Administering a SQL Database Infrastructure, learn how to design appropriate high availability and disaster recovery solutions in SQL Server with special attention to installation, configuration, and administration processes.

It is important to understand the difference between high availability and disaster recovery. It is not uncommon for management in organizations to misunderstand these concepts and use the wrong technology for their SQL Server infrastructure. In this last chapter, we examine the high availability technologies available in SQL Server, as promised in Chapter 2, “Manage backup and restore of databases.”

With high availability, you are using technology in SQL Server to minimize the downtime of a given database solution to maximize its availability. With disaster recovery, however, you are using technology to recover from a disaster incident, potentially minimizing the amount of data lost. In some cases, data loss is acceptable, because the imperative is to get your database solution online as soon as possible. That is why it is critical to engage with all stakeholders to determine the business requirements. With both high availability and disaster recovery people and processes play a key part, so make sure you don’t focus solely on the technology.

The exam will test your ability to design the appropriate high availability solution for a given scenario, which is why Skill 4.1 starts with a discussion about high availability and the primary considerations for designing a particular solution. Skill 4.2 then covers the designing of a disaster recovery solution, which commonly goes hand-in-hand with a high availability solution. Given how we covered disaster recovery in Chapter 2, a detailed discussion will not be required here. Skill 4.3 examines the log shipping technology in SQL Server and how it is primarily used to provide disaster recovery. Skill 4.4 then details Availability Groups and examines how they can be used to provide both high availability and scale-out capability to your databases. Finally, in Skill 4.5 we implement failover clustering solutions. Although this high availability technology has been available since SQL Server 2000, it’s commonly used in the industry and should not be discounted as an old, unused technology. Microsoft keeps investing in failover clustering, and we will learn about how SQL Server can take advantage of cluster shared volumes.

High availability technologies are complex and involve a lot of set up and configuration, so this chapter has many figures that show you their installation, configuration, and administration processes. Make sure you examine the various options in the figures and listings in this chapter to best prepared for the exam.

Skills covered in this chapter:

  • Design a high availability solution

  • Design a disaster recovery solution

  • Implement log shipping

  • Implement Availability Groups

  • Implement failover clustering

Skill 4.1: Design a high availability solution

High availability, as the name suggests, is concerned with making sure that your database is highly available. The cost of an unavailable database solution, in today’s modern, globalized 24x7, Internet connected world can be catastrophic to your organization.

One of the first questions you should be asking of your organization is what availability is required for your database solution. This will form part of your Service Level Agreement (SLA) for your database solution. Availability is usually expressed as a percentage of uptime in a given year, and can be expressed as follows:

This is commonly referred to as the number of nines required. Table 4-1 shows the availability, the number of nines, and how much down time that corresponds to annually.

TABLE 4-1: Number of nines for high availability.

Availability

Nines

Annual downtime

Monthly downtime

Weekly downtime

Daily downtime

90%

1

36.5 days

72 hours

16.8 hours

2.4 hours

95%

1.5

18.25 days

36 hours

8.4 hours

1.2 hours

99%

2

3.65 days

7.20 hours

1.68 hours

14.4 minutes

99.5%

2.5

1.83 days

3.60 hours

50.4 minutes

7.2 minutes

99.9%

3

8.76 hours

43.8 minutes

10.1 minutes

1.44 minutes

99.95%

3.5

4.38 hours

21.56 minutes

5.04 minutes

43.2 seconds

99.99%

4

52.56 minutes

4.38 minutes

1.01 minutes

8.66 seconds

99.995%

4.5

26.28 minutes

2.16 minutes

30.24 seconds

4.32 seconds

99.999%

5

5.26 minutes

25.9 seconds

6.05 seconds

864.3 milliseconds

99.9999%

6

31.5 seconds

2.59 seconds

604.8 milliseconds

86.4 milliseconds

99.99999%

7

3.15 seconds

262.97 milliseconds

60.48 milliseconds

8.64 milliseconds

99.99999999%

8

315.569 milliseconds

26.297 milliseconds

6.048 milliseconds

0.864 milliseconds

99.999999999%

9

31.5569 milliseconds

2.6297 milliseconds

0.6048 milliseconds

0.0864 milliseconds

As you can see, achieving even four nines might be difficult. Four nines represents only 4.38 minutes of downtime per month. Now consider how long it takes for your servers to be rebooted. On modern servers, that have a large amount of memory, it might take you 15-30 minutes for them to boot up, as they run through their BIOS memory checks. In most cases these BIOS memory checks cannot be turned off. Consider further how often you patch your Windows environment, which typically requires a reboot, and how long that takes. Do not underestimate the potential complexity of achieving anything beyond three nines.

When determining your SLA you should also define what constitutes downtime in the context of your SLA. There are two types of downtime:

  • Planned downtime Planned downtime refers to the downtime incurred by your maintenance tasks. These maintenance tasks might include patching hardware or software, hardware maintenance, patching the Windows operating system, or patching SQL Server. Planned downtime is typically scheduled and controlled through business processes. Consequently, there is typically no data loss.

  • Unplanned downtime Unplanned downtime refers to downtime that is incurred due to an unexpected incident that causes outage. Examples include:

    • Hardware failures, such as with a disk drive or power supply unit failing or bad firmware in a hardware vendor’s HBA.

    • Data center failure, such as with a power failing, or flooding occurring.

    • Software failure, such as with Windows crashing, SQL Server hanging, or a corrupt database.

    • User error, such as dropping a database, or accidentally deleting data.

In some SLAs there are only penalties enforced for unplanned downtime.

Once you have determined your organizations availability requirements you can assess which SQL: Server high availability technology fits your business requirements. You might need to take in multiple factors, including:

  • Whether automatic failover is required Certain high availability technologies and configurations do not offer automatic failover. In certain use cases an organization might not require high availability.

  • Failover speed Different high availability technologies offer different failover speeds, so understanding how quickly a failover needs to take will help you choose the appropriate solution.

  • Scalability Whether or not you need to scale out your database solution impacts your high availability technology selection. Scaling out your database can provide both performance and uptime benefits. Availability Groups offer the best scale-out capability.

  • Infrastructure between data centers The latency and throughput between sites/data centers will directly impact what high availability technologies can be implemented. Latency is more important than distance. You do not want automatic fail overs to be performed due to slow response times between data centers. In this case automatic failover might not be required.

  • Connecting applications What applications are accessing the database and what network libraries they use to connect to the databases will also play an important factor in any design. Certain high availability technologies might not work as well with older applications.

  • Recovery model This is a very important consideration. What recovery model is being used by a database and the volume of transactions experienced by the database will dictate what high availability technology can be used. Remember that Availability Groups require the databases to be using the full recovery model. We covered recovery models in Chapter 2.

  • Number of databases Whether the database solution involves multiple databases that need to fail over as a single unit is another important factor and design consideration.

  • Database size This covers the size of the databases and how much it will cost to potentially replicate those databases on multiple instances of SQL Server. Very Large Databases (VLDBs) might be too expensive to host on multiple SQL Server instances. They might also be larger than what is possible to fit locally on a server, for example.

  • Database administrator skill set Determine whether your organization has a team of database administrators and how experienced they are. Certain high availability technologies are more complex to administer.

  • SQL Server Edition The SQL Server licensing implications and associated costs with a different edition is a very important factor in organizations. Certain high availability technologies are only available in the Enterprise Edition.

SQL Server supports the following high availability technologies:

  • Failover clustering With failover clustering you rely on the features of the Windows operating system to build a cluster of separate nodes that together provide redundancy at the hardware and software levels.

  • Transactional replication With transactional replication various replication agents are reading a database’s transaction log, storing those captured transactions in a separate database, and then replicating those transactions to other databases located on different servers.

  • Database mirroring With database mirroring the database engine automatically transmits the transaction log to one other server when the same database exists.

  • Availability groups Availability groups are an evolution of Database Mirroring where the transaction log can be transmitted in real time to multiple servers that maintain a copy of the database.

  • Log shipping With log shipping multiple copies of the database are kept on multiple servers through scheduled log backups and restores.

Each high availability technology will have its own set of associated costs, and pros, and cons. As a database administrator, it is up to you to assess your business requirements and architect the appropriate high availability solution. In this book, we will focus on Log Shipping, Availability Groups and Failover Clustering.

Don’t forget that you can combine high availability technologies. For example, you can take advantage of failover clustering locally in one data center to provide high availability and use log shipping to another data center for disaster recovery purposes.