Manage high availability and disaster recovery

By Victor Isakov
11/7/2017

Contents

Skill 4.1: Design a high availability solution
Skill 4.2: Design a disaster recovery solution
Skill 4.3: Implement log shipping
Skill 4.4: Implement Availability Groups
Skill 4.5: Implement failover clustering
Thought experiment
Thought experiment answers
Chapter summary

In this sample chapter from Exam Ref 70-764 Administering a SQL Database Infrastructure, learn how to design appropriate high availability and disaster recovery solutions in SQL Server with special attention to installation, configuration, and administration processes.

It is important to understand the difference between high availability and disaster recovery. It is not uncommon for management in organizations to misunderstand these concepts and use the wrong technology for their SQL Server infrastructure. In this last chapter, we examine the high availability technologies available in SQL Server, as promised in Chapter 2, “Manage backup and restore of databases.”

With high availability, you are using technology in SQL Server to minimize the downtime of a given database solution to maximize its availability. With disaster recovery, however, you are using technology to recover from a disaster incident, potentially minimizing the amount of data lost. In some cases, data loss is acceptable, because the imperative is to get your database solution online as soon as possible. That is why it is critical to engage with all stakeholders to determine the business requirements. With both high availability and disaster recovery people and processes play a key part, so make sure you don’t focus solely on the technology.

The exam will test your ability to design the appropriate high availability solution for a given scenario, which is why Skill 4.1 starts with a discussion about high availability and the primary considerations for designing a particular solution. Skill 4.2 then covers the designing of a disaster recovery solution, which commonly goes hand-in-hand with a high availability solution. Given how we covered disaster recovery in Chapter 2, a detailed discussion will not be required here. Skill 4.3 examines the log shipping technology in SQL Server and how it is primarily used to provide disaster recovery. Skill 4.4 then details Availability Groups and examines how they can be used to provide both high availability and scale-out capability to your databases. Finally, in Skill 4.5 we implement failover clustering solutions. Although this high availability technology has been available since SQL Server 2000, it’s commonly used in the industry and should not be discounted as an old, unused technology. Microsoft keeps investing in failover clustering, and we will learn about how SQL Server can take advantage of cluster shared volumes.

High availability technologies are complex and involve a lot of set up and configuration, so this chapter has many figures that show you their installation, configuration, and administration processes. Make sure you examine the various options in the figures and listings in this chapter to best prepared for the exam.

NOTE: Preparing for the Exam

To help you prepare for the exam and help you familiarize yourself with the high availability capabilities in SQL Server, you can use Hyper-V and setup a number of Virtual Machine (VMs) on your computer. It is recommended that you first install and configure a domain controller VM. All SQL Server VMs should be joined to the domain. It is also recommended that you use Windows Server 2016, because the examples are based on it.

SQL Server 2016/2017 Developer Edition is equivalent to the Enterprise Edition and now available for free at https://www.microsoft.com/en-us/sql-server/sql-server-downloads.

There is no free version of Windows Server 2016, however, you can download the Evaluation Edition, which will work for 180 days without re-arming it, at https://www.microsoft.com/en-us/evalcenter/evaluate-windows-server-2016/.

Skills covered in this chapter:

Design a high availability solution
Design a disaster recovery solution
Implement log shipping
Implement Availability Groups
Implement failover clustering

Skill 4.1: Design a high availability solution

High availability, as the name suggests, is concerned with making sure that your database is highly available. The cost of an unavailable database solution, in today’s modern, globalized 24x7, Internet connected world can be catastrophic to your organization.

One of the first questions you should be asking of your organization is what availability is required for your database solution. This will form part of your Service Level Agreement (SLA) for your database solution. Availability is usually expressed as a percentage of uptime in a given year, and can be expressed as follows:

Click to view larger image

This is commonly referred to as the number of nines required. Table 4-1 shows the availability, the number of nines, and how much down time that corresponds to annually.

TABLE 4-1: Number of nines for high availability.

Availability	Nines	Annual downtime	Monthly downtime	Weekly downtime	Daily downtime
90%	1	36.5 days	72 hours	16.8 hours	2.4 hours
95%	1.5	18.25 days	36 hours	8.4 hours	1.2 hours
99%	2	3.65 days	7.20 hours	1.68 hours	14.4 minutes
99.5%	2.5	1.83 days	3.60 hours	50.4 minutes	7.2 minutes
99.9%	3	8.76 hours	43.8 minutes	10.1 minutes	1.44 minutes
99.95%	3.5	4.38 hours	21.56 minutes	5.04 minutes	43.2 seconds
99.99%	4	52.56 minutes	4.38 minutes	1.01 minutes	8.66 seconds
99.995%	4.5	26.28 minutes	2.16 minutes	30.24 seconds	4.32 seconds
99.999%	5	5.26 minutes	25.9 seconds	6.05 seconds	864.3 milliseconds
99.9999%	6	31.5 seconds	2.59 seconds	604.8 milliseconds	86.4 milliseconds
99.99999%	7	3.15 seconds	262.97 milliseconds	60.48 milliseconds	8.64 milliseconds
99.99999999%	8	315.569 milliseconds	26.297 milliseconds	6.048 milliseconds	0.864 milliseconds
99.999999999%	9	31.5569 milliseconds	2.6297 milliseconds	0.6048 milliseconds	0.0864 milliseconds

As you can see, achieving even four nines might be difficult. Four nines represents only 4.38 minutes of downtime per month. Now consider how long it takes for your servers to be rebooted. On modern servers, that have a large amount of memory, it might take you 15-30 minutes for them to boot up, as they run through their BIOS memory checks. In most cases these BIOS memory checks cannot be turned off. Consider further how often you patch your Windows environment, which typically requires a reboot, and how long that takes. Do not underestimate the potential complexity of achieving anything beyond three nines.

When determining your SLA you should also define what constitutes downtime in the context of your SLA. There are two types of downtime:

Planned downtime Planned downtime refers to the downtime incurred by your maintenance tasks. These maintenance tasks might include patching hardware or software, hardware maintenance, patching the Windows operating system, or patching SQL Server. Planned downtime is typically scheduled and controlled through business processes. Consequently, there is typically no data loss.
Unplanned downtime Unplanned downtime refers to downtime that is incurred due to an unexpected incident that causes outage. Examples include:
- Hardware failures, such as with a disk drive or power supply unit failing or bad firmware in a hardware vendor’s HBA.
- Data center failure, such as with a power failing, or flooding occurring.
- Software failure, such as with Windows crashing, SQL Server hanging, or a corrupt database.
- User error, such as dropping a database, or accidentally deleting data.

In some SLAs there are only penalties enforced for unplanned downtime.

Once you have determined your organizations availability requirements you can assess which SQL: Server high availability technology fits your business requirements. You might need to take in multiple factors, including:

Whether automatic failover is required Certain high availability technologies and configurations do not offer automatic failover. In certain use cases an organization might not require high availability.
Failover speed Different high availability technologies offer different failover speeds, so understanding how quickly a failover needs to take will help you choose the appropriate solution.
Scalability Whether or not you need to scale out your database solution impacts your high availability technology selection. Scaling out your database can provide both performance and uptime benefits. Availability Groups offer the best scale-out capability.
Infrastructure between data centers The latency and throughput between sites/data centers will directly impact what high availability technologies can be implemented. Latency is more important than distance. You do not want automatic fail overs to be performed due to slow response times between data centers. In this case automatic failover might not be required.
Connecting applications What applications are accessing the database and what network libraries they use to connect to the databases will also play an important factor in any design. Certain high availability technologies might not work as well with older applications.
Recovery model This is a very important consideration. What recovery model is being used by a database and the volume of transactions experienced by the database will dictate what high availability technology can be used. Remember that Availability Groups require the databases to be using the full recovery model. We covered recovery models in Chapter 2.
Number of databases Whether the database solution involves multiple databases that need to fail over as a single unit is another important factor and design consideration.
Database size This covers the size of the databases and how much it will cost to potentially replicate those databases on multiple instances of SQL Server. Very Large Databases (VLDBs) might be too expensive to host on multiple SQL Server instances. They might also be larger than what is possible to fit locally on a server, for example.
Database administrator skill set Determine whether your organization has a team of database administrators and how experienced they are. Certain high availability technologies are more complex to administer.
SQL Server Edition The SQL Server licensing implications and associated costs with a different edition is a very important factor in organizations. Certain high availability technologies are only available in the Enterprise Edition.

SQL Server supports the following high availability technologies:

Failover clustering With failover clustering you rely on the features of the Windows operating system to build a cluster of separate nodes that together provide redundancy at the hardware and software levels.
Transactional replication With transactional replication various replication agents are reading a database’s transaction log, storing those captured transactions in a separate database, and then replicating those transactions to other databases located on different servers.
Database mirroring With database mirroring the database engine automatically transmits the transaction log to one other server when the same database exists.
Availability groups Availability groups are an evolution of Database Mirroring where the transaction log can be transmitted in real time to multiple servers that maintain a copy of the database.
Log shipping With log shipping multiple copies of the database are kept on multiple servers through scheduled log backups and restores.

Each high availability technology will have its own set of associated costs, and pros, and cons. As a database administrator, it is up to you to assess your business requirements and architect the appropriate high availability solution. In this book, we will focus on Log Shipping, Availability Groups and Failover Clustering.

Don’t forget that you can combine high availability technologies. For example, you can take advantage of failover clustering locally in one data center to provide high availability and use log shipping to another data center for disaster recovery purposes.

Save to your account