Lesson 2: Site-level fault tolerance
Site-level fault tolerance involves ensuring that an organization has access to critical resources in the event that a site goes unexpectedly offline on a short-term or long-term basis. Hyper-V Replica provides organizations with a method of having critical virtual machines replicated to a second site, allowing both planned and unplanned failover with minimal data loss. Multisite clustering involves configuring a cluster so that the cluster retains quorum in the event that a site goes offline unexpectedly.
Hyper-V Replica makes it possible for a virtual machine to be replicated from one Hyper-V host to another. The Hyper-V host that stores the replica can be in the same room or on another continent. Replication is asynchronous, so the replica copy is a consistent but lagged version of the original. Hyper-V Replica requires neither access to shared storage nor that the hosts be members of the same Active Directory domain. Figure 8-13 shows Hyper-V Replica configured so that the six virtual machines running on one Hyper-V host are replicated to another Hyper-V host.
FIGURE 8-13 Hyper-V Replica
You can use Hyper-V Replica to provide site-level fault tolerance for your organization’s virtual machines. For example, you could configure replication so that all of the production virtual machines at a primary site automatically replicate to your organization’s disaster recovery (DR) site. In the event that a disaster occurred that destroyed the infrastructure at the primary site, you would be able to start up the virtual machines at the DR site. It’s important to note that because replication is asynchronous, if failover to the DR site is unplanned then there will be some data loss. The data loss might only be a few seconds or a few minutes, but it’s important to remember that the virtual machines at the DR site are lagged copies of the originals.
Configuring Hyper-V Replica
Prior to configuring replication for an individual virtual machine, you must perform the following steps on both the source and destination Hyper-V replica servers:
- Enable replication. This makes it possible for the Hyper-V host to function as a replica server.
- Choose the authentication method. You can use Kerberos or certificate-based authentication. Kerberos is appropriate if the source and destination servers are part of the same Active Directory environment. Certificate-based authentication is appropriate when the source and destination servers are not members of the same Active Directory environment.
- Specify which authenticated servers replication can be performed from. The options are to allow replication from any authenticated server, or to allow replication only from specific servers. When configuring server authorization, you also specify the location where replicated files are stored. You can do this on a per-server basis, or have all replicated data stored in a specific folder tree.
You configure these options on the Replication Configuration page of the Hyper-V Settings dialog box as shown in Figure 8-14.
FIGURE 8-14 Enabling this computer as a replica server
After you’ve configured the replica server to support a specific form of authentication, you also need to configure firewall rules to allow authentication and replica traffic. Windows Firewall with Advanced Security includes predefined firewall rules that you enable to support replication. Figure 8-15 shows the built-in firewall rule that you should enable when you use Kerberos authentication; this rule allows traffic on port 80. The built-in firewall rule that you enable for certificate-based authentication allows traffic on port 443.
FIGURE 8-15 The Hyper-V Replica HTTP firewall rule
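The pairing of authentication method and firewall rule described above can be summarized as a small lookup. The following Python sketch is illustrative only: the `ReplicaTransport` class and `choose_transport` function are hypothetical names, and the sketch assumes the simplification that hosts in the same Active Directory environment always use Kerberos. The rule names and ports are the ones given in this lesson.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaTransport:
    """Hypothetical record of the settings that go together for one replica link."""
    authentication: str
    firewall_rule: str
    port: int


def choose_transport(same_active_directory: bool) -> ReplicaTransport:
    """Pick the authentication method and the matching predefined firewall rule.

    Assumption: Kerberos is used whenever both hosts share an Active Directory
    environment; otherwise certificate-based authentication is used.
    """
    if same_active_directory:
        return ReplicaTransport("Kerberos", "Hyper-V Replica HTTP", 80)
    return ReplicaTransport("Certificate", "Hyper-V Replica HTTPS", 443)


# Example: a DR site whose hosts are not in the primary site's directory.
dr_link = choose_transport(same_active_directory=False)
print(dr_link.authentication, dr_link.firewall_rule, dr_link.port)
```

The same rule must be enabled on both the source and destination servers, because either one can become the replica after a failover.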
When you have configured the replica configuration and firewall rules on the source and destination servers, you can configure replication for an individual Hyper-V virtual machine. The first step is to select a server to host the replica. As shown in Figure 8-16, you need to specify an authentication type and whether replicated data will be compressed.
FIGURE 8-16 Specifying connection parameters
You specify which virtual hard disks will be replicated, the replication frequency (an option new to Windows Server 2012 R2 which allows you to configure replication to occur as frequently as every 30 seconds), and the number of recovery points that will be created. A recovery point is a checkpoint of the replicated virtual machine at a particular point in time. In the event of an unplanned failover, you can recover to the most recent recovery point, or you can choose one of the additional recovery points. Recovery points can be generated up to once an hour. Recovery points do not dictate how often replication occurs. Recovery points enable you to roll back to a previous point in time, such as prior to a point where data stored on the virtual machine became corrupted. Figure 8-17 shows the Configure Recovery History page of the Enable Replication Wizard.
FIGURE 8-17 Configure Additional Recovery Points
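To illustrate how additional recovery points help after data corruption, the following Python sketch picks the newest recovery point that predates the corruption. It is a conceptual model, not part of Hyper-V: the function name `latest_point_before` and the sample timestamps are hypothetical, and in practice you would choose the recovery point in the Failover dialog box.

```python
from datetime import datetime
from typing import List, Optional


def latest_point_before(recovery_points: List[datetime],
                        corrupted_at: datetime) -> Optional[datetime]:
    """Return the newest recovery point taken strictly before corrupted_at.

    Returns None when every available recovery point is already affected.
    """
    candidates = [point for point in recovery_points if point < corrupted_at]
    return max(candidates) if candidates else None


# Hypothetical hourly recovery points held on the replica server.
points = [datetime(2014, 1, 1, hour) for hour in (9, 10, 11)]

# Corruption is noticed to have started at 10:30, so the 11:00 point is unusable.
chosen = latest_point_before(points, datetime(2014, 1, 1, 10, 30))
print(chosen)
```

In this example the 10:00 recovery point is selected. Keeping more recovery points widens the window in which such a rollback is possible, at the cost of additional storage on the replica server.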
The final step in configuring replication is to select how to create the initial replica. You can transfer replicated data directly over the network, you can transfer a copy using external media, or you can use an existing virtual machine that is already present on the replica server. Using media to seed the initial replica copy might be appropriate when you are configuring replication for very large virtual machines for which the replication traffic must cross a wide area network (WAN) link. Figure 8-18 shows configuring initial replication when running the Enable Replication Wizard.
FIGURE 8-18 Choosing the initial replication method
Planned failover involves moving a virtual machine configured for replication from its primary host to the replica server. To ensure that the virtual machine on the replica is up to date, the virtual machine being moved must be in a turned-off state. This is a substantial difference from Hyper-V live migration in which the virtual machine is able to move between hosts while continuing to respond to client requests.
When you perform planned failover, a set of prerequisite checks occurs, including a check of whether replication back from the replica server to the current primary is allowed. As Figure 8-19 shows, you can also configure planned failover so that the replica virtual machine starts automatically when the failover process completes and so that reverse replication is configured once failover has occurred. You can perform planned failover through the Hyper-V Manager console or by using the Start-VMFailover Windows PowerShell cmdlet.
FIGURE 8-19 Planned failover
Unplanned failover occurs when the primary server has failed unexpectedly. During unplanned failover, you connect to the replica server and trigger failover manually. When you perform an unplanned failover, you need to specify a recovery point to use. You configure the number of available recovery points when initially configuring replication for a virtual machine. By default, when you perform unplanned failover the dialog box suggests the most recent recovery point. Figure 8-20 shows the Failover dialog box. The failover process automatically starts the virtual machine on the replica host.
FIGURE 8-20 The Failover dialog box
After the unplanned failover process is complete and you have restored the original Hyper-V host, you should configure reverse replication to re-create the replication relationship. This process is almost identical to creating the initial replication relationship, and the wizard is prepopulated with the details of the original relationship when you run it.
Hyper-V Replica Broker
Hyper-V Replica Broker enables you to configure Hyper-V Replica for virtual machines replicated from or to Windows Server 2012 or Windows Server 2012 R2 Hyper-V failover clusters. Hyper-V Replica Broker is not necessary if replication is being performed between Hyper-V hosts that are not participating in a failover cluster.
To configure the Hyper-V Replica Broker role, perform the following steps:
- On an existing failover cluster that has the Hyper-V role installed, use Failover Cluster Manager to add the Hyper-V Replica Broker role.
- Verify that the Hyper-V Replica Broker role can be moved across all nodes in the cluster.
Until you have deployed the Hyper-V Replica Broker role, you are unable to configure the Hyper-V nodes or virtual machines hosted within the cluster to support replication. The Hyper-V Replica Broker role is shown deployed on a failover cluster in Figure 8-21.
FIGURE 8-21 The Failover Cluster Manager
Failover clusters can span multiple sites. When configuring a cluster that spans two sites, you should do the following:
- Ensure that there are an equal number of nodes in each site.
- Allow each node to have a vote.
- Enable dynamic quorum. Dynamic quorum makes it possible for quorum to be recalculated when nodes leave the cluster. When nodes leave the cluster, the cluster determines if it still retains quorum. If it does, the cluster recalculates a new quorum based on the number of nodes that remain available. For example, if you have a 7-node cluster and 2 nodes fail, quorum is retained and dynamic quorum will recalculate the quorum requirement based on the 5 remaining nodes. Dynamic quorum also performs a recalculation when nodes return to the cluster. Dynamic quorum is enabled by default on Windows Server 2012 and Windows Server 2012 R2 failover clusters.
- Use a file share witness. As shown in Figure 8-22, the file share witness should be hosted in a third site that has separate connectivity to the two sites that host the cluster nodes. When configured in this manner, the cluster retains quorum in the event that one of the sites is lost.
FIGURE 8-22 A multisite cluster
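The vote arithmetic behind the recommendations above can be checked with a short calculation. The following Python sketch is a simplified model, not the cluster service's actual algorithm: the function names are hypothetical, every node is assumed to have exactly one vote, and a failure is treated as a single event rather than a sequence of nodes leaving one at a time.

```python
from typing import Optional, Tuple


def majority(votes: int) -> int:
    """Smallest number of votes that is more than half of the total."""
    return votes // 2 + 1


def survives_site_loss(nodes_per_site: int, third_site_witness: bool) -> bool:
    """Two-site cluster: does the surviving site still hold a majority of votes?"""
    witness = 1 if third_site_witness else 0
    total_votes = 2 * nodes_per_site + witness
    surviving_votes = nodes_per_site + witness
    return surviving_votes >= majority(total_votes)


def recalculate_quorum(voting_nodes: int,
                       failed_nodes: int) -> Tuple[bool, Optional[int]]:
    """Model dynamic quorum after failed_nodes leave the cluster at once.

    Returns (quorum_retained, new_requirement). The requirement is only
    recalculated when the remaining nodes still satisfy the old majority.
    """
    remaining = voting_nodes - failed_nodes
    if remaining < majority(voting_nodes):
        return False, None
    return True, majority(remaining)


print(survives_site_loss(3, third_site_witness=True))   # three nodes per site plus witness
print(survives_site_loss(3, third_site_witness=False))  # same cluster without a witness
print(recalculate_quorum(7, 2))                         # the 7-node example from the list
```

With three nodes in each site and a witness in a third site, the surviving site holds four of seven votes and keeps quorum; without the witness it holds only three of six and does not. For the 7-node example, the five remaining nodes satisfy the original requirement of four votes, and the recalculated requirement becomes three.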
In the event that you only have two sites and are unable to place a file share witness in an independent third site, you can edit the cluster configuration manually to reassign votes so that the cluster recalculates quorum.
Dynamic witness is a technology new to Windows Server 2012 R2 that allows the vote of the witness to be ignored when recalculating quorum after a node goes offline if the total number of votes of the remaining nodes and the witness would be an even number. For example, you might configure a 7-node cluster with a witness. In this scenario, the witness vote would be ignored when calculating quorum, because counting it would give eight votes. If one node failed and the dynamic quorum process recalculated quorum, the witness vote would be counted during the next check for quorum, giving seven votes. Dynamic witness also minimizes the impact of a witness server failing on quorum calculations by removing the failed witness server’s vote.
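The rule described above amounts to keeping the total number of votes odd. The following Python sketch models that rule under stated assumptions: the function names are hypothetical, each online node has one vote, and the model ignores everything else the cluster service considers.

```python
def witness_vote_counts(online_nodes: int, witness_online: bool = True) -> bool:
    """Count the witness vote only when doing so makes the total odd.

    A failed witness never contributes a vote.
    """
    if not witness_online:
        return False
    return online_nodes % 2 == 0


def total_votes(online_nodes: int, witness_online: bool = True) -> int:
    """Total votes used for the quorum calculation in this simplified model."""
    witness = 1 if witness_vote_counts(online_nodes, witness_online) else 0
    return online_nodes + witness


print(total_votes(7))  # 7 nodes: witness ignored
print(total_votes(6))  # one node failed: witness counted
```

In both cases the model yields seven votes, so a clean majority always exists. If the witness itself fails while six nodes are online, the total drops to six node votes and the cluster relies on the nodes alone.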
Tie breaker for 50% node split
Tie breaker for 50% node split is a new Windows Server 2012 R2 feature that allows you to configure a cluster node so that its vote is discounted for quorum calculations when communications between sites that host cluster nodes fail. For example, in the scenario above, Melbourne has two nodes, Sydney has two nodes, and a witness share is present in Canberra. If the witness share in Canberra fails and then the link between Melbourne and Sydney fails, problems will arise because the nodes in neither the Melbourne site nor the Sydney site will be able to determine if they have quorum. With tie breaker for 50% node split, you can designate a node as having a lower priority than other nodes when it comes to calculating quorum. For example, you designate the second node in Melbourne as having lower priority. When you do this, functionally there will be three quorum votes for the cluster to take into consideration if the Canberra witness fails: one in Melbourne and two in Sydney. If the Canberra witness fails and communication is lost between the Melbourne and Sydney sites, the two nodes in the Sydney site will retain quorum.
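The Melbourne and Sydney example can be worked through numerically. The following Python sketch is a simplified illustration with a hypothetical function name; it assumes one vote per node, that the Canberra witness has already failed, and that each site can see only its own nodes after the link fails.

```python
from typing import Dict, List


def sites_with_quorum(votes_by_site: Dict[str, int]) -> List[str]:
    """Return the sites that hold a majority of all votes once the link fails."""
    total = sum(votes_by_site.values())
    needed = total // 2 + 1
    return [site for site, votes in votes_by_site.items() if votes >= needed]


# Witness in Canberra has failed, so only the four node votes remain.
without_tie_breaker = sites_with_quorum({"Melbourne": 2, "Sydney": 2})

# The second Melbourne node is designated lower priority and loses its vote.
with_tie_breaker = sites_with_quorum({"Melbourne": 1, "Sydney": 2})

print(without_tie_breaker)
print(with_tie_breaker)
```

Without the tie breaker, each site holds two of four votes and neither reaches the three needed. With the tie breaker, Sydney holds two of three votes and is the only site that retains quorum.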
Force quorum resiliency
Force quorum resiliency is a new Windows Server 2012 R2 feature designed to minimize the problems related to partitioned or “split brained” clusters. In certain scenarios, you might need to forcibly restart cluster nodes when communication between sites that host cluster nodes is lost. For example, imagine that there are three cluster nodes in Sydney and two cluster nodes in Melbourne. The service hosted on the cluster needs to be available to branch offices in other cities around Australia. A failure occurs that causes communication to the Sydney site to be lost. In this scenario, the three nodes in Sydney would retain quorum. However, it’s communication to the Sydney site that is the problem whereas communication to the Melbourne site is still functioning. In this scenario you would forcibly restart the Melbourne cluster nodes so that the service hosted on the cluster would remain available to the other Australian branch offices.
In previous versions of Windows Server, problems would occur when connectivity was restored to the Sydney site as it would host nodes that also believed they had quorum. With Windows Server 2012 R2, when communication is reestablished, the nodes in the Sydney site would detect that the nodes in the Melbourne site were forcibly restarted, and so would restart themselves so that they could automatically rejoin the cluster, avoiding the cluster falling into a partitioned or “split brained” state.
Virtual machine network health detection
Virtual machine network health detection is a Windows Server 2012 R2 feature that provides fault detection and remediation for networks used by virtual machines. It allows you to configure a virtual machine so that live migration occurs automatically if a network fails in such a way that the network is not available to the virtual machine on the current virtualization cluster host node but is available to the virtual machine on a different virtualization cluster host node. Virtual machine network health detection requires that you configure multiple network paths between virtualization cluster host nodes.
- Hyper-V Replica enables you to deploy a replica of a virtual machine on another Hyper-V server.
- You can use Kerberos or certificate-based authentication for Hyper-V Replica.
- You must configure firewall rules for port 80 (Kerberos) or port 443 (certificate-based authentication) when configuring Hyper-V Replica.
- You must deploy Hyper-V Replica Broker if you want to use Hyper-V Replica with virtual machines hosted on failover clusters.
- When configuring multisite clustering, ensure that an equal number of nodes are in each site and a file share witness is placed in a third site.
Answer the following questions to test your knowledge of the information in this lesson. You can find the answers to these questions and explanations of each answer choice in the “Answers” section at the end of this chapter.
You want to perform a planned failover of a virtual machine that is configured to replicate to another Hyper-V server through Hyper-V Replica. Which of the following steps should you take prior to performing the failover?
- Take a checkpoint of the virtual machine.
- Pause the virtual machine.
- Shut down the virtual machine.
- Export the virtual machine.
You are planning the deployment of a cluster that should keep functioning in the event that a site is lost. Your organization has three sites. Each site has a connection to the other two sites. The cluster will have six nodes. Which of the following strategies should you implement to ensure that the cluster will remain operational in the event that an entire site becomes unavailable? (Choose two. Each answer forms part of a complete solution.)
- Place two nodes in the first site. Place three nodes in the second site.
- Place a file share witness in the third site.
- Place three nodes in the first site. Place three nodes in the second site.
- Place one node in the third site.
Which of the following predefined firewall rules would you enable if you were configuring Hyper-V Replica and using Kerberos authentication?
- Failover Cluster Manager
- Hyper-V Management Clients
- Hyper-V Replica HTTP
- Hyper-V Replica HTTPS
Which of the following predefined firewall rules would you enable if you were configuring Hyper-V Replica and using certificate-based authentication?
- Hyper-V Replica HTTPS
- Hyper-V Management Clients
- Failover Cluster Manager
- Hyper-V Replica HTTP