Exam Ref 70-342 Advanced Solutions of Microsoft Exchange Server 2013 (MCSE): Design, Configure, and Manage Site Resiliency


Email services are mission critical to most businesses, so knowing how to design, configure, and manage Exchange Server 2013 for redundancy across multiple locations is a key requirement for the Exchange Server IT pro. Getting the design correct is the first consideration, because an incorrect design will cause problems in the configuration and management of the service across multiple sites.

This chapter discusses the features and roles of Exchange Server 2013 that are utilized in a resilient site design so that you can be sure you have a solid understanding of the requirements of the software, and so that your network, servers, and processes will work as required.

Objectives in this chapter:

  • Objective 2.1: Manage a site-resilient Database Availability Group (DAG)
  • Objective 2.2: Design, deploy, and manage a site-resilient CAS solution
  • Objective 2.3: Design, deploy, and manage site resilience for transport
  • Objective 2.4: Troubleshoot site-resiliency issues

Objective 2.1: Manage a site-resilient Database Availability Group (DAG)

The Database Availability Group is a collection of up to 16 Exchange Server 2013 servers with the Mailbox role installed, which may or may not also have the Client Access Server (CAS) role installed. A mailbox database located on any one of these servers can be replicated onto one or more of the remaining servers.

Understanding DAGs is a key requirement of the IT pro in designing Exchange Server solutions, and it is a key objective of this exam. This exam will concentrate its questions on the site-resiliency aspect of DAGs, and not the initial configuration of a DAG or its mailbox databases since that is covered in exam 70-341.

Planning and implementing Datacenter Activation Coordination (DAC)

Datacenter Activation Coordination (DAC) mode is a setting in a DAG’s configuration that is designed to prevent split-brain conditions at the database level in a scenario in which you’re restoring service to a primary datacenter after a datacenter switchover has been performed.

A split-brain condition is one in which a single DAG is operating in two (or more) locations that are not communicating across a shared link, and the same database has been mounted in both locations. In both locations, Exchange Server is making changes to the mailboxes in the database. It could be that the owner of the mailbox has successfully logged in and is sending emails on one active copy, or that the server has mounted the database in the site where emails from the internet arrive and so is adding new emails to the same mailbox, but in a different active copy of the same database.

There can only be one active copy of a mailbox database at any given time within a DAG. Typically the database is active in the primary site for the DAG, and there are passive copies in the primary site and in any secondary site. Changes that happen to a given database are logged into the same database’s transaction logs, and these logs are copied to the other servers where they are replayed into the passive database. When all is operating correctly, the changes are happening on only one database, the active database, and these same changes are identically repeated on all passive copies.

In a split-brain scenario both databases are making independent changes to their copy of the database, and the only way to resolve this is to delete one or more copies, leaving only one active database and any copies that remained in a passive state. Any copy that also thought it was active will need to be removed, re-added, and re-seeded, making it a passive copy once again. Seeding a database is heavy on a network link and can be time consuming, and this entire process results in the loss of any changes or additions that occurred on the unintentionally activated copy. Therefore, it is important to avoid a split-brain condition.

To ensure that only one copy of a database will be mounted, and so avoid split-brain, Exchange Server uses the Datacenter Activation Coordination Protocol (DACP) as a way for the servers in the DAG to communicate with each other and determine whether each server has the right to mount its databases, regardless of whether the rules of cluster quorum would determine that it has majority.

DAG scenario without Datacenter Activation Coordination enabled

The DAC property is not enabled by default, so consider this scenario where you have two servers in the primary datacenter and the witness server (a total of three votes in the primary datacenter), and a second datacenter that contains two other servers. In the event of a full outage at the primary datacenter, the two DAG members and the witness server go offline. The IT department restores service in the secondary datacenter (by forcing quorum on the surviving nodes in the cluster), and the databases are mounted in the secondary datacenter.

The reason for the outage in the primary datacenter goes away and the servers restart. All three servers (two DAG members and the witness server) come online, but the WAN to the secondary datacenter does not come up (as is typical in outages involving power loss). In the primary datacenter the Exchange Servers see that they have quorum (three votes out of five voting members) and so mount their databases. But the databases in the secondary datacenter are also mounted, and so changes can occur to two copies of the same database (split-brain).

DAG scenario with Datacenter Activation Coordination enabled

Though the DAC property of the DAG is not enabled by default, it should be enabled for all DAGs with two or more members that use continuous replication. DAC mode shouldn’t be enabled for DAGs that use third-party replication mode unless specified by the third-party vendor.

When the DAC property is set, split-brain conditions are avoided. Even though a server is located in a site that has quorum, it will not mount its databases until it receives the okay from a server that has already mounted its databases.

Consider the above scenario again, this time in a DAG that has DAC enabled. Power has been lost to the primary datacenter, and the secondary datacenter has been activated for service. Power is restored to the primary location though initially the WAN is not back online. The Exchange Servers and the witness server all come back online and the site has quorum, but as no Exchange Server in that site can contact any Exchange Server in the DAG that has already mounted its databases, they do not mount their own databases. Service remains active in the secondary datacenter.

Later the WAN is restored and the Exchange Servers in the primary site can contact the Exchange Servers in the secondary site. The cluster is updated with the latest configuration of the cluster from the secondary site, and the servers in the primary site learn they were evicted from the cluster. Therefore, the cluster service stops on these machines. At no time did these primary datacenter servers communicate with a member of the DAG that had already mounted its databases.

Once the primary datacenter servers are added back into the cluster and are members of the DAG again, they will see that they are passive servers and that they can resume transaction log copying from the changes made in the secondary datacenter. With regard to being able to mount their databases, they will be able to communicate with another member of the DAG (in the secondary datacenter) that has already mounted its databases, and so will receive permission to mount their own copies of the databases if required to.

Finally, to fully restore service and move the active copy back to the primary site, the active copy of the database is moved, and successfully mounted without data loss, onto a server in the primary site.

How DAC mode works

To avoid the split-brain condition, in Exchange Server 2013 a mailbox server will not mount its databases until it has confirmation from another server in the DAG that it is allowed to do so. In a scenario where the entire site goes down and then comes back online, each Exchange Server boots into a mode in which it is not allowed to mount databases. All of the servers in the primary site will communicate with each other, but because no server has the right to mount databases, none of them will do so. Once the WAN is restored and connectivity to the secondary datacenter resumes, the servers in the primary site can communicate with servers that have mounted databases, and so are allowed to mount their own databases if required.

DAC mode is implemented by the Active Manager, the process that mounts databases on a server, which stores a bit in memory that is either 0 or 1. On booting, an Exchange Server always has this bit set to 0. It will never mount databases while the value of the bit is 0, and it will never change the value to 1 unless it can communicate with another server in the DAG that already has the value set to 1.

This communication of the value of the DAC mode bit is known as the Datacenter Activation Coordination Protocol (DACP).

Enabling DAC mode

To enable a DAG with two or more nodes to utilize DACP, set the DatacenterActivationMode property of the DAG to DagOnly.

# Enables DAC mode

Set-DatabaseAvailabilityGroup "DAGName" -DatacenterActivationMode DagOnly

To determine whether a DAG is running in DAC mode, use either of the following.

# A value of DagOnly means DAC mode is operational for the named DAG.

(Get-DatabaseAvailabilityGroup DAGName).DatacenterActivationMode
Get-DatabaseAvailabilityGroup DAGName | FL DatacenterActivationMode

If you use the Exchange 2013 Server Role Requirements Calculator (http://aka.ms/E2013Calc) and use the scripts that you can export from the Distribution tab of this workbook, you are given the option of enabling DAC mode as part of this export (see Figure 2-1). As long as you have configured values such as server names and other DAG settings in the calculator, the script export process will create a CSV file called DAGInfo.csv that is used by the CreateDAG.ps1 script. When you are ready to create the DAG in your Exchange Server deployment, and your disks are configured as required by the Diskpart.ps1 script (also from the calculator), the CreateDAG.ps1 script will create the DAG and configure its settings for you. These settings include the maximum mounted databases, witness shares, and server names.


Figure 2-1 The Export DAG List dialog box showing the DAC Mode drop-down list with DAGOnly selected

Other scripts that can be exported from the calculator will create the databases and the database replicas for you across all of the servers in the recommended distribution.

DAC mode for DAGs with two members

In a DAG with just two members you need to take into account special considerations when enabling DAC mode because the DACP bit is not enough to protect against split-brain scenarios.

In a DAG with only two members and DAC mode enabled, the boot time of the witness server (which would be in the same site) and the time that the DACP bit was set to 1 are compared. If the time that the DACP bit was set to 1 is earlier than the boot time of the witness server, it is assumed that all of the servers in the site were rebooted, and thus an event has occurred across the site to cause a site-wide outage.

If the DACP bit was set at a time later than the witness server’s boot time, it is assumed that the DAG member had a reboot event that was not related to a site outage.

If the reboot event is not related to a site outage (in other words, the time the DACP bit was set is later than the witness server boot time), the DAG member is allowed to mount databases.
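
The DACP bit itself is internal to Active Manager and is not directly exposed, but as an illustration of the times being compared, the following minimal sketch (EX1 and FS1 are placeholder names for a DAG member and the witness server) reads the last boot time of each machine using WMI, which works on Windows Server 2008 R2 and later.

# Compare a DAG member's last boot time with the witness server's last boot time
Get-WmiObject Win32_OperatingSystem -ComputerName EX1,FS1 |
Select-Object CSName, @{Name="LastBootUpTime";Expression={$_.ConvertToDateTime($_.LastBootUpTime)}}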

Other benefits of DAC mode

As well as preventing a split-brain condition in a DAG, the DAC mode setting allows the use of Exchange Server cmdlets to perform datacenter switchovers (a worked sequence using these cmdlets follows the list). The cmdlets that can be used include the following:

  • Stop-DatabaseAvailabilityGroup This cmdlet is used to remove a member from the DAG or to remove an entire site from a DAG, such as when you need to switch service over to the secondary datacenter. This cmdlet can be run only when the DAG is configured with a DatacenterActivationMode value of DagOnly. In the event of server or site failure (rather than a planned switchover), you must use the ConfigurationOnly switch to update the state of the DAG members in the Active Directory when the actual servers are unavailable.
  • Restore-DatabaseAvailabilityGroup This cmdlet brings the DAG online in a switchover event. You need to specify the site name (-ActiveDirectorySite) and, optionally, the alternative witness server and directory if these were not already configured as part of the DAG. In the event of a site outage, this cmdlet would be used after a failed site is removed from the DAG using Stop-DatabaseAvailabilityGroup. This cmdlet forcibly evicts from the DAG those servers on the StoppedMailboxServers list and thus resets the requirements for quorum. This allows the remaining DAG members to establish quorum, mount databases, and provide service. If there is an even number of surviving DAG members, or if there is only one surviving DAG member, this cmdlet configures the DAG to use the alternative file share witness. A file share witness is required when the cluster has an even number of nodes (to provide a tie-breaking third vote).
  • Start-DatabaseAvailabilityGroup This cmdlet is used to restore service in a recovered datacenter after the datacenter switchover to the secondary datacenter has occurred, and the recovered datacenter is ready to take part in providing service again. After running this cmdlet, the servers that had been evicted from the cluster by the Restore-DatabaseAvailabilityGroup cmdlet can be restored to service, either by bringing all of the servers in a DAG in a given site online, or by specifically naming the server to bring online should not all of the servers in the site be ready and available to resume service.
  • Move-ActiveMailboxDatabase This cmdlet is used to move the active copy of a database from one server to another. In the scenario of site recovery it would be used to move the databases that are active in the secondary datacenter back to being active in the primary datacenter. It is used to finish recovery of Exchange Server following the resumption of service in the restored primary datacenter.
  • Update-MailboxDatabaseCopy This cmdlet is used to resume replication between the active copy and a selected passive copy after service has been restored. Note that this cmdlet can also be used to initially seed a new database copy; it is not used only when restoring service after an outage.
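
As an illustrative sketch only (the DAG, database, server, and site names here are placeholders, and the exact procedure depends on the nature of the outage), a datacenter switchover and switchback using these cmdlets in a DAC mode DAG might run as follows.

# 1. Mark the failed primary site servers as stopped (ConfigurationOnly, as the servers are down)
Stop-DatabaseAvailabilityGroup DAG1 -ActiveDirectorySite PrimarySite -ConfigurationOnly

# 2. Activate the surviving members in the secondary site
Restore-DatabaseAvailabilityGroup DAG1 -ActiveDirectorySite SecondarySite

# 3. When the primary site is recovered, rejoin its servers to the DAG
Start-DatabaseAvailabilityGroup DAG1 -ActiveDirectorySite PrimarySite

# 4. Move the active database copies back to a server in the primary site
Move-ActiveMailboxDatabase DB1 -ActivateOnServer EX1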

Given customer node requirements, recommend quorum options

Quorum is a feature of the cluster service in Windows Server. The cluster service is used to provide the server-up/server-down management of the Exchange Database Availability Group (DAG). At a simple level of description, the cluster nodes all communicate with each other and share a configuration that they all store, called the quorum database. One cluster member is deemed the owner of this database in the case of conflicting information in the copies of the quorum database.

Exchange Server uses a configuration of the cluster service known as Node and File Share Majority. In this configuration the quorum database is stored on all servers and replicated from the server with the writable copy of the quorum to all of the other servers in the cluster. The cluster can remain up and running and providing service if a majority of the nodes of the cluster are available. The majority is calculated by taking the number of nodes in the cluster, halving it and rounding down, then adding 1 to the result. Because a DAG can have up to 16 nodes, the number of nodes in the cluster determines how many nodes need to be running for the cluster to be available for service. In the case of Exchange Server, if the cluster has majority it can mount databases and service requests to the data in the mailboxes from the other services within Exchange (such as transport and client access).

Table 2-1 shows how many nodes need to be up in a cluster based upon the total number of nodes in the cluster. If at least this many nodes are online, the cluster is said to have majority, or, equivalently, to have quorum.

Table 2-1 Number of nodes that need to be up to have majority

Nodes in cluster

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

NEEDED FOR MAJORITY

1

2

2

3

3

4

4

5

5

6

6

7

7

8

8

9

You can see from Table 2-1 that as you increase your node count, the number of servers that you need to keep online does not increase in the same pattern. Thus if you had a DAG with three servers in it and added a fourth, you would not have increased your resilience, because you could still only sustain a single node failure.
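
The values in Table 2-1 follow a simple formula: majority equals half the node count, rounded down, plus one. A quick PowerShell check reproduces the table.

# Nodes needed for majority in an n-node cluster: floor(n/2) + 1
1..16 | ForEach-Object { "{0,2} nodes: {1} needed for majority" -f $_, ([math]::Floor($_ / 2) + 1) }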

The file share witness

To avoid this issue of even numbers of nodes having the same resiliency as one less server in the cluster, Exchange Server changes the quorum type depending upon the number of nodes. When you have an even number of nodes in the cluster, the cluster is running Node and File Share Majority, and when you have an odd number of nodes in the cluster, the quorum type is Node Majority.

You can see the type of quorum your DAG is running by using Get-Cluster DAGName | Get-ClusterQuorum | Format-List in PowerShell, as shown in Figure 2-3. The same figure shows the output of Get-DatabaseAvailabilityGroup E15-DAG-RED | Format-List Servers, which shows that the DAG has three nodes in it, and therefore the cluster reports that its QuorumType is NodeMajority.


Figure 2-3 PowerShell output showing DAG and cluster properties

When you have an even number of Exchange Servers in the DAG, Exchange Server changes the QuorumType to NodeAndFileShareMajority, and an additional server that hosts a file share votes on the majority of the cluster. Note that the role of the file share in terms of voting only comes into play when there are not enough servers online in the cluster to provide majority on their own. Thus if you had a four node DAG and all four nodes were online, though you would be in a Node and File Share Majority cluster, you would not need the vote of the file share to determine whether you have quorum. If two of the servers in this four node DAG were offline or unavailable (on the other side of a disconnected WAN, for example), the vote of the file share witness server would come into play.

The file share witness is any Windows Server computer in the same forest as the DAG that can host a file share, though the recommendation is a server running the CAS role because all of the correct permissions are already in place for Exchange to create the folder, share it, and access and lock a file inside the share when it needs to.

The file share witness (and an alternative file share witness in another site for when you have site failover) are set when you create the Database Availability Group. For example, the following cmdlets create a DAG called DAG1 with both the file share witness and alternative file share witness set.

# Creating a Database Availability Group and specifying all of the witness servers

New-DatabaseAvailabilityGroup DAG1 -WitnessServer FS1 -WitnessDirectory C:\DAG\DAG1
Set-DatabaseAvailabilityGroup DAG1 -AlternateWitnessServer DRFS1 -DatacenterActivationMode DagOnly

The WitnessServer and WitnessDirectory parameters are valid in the New-DatabaseAvailabilityGroup cmdlet and in the Set-DatabaseAvailabilityGroup cmdlet. The AlternateWitnessServer parameter can only be set in the Set-DatabaseAvailabilityGroup cmdlet.

You are not required to pick a witness server, and if you do not, a CAS server in the same site that you are running the New-DatabaseAvailabilityGroup cmdlet in will be selected. Because the witness cannot be a member of the DAG, a CAS server that also holds the Mailbox role can be used only if it is not a DAG member. If you have an Exchange Server design that leads to a single DAG and co-located server roles, you need to pick your own file server that is not one of the Exchange Servers in your deployment. Note also that Exchange Server will not auto-select an alternative witness server.

If you need to pick a file server that is not an Exchange Server, you need to add the Exchange Trusted Subsystem (ETS) universal security group into the local administrators group of that server. If the file server is also a domain controller, you need to add the Exchange Trusted Subsystem universal security group into the BUILTIN\Administrators group on any domain controller in the domain, and wait for replication to complete. The Exchange Trusted Subsystem group can be found in the Microsoft Exchange Security Groups OU in the root domain of the forest.
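
For example, granting this right on a non-Exchange file server (CONTOSO is a placeholder domain name) can be done from an elevated prompt on that server.

# Allow Exchange to create and manage the witness directory and share on this file server
net localgroup Administrators "CONTOSO\Exchange Trusted Subsystem" /add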

Once the file server is selected from an available CAS, or you specify the server during the creation of the DAG or when you change the DAG settings, Exchange Server creates the directory specified (or uses the default directory path) and shares this folder. Multiple DAGs can use the same witness server, but they must use a different path for the witness directory.

The file share witness is used when the cluster needs to determine if it has quorum and there are not enough nodes online to achieve this. The cluster name owner, or the Primary Active Manager (PAM) in the DAG, will attempt to lock a file on the file share witness. If it is able to, this counts as an additional vote, and so in the case of an even number of servers in the DAG, when half the servers are online, the locking of the file share witness means the side that contains the PAM has an additional vote. Typically this means that side will achieve majority.
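
You can confirm which witness a DAG is configured with, and whether the witness is currently being used for quorum, by querying the DAG with the -Status switch; for example, for the E15-DAG-RED DAG shown earlier.

# WitnessShareInUse is populated only when the -Status switch is used
Get-DatabaseAvailabilityGroup E15-DAG-RED -Status | Format-List WitnessServer, WitnessDirectory, WitnessShareInUse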

Figure 2-4 shows a four node DAG across two sites. The file share witness is located in the primary site, and all four servers are up and have mounted databases.


Figure 2-4 Four node DAG with all servers online

In Figure 2-5, the file share witness vote means that one side of the cluster can reach majority and mount databases. The other side of the DAG in the secondary datacenter cannot reach majority, and so the cluster service stops and the databases dismount.


Figure 2-5 Four node DAG after a WAN outage

Cluster features in the operating system

Exchange Server 2013 can run on Windows Server 2008 R2, Windows Server 2012, and Windows Server 2012 R2. For the two 2012 versions of Windows Server, Standard Edition is sufficient because it contains the failover clustering feature. For Windows Server 2008 R2, a DAG requires Enterprise Edition because Standard Edition does not contain the failover clustering feature.

Windows Server 2012 added support for dynamic quorum, and Windows Server 2012 R2 added dynamic witness; both features are enabled by default in Windows Server 2012 R2.

Dynamic quorum is a feature where the node count for determining majority is not fixed to the number of nodes in the cluster, but is instead based on the number of nodes that were online at the point of calculating quorum. In Windows Server 2008 R2, the number of nodes that are part of the quorum calculation is as described in the previous section, which means that the number of nodes for quorum equals the number of servers in the cluster. In Windows Server 2012, a cluster node loses its vote if it goes offline and regains it when it comes back online. Therefore, a DAG with three nodes on Windows Server 2008 R2 has to have two nodes up to keep majority; the file share witness is not used because there is an odd number of nodes. The same DAG running on Windows Server 2012 or 2012 R2 can sustain the loss of two nodes one after the other and keep majority. It does this via dynamic quorum: on the loss of one server, majority is recalculated and the DAG becomes a two node DAG with a file share witness. That is three votes, and as long as either two nodes, or the witness server and the single remaining DAG node, are up, Exchange can keep mounting databases. Simply put, if the cluster experiences a failure that allows it to remain up and maintain quorum, then dynamic quorum can recalculate the quorum. This allows the cluster to survive additional failures and maintain quorum (more than it could without dynamic quorum). However, if your initial failure is widespread enough to cause quorum loss, the cluster will go down. So dynamic quorum really helps when you have cascading failures.

Dynamic quorum does not help in a scenario where the initial failure itself causes quorum loss. For example, in a single site DAG with three nodes, if you lose two nodes at the same time, you go from three votes (no file share witness) to one vote. One vote is not enough to maintain quorum.

Dynamic quorum also allows for a scenario called last man standing, where you can progressively lose nodes, and thus votes, until you have one server remaining. Note that this one server might not host all of the databases or be sized to host the load of all the user connections, and so even though the cluster is up and one Exchange Server is up, not all of your databases might be up.

If you have two nodes left, dynamic quorum removes the vote from one node: one node will have a vote and one node will not. Use Get-ClusterNode ClusterName | Format-Table Name, DynamicWeight, State to determine which node has the vote (the cmdlet reports DynamicWeight = 1 for the server that holds it), and do not shut that node down. If you do, the cluster service on the other node will go offline because it does not have a vote.
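
For example, against a two node DAG named DAG1 (a placeholder name):

# The node reporting DynamicWeight = 1 holds the vote; keep that node online
Get-ClusterNode -Cluster DAG1 | Format-Table Name, DynamicWeight, State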

Quorum scenarios

Now that you have an understanding of the different scenarios where a file share witness will come into play, and that having quorum (or majority) is vital to maintaining service, we will look at some customer scenarios and best choices.

We will look at the following four scenarios:

  • Coho Winery has two servers at their vineyard and no other locations to host offsite servers.
  • Blue Yonder Airlines has a six node DAG with two of the servers in a secondary datacenter for high availability and site resiliency purposes.
  • Northwind Electric Cars has a well-connected office and factory located in different cities with users at both locations.
  • Humongous Insurance has 32,000 employees and needs automatic failover to the secondary datacenter in the event of an outage in their primary New York datacenter.

Coho Winery single site two node This customer only has a single site but needs to ensure their email server is always available because they use it to receive orders from their suppliers. Therefore, they have a two node DAG, with the file share witness located on their file server. They are running Windows Server 2012 R2 as their operating system for Exchange Server, so dynamic quorum comes into the equation when they need to shut down a server for maintenance: they need to ensure that the server remaining online is the one holding the vote, and then keep that server online.

Blue Yonder Airlines This is a midsized regional airline company that has centralized all of its email in its head office in London. They have a disaster recovery datacenter near Oxford. They have a six node DAG supporting 4,000 mailboxes, with two of the nodes in the disaster recovery datacenter. All active mailbox databases are replicated to two other servers: one at the HQ for initial failover, and one at the DR site for second failover. They can lose one server and all of the mailbox databases remain on servers in the head office. On the loss of a second server in London, most mailboxes are activated on the two remaining servers in London, but a small handful are activated in the secondary datacenter. The loss of two servers in London is acceptable because cross-site failover of mailbox databases is allowed. However, because they are running Exchange Server 2013 on Windows Server 2008 R2, they cannot use dynamic quorum. Therefore, because the cluster is a six-node cluster, it will make use of the file share witness. This means that majority is maintained with the loss of two servers in the primary site, or with the loss of one server in the event that the WAN should also fail.

When two servers fail, but the WAN is online, there are four reachable servers. Four is a majority of six. When three servers fail, the file share witness vote becomes important because there are now three servers online (which is not a majority of six), but the extra vote from the file share witness means there are four votes. These votes are a majority, and so the cluster remains running. As long as the three failed servers are not the only servers a given database was replicated to, all mail databases remain online. If the WAN to the DR site goes offline, the two servers near Oxford will go offline (because they do not have majority). IT pros in the primary site must ensure that three Exchange Servers and the file share witness remain online while the WAN is offline.

The Exchange 2013 Server Role Requirements Calculator is a great tool for modeling these scenarios. For example, in Figure 2-6 you can see this six node DAG with the WAN failed between sites (the WAN button reads, “Recover WAN”), and thus the two servers in the secondary datacenter are offline, and one server in the primary site has also failed. Even though Blue Yonder Airlines has lost the resources of half of their Exchange Servers, they are still able to maintain quorum and service email to their employees.


Figure 2-6 Exchange 2013 Storage Calculator used to model WAN and server failure scenarios

Northwind Electric Cars This customer has 2,000 staff, with 1,500 in the main office and 500 on the assembly line. There is good WAN connectivity between the sites even though they are located in different cities. Therefore, they opted for two DAGs, with a file share witness server located in each site (one for each DAG). Each user’s mailbox is located in the DAG whose majority of nodes, and witness, is at that user’s local site. Quorum is maintained per DAG, and so each office is a separate failure domain.

If the WAN fails, the Exchange Servers that are in the secondary site for each DAG go offline. This results in some Exchange Servers in each location going offline, but because these servers host only passive copies and are not used for transport shadow redundancy by the other servers in the site, there is no immediate impact to the users.

When a site fails, the mailboxes for that site could be activated in the other site if recovery of the primary site would exceed any required SLA. The other site and its servers would not be affected by the outage in the first site.

Humongous Insurance This customer is planning a migration to Exchange Server 2013 from Exchange Server 2007. As part of the migration, Humongous Insurance will make use of new technologies such as 8 TB SATA drives to reduce their storage costs in a JBOD deployment that is an eight node DAG across two datacenters. They will locate their file share witness in a third datacenter that has MPLS networking links to both the primary and secondary datacenters. Figure 2-7 shows the Distribution tab of the Exchange Server 2013 storage calculator with four servers in each site, and the file share witness in a third datacenter. The WAN in the primary datacenter has failed. The MPLS links that they have in place from the various user locations remain working, as do the links to the file share witness site and the secondary site. When the WAN fails, the secondary site can reach the file share witness and is therefore able to muster five votes from a DAG of eight nodes. It is therefore able to keep majority even though the primary servers are all unreachable. The Exchange mailbox databases all mount in the secondary site automatically. Because the primary site is not able to reach the file share witness, it is only able to acquire four votes; therefore, the cluster service in the primary datacenter does not get majority, and its databases dismount.


Figure 2-7 Modeling a WAN failure and automatic failover with a third site file share witness

Planning cross-site DAG configuration and configuring DAG networks

When you have a DAG that crosses a WAN boundary, there are a number of additional items to consider in your configuration. These are:

  • The number of DAGs you need
  • DAG IP addressing considerations
  • Configuring DAG networks and rediscovering DAG networks
  • Network compression

How many Database Availability Groups do you need?

There are a number of aspects to the question regarding the number of DAGs that you need. The most important aspect to consider is the location of your user population.

If you take a simple Exchange Server scenario like the one shown in Figure 2-8, you will see that there are users in a number of network locations. You will also see that their nearest network is Paris or Berlin, depending on their location. There are also users at each main office in Paris and Berlin. The servers are kept at the main office.


Figure 2-8 Exchange Server deployment in multiple offices with branch offices

In the scenario in Figure 2-8, there are users located in both Paris and Berlin. Each user connects to their mailbox, which is located in the main office. If the four Exchange Servers in the scenario were placed into a single DAG, an outage of the Paris-Berlin WAN would result in all of the mailboxes being mounted in Paris, because that is the location of the file share witness. This would result in a loss of access to their mailboxes for the users in Berlin and the sites connected to it.

If the network topology for this company is such that direct connections from, for example, Rostock reach Paris by way of Berlin, then when the Paris-Berlin WAN goes offline Rostock cannot reach Paris. Even though their mailboxes are online, the Rostock users cannot access them from the office. Therefore, this scenario needs two DAGs: a Paris DAG, with one or two servers in Paris (depending upon load, sizing, and availability requirements) and a site-resilient copy in Berlin, with its file share witness in Paris; and a Berlin-homed DAG with its file share witness in Berlin. When the WAN fails, the Paris DAG will mount its databases in Paris, and the Berlin DAG will mount its databases in Berlin. This is known as an Active/Active (Multiple DAG) model and can be seen in Figure 2-9. This model avoids the WAN link between the two sites being a potential single point of failure, though it does require more servers than a single DAG for the same user count and mailbox size. If the WAN can be made resilient, an Active/Active (Single DAG) model can be deployed where there are active mailbox databases mounted in each site. Because the WAN is resilient, an outage of part of the WAN results in an alternative route being used, and the mailbox databases staying online in their desired locations.


Figure 2-9 A two datacenter design where users are located at or networked to each datacenter

DAG IP addressing considerations

In a Windows Server 2008 R2 or Windows Server 2012 cluster, you need to have an IP address assigned to the DAG for each site that the DAG can operate in. You can either assign an IP address manually when creating the DAG in Exchange Management Shell, or allow DHCP to assign an IP address to the DAG. For auto assignment of IP addresses, you need to have a DHCP server running in each site that the DAG is in, and available IP addresses at the time of failover. If you do not have a DHCP server or servers that can supply IP addresses to each site, or you cannot guarantee available IP addresses in the DHCP pool when failover occurs, you should manually assign a valid IP address for each site, and ensure that nothing else uses those addresses by removing them from the DHCP pool or updating your documentation on available addresses.
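
For example, the following sketch creates a cross-site DAG with one static IP address per site (the names and addresses are placeholders).

# One IP address for each site/subnet in which the DAG can operate
New-DatabaseAvailabilityGroup -Name DAG1 -WitnessServer FS1 -DatabaseAvailabilityGroupIPAddresses 192.168.1.10,10.0.1.10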

In a Windows Server 2012 R2 cluster with Exchange Server 2013 SP1 or later, you can create a DAG that does not have an underlying cluster name or IP address requirement. This is known as a DAG without a cluster administrative access point. DAGs without administrative access points have the following features:

  • There is no IP address assigned to the cluster/DAG, and therefore no IP address resource in the cluster core resource group.
  • There is no network name assigned to the cluster, and therefore no network name resource in the cluster core resource group.
  • The name of the cluster/DAG is not registered in DNS, and it is not resolvable on the network.
  • It is not required to pre-create the cluster name object (CNO) in the Active Directory.
  • The cluster cannot be managed using the Failover Cluster Management tool. It must be managed using Windows PowerShell, and the PowerShell cmdlets must be run against individual cluster members.
  • There is no computer object that needs to be created in the Active Directory for the cluster. This avoids a series of administrative tasks and removes potential issues if the object is accidently deleted.

Each of these features will impact the way you interact with the cluster, most notably in terms of the management tools used. But the features that remove the need for a network name (CNO) and IP address reduce the resources required by the cluster service, and therefore reduce the items that, if they fail, will cause the cluster to fail over or go offline.

To create a DAG without an administrative access point, use the following in the Exchange Management Shell.

New-DatabaseAvailabilityGroup -Name DAG1 -WitnessServer EX4 -DatabaseAvailabilityGroupIPAddresses ([System.Net.IPAddress]::None)

Configuring DAG networks and rediscovering DAG networks

The cluster that underlies the DAG requires at least one network that all the nodes share and that the clients of the DAG can use to reach the mailbox databases. Optionally you can have a second network in the cluster that Exchange will use for transaction log replication. Clients do not connect to the DAG via the replication network and so it is dedicated to the role of replicating transaction logs as fast as the network will allow.

It is important to consider if you need a replication network at all. In a cross-site DAG, the most likely configuration for the WAN between the sites is that you have a single WAN. Therefore, all cross-site communication is on this single connection even though you have two network cards in each server. Splitting the client and replication traffic onto separate networks in the LAN at each end, only to combine them on the WAN, means that in reality you have a single network that the replication traffic will cross.

If you do have two WAN links, a large enough user population, or dedicated network switching and trunking such that you really can take the replication traffic off of the network that the clients are on, then it is worth doing. For smaller networks or networks with capacity beyond their requirements and sufficient switching capacity, it is probably easier to keep the administration of Exchange Server simple, and keep to a single client and replication network.

If you have a second network for replication, it must be configured correctly. For Exchange Server this network should be configured as follows:

  • Not registered in DNS It is not reachable by clients, and so clients should not resolve this server on this network. This setting can be seen in Figure 2-10.


    Figure 2-10 DNS settings for the replication network

  • Not having a default gateway You should have one default gateway on the server, and this should be on the network card that all unknown routes use (the client network in our example). The replication network should have manual routes configured so that you can reach the replication network in the other sites (an example follows this list).
  • The client or MAPI network is listed above the replication network in the network binding order The binding order can be changed from the Network Connections window. Press ALT to see the menu and then choose Advanced, and Advanced Settings. The Advanced Settings dialog box is shown in Figure 2-11. Note that Microsoft recommends that IPv6 is not disabled (see http://support.microsoft.com/kb/929852).


    Figure 2-11 Setting the network binding correctly for a dedicated replication network
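
For example (the subnet and gateway addresses are placeholders), a persistent static route to the remote site’s replication subnet could be added on each DAG member from an elevated prompt.

# Persistent route to the remote replication subnet via the local replication network's gateway
route -p add 10.0.2.0 mask 255.255.255.0 10.0.1.1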

If you get the network card and client/replication network configuration correct, Exchange Server 2013 will auto-configure your DAG networks for you. The DAG networks are a configuration that helps Exchange understand how the network is configured. They are not physical networks or settings on the network card.

If your Exchange Servers have networks other than those used for clients and replication, you will need to exclude them from the DAG networks. Examples include management networks and backup networks. If you have these additional networks, you have to manually configure your DAG networks and then use the -IgnoreNetwork $true parameter in Exchange Management Shell to mark the network as not to be used by the DAG.
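
A minimal sketch of this, assuming a DAG named DAG1 whose backup network was discovered as DAGNetwork03 (both names are placeholders):

# Take manual control of the DAG networks, then exclude the backup network from DAG use
Set-DatabaseAvailabilityGroup DAG1 -ManualDagNetworkConfiguration $true
Set-DatabaseAvailabilityGroupNetwork DAG1\DAGNetwork03 -IgnoreNetwork $true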

Network compression

When a DAG is created, Exchange Server will by default compress network traffic between the nodes when it goes cross-site, but by default network traffic is not compressed within the site. Compression takes a little additional CPU at each end, but reduces the traffic considerably. In places where you have limited bandwidth, the extra CPU impact is acceptable because you usually have spare CPU capacity, whereas you might not have capacity on the WAN.

There is typically not much point enabling compression on a LAN as the capacity is available for uncompressed traffic.

Compression is set as a property of the DAG and not the underlying DAG network. To check the compression of your DAG, use the following Exchange Management Shell cmdlet.

Get-DatabaseAvailabilityGroup E15-DAG-RED | Format-List NetworkCompression

The possible values for network compression are:

  • Disabled Intra-DAG traffic is not compressed, regardless of DAG node location.
  • Enabled All intra-DAG traffic, including replication and database seeding, is compressed.
  • InterSubnetOnly DAG replication traffic (not seeding traffic) is compressed when it crosses subnets; replication within the same subnet is not compressed.
  • SeedOnly Only database seeding traffic is compressed.

Compression generates a saving of about 30 percent over uncompressed traffic and should be used inter-subnet unless you have WAN optimizers that can do a better job. Given that all of the traffic is typically unique, WAN optimizers are unlikely to achieve this and should be turned off for traffic on the replication port of TCP 64327.
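
The setting is changed with Set-DatabaseAvailabilityGroup; for example, to set the default behavior explicitly on the DAG used earlier.

# Compress replication traffic only when it crosses subnets
Set-DatabaseAvailabilityGroup E15-DAG-RED -NetworkCompression InterSubnetOnly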

Creating a DAG on Windows Server 2012 and 2012 R2 Servers

A cluster name object (CNO) is automatically created in the Active Directory by the cluster service on Windows Server 2008 R2 when the DAG is created. This does not happen automatically on Windows Server 2012 or 2012 R2, so the CNO needs to be pre-staged manually. If you are making a DAG without an administrative access point, you do not need to pre-create a CNO because this type of DAG does not have one.
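
A minimal sketch of pre-staging the CNO with the Active Directory PowerShell module follows (the DAG name and OU path are placeholders). After creating the object, you must also grant the Exchange Trusted Subsystem group, or the computer account of the first DAG member, full control of it.

# Pre-stage a disabled computer account to act as the DAG's cluster name object (CNO)
Import-Module ActiveDirectory
New-ADComputer -Name DAG1 -Path "OU=Exchange Servers,DC=contoso,DC=com" -Enabled $false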

Objective summary

  • Datacenter Activation Coordination (enabled via the DatacenterActivationMode parameter) is recommended for all DAGs with two or more members.
  • Ensure that you understand that two servers in a DAG should still have Datacenter Activation Coordination enabled, but be aware of the implications of restarting the file share witness at the same time as restarting a DAG member.
  • DAC mode allows you to use just the Exchange cmdlets to switch over or fail over to a secondary datacenter, and to switch back again upon resumption of service, without the risk of a split-brain condition occurring in the cluster.
  • Quorum is both the database in the cluster that stores the cluster settings (it is the cluster hive in the registry), and a term that indicates that the cluster has majority.
  • Exchange Server manages the cluster type, changing it from NodeMajority to NodeAndFileShareMajority depending upon the number of servers in the cluster. NodeAndFileShareMajority is used when the cluster has an even number of nodes.
  • The file share witness is used to provide an extra vote to determine quorum in the event that only half of the nodes are reachable in an even numbered cluster. The file share witness contains a file called Witness.log that one Exchange Server, the Primary Active Manager (PAM), which is the server that owns the cluster resources, attempts to lock to ensure majority. Therefore, ensure that the PAM and the file share witness are both located in the primary site unless the witness server is in a third site.
  • It is recommended to keep your Exchange Server deployment simple as that reduces the risk of human error from causing a site or server outage. Therefore a single network for DAG replication and client traffic is recommended.

Objective review

Answer the following questions to test your knowledge of the information in this objective. You can find the answers to these questions and explanations of why each answer choice is correct or incorrect in the “Answers” section at the end of this chapter.

  1. One winter’s day a supplier-level power failure occurs impacting the primary site. Staff are sent home and your requirement is to bring Exchange Server online in the disaster recovery site. What are the steps to do this when the file share witness is located in the primary site and DatacenterActivationMode is set to DAGOnly? The alternative witness server had been configured when the DAG was originally set up.

    1. Stop-DatabaseAvailabilityGroup -ActiveDirectorySite <PrimarySite> followed by Resume-DatabaseAvailabilityGroup -ActiveDirectorySite <SecondarySite>
    2. Stop-DatabaseAvailabilityGroup -ActiveDirectorySite <PrimarySite> followed by Start-DatabaseAvailabilityGroup -ActiveDirectorySite <SecondarySite> -WitnessServer FS1
    3. Stop-DatabaseAvailabilityGroup -ActiveDirectorySite <PrimarySite> followed by Restore-DatabaseAvailabilityGroup -ActiveDirectorySite <SecondarySite>
    4. Stop-DatabaseAvailabilityGroup -ActiveDirectorySite <PrimarySite> followed by Restore-DatabaseAvailabilityGroup -ActiveDirectorySite <SecondarySite> -WitnessServer FS1
  2. You need to design a DAG layout for a company with 5000 staff located around the world. The company has three datacenters, one located in San Francisco, one in Madrid, and one in Tokyo. Mailbox sizes mean that one server in each site is sufficient to store the data and a second server is needed for high availability. The network infrastructure of the company allows all users to access all parts of the network via an MPLS cloud network, but the fastest links are to the geographically closest datacenter. If the MPLS network links fail, cross-site connectivity will be broken. What will this Database Availability Group (or groups) configuration look like to ensure a working solution in both active and failure scenarios?

    1. Create a single DAG with two servers in each datacenter as members of this DAG with the file share witness in Madrid.
    2. Create two DAGs, one with the PAM and file share witness in San Francisco and the other with the PAM and FSW in Tokyo. Add one server in Madrid to the San Francisco DAG, and the other server to the Tokyo DAG.
    3. Create three DAGs, one for each datacenter and expand the DAG to the nearest other datacenter where an additional server will be placed for site resilience for that primary datacenter of the DAG.
    4. Create four DAGs. One in each datacenter with two servers in it, and then a fourth DAG that holds the replica servers and is distributed geographically across all of the datacenters.