Clustering Exchange Server
Clustering improves an Exchange Server system's availability, but isn't a fail-safe solution.
May 31, 1999
A year's experience
I am a cluster bigot. I developed this bias when I worked with VMScluster systems, which provide a secure, robust, and reliable platform for messaging deployments. When I began working with Microsoft Exchange Server, I wasn't able to cluster the messaging system because it lacked cluster support until Microsoft released Windows NT Server 4.0, Enterprise Edition (NTS/E 4.0) and Exchange Server 5.5, Enterprise Edition (Exchange 5.5/E) in November 1997. But when Microsoft made clustering available for Exchange, the company sealed the fate of my Exchange Server machine.
After searching for suitable equipment for a couple of months, I put my Exchange Server cluster into production in February 1998. For the most part, the cluster hums away without requiring many administrative interventions, but the implementation has hit some bumps along the way. I've learned a lot about creating, configuring, and maintaining an Exchange Server cluster in the past year.
Hardware Configuration Challenges
Exchange Server operates within Microsoft Cluster Server (MSCS) on an active/standby basis. Thus, an Exchange Server cluster must consist of two machines: a primary node and a secondary node. The primary node runs Exchange Server (i.e., it's active) under most conditions. The secondary node doesn't run Exchange Server until the primary node has a problem. Then, a cluster transition moves the set of Exchange services to the secondary node and makes it the active node. The secondary node takes on Exchange Server operations from the point of failure.
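The active/standby arrangement can be pictured as a simple two-node state machine. The sketch below is purely illustrative (MSCS is far more involved — heartbeats, quorum, per-resource restart policies); only the node names come from the article, described later in this piece.

```python
# Toy sketch of MSCS-style active/standby failover for the Exchange
# resource group. The node names match the cluster described in this
# article; the logic is illustrative, not MSCS's actual behavior.

class Cluster:
    def __init__(self, primary, secondary):
        self.nodes = [primary, secondary]
        self.active = primary   # Exchange services run on the primary node

    def failover(self):
        """Move the Exchange resource group to the other node."""
        self.active = self.nodes[1] if self.active == self.nodes[0] else self.nodes[0]
        return self.active

cluster = Cluster("DBOIST-CL0", "DBOIST-CL1")
print(cluster.active)      # DBOIST-CL0 is active under normal conditions
print(cluster.failover())  # after a cluster transition, DBOIST-CL1 takes over
```

The point the sketch makes is that at any moment exactly one node owns the Exchange resource group; a transition changes ownership, never shares it.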
The first challenge in building an Exchange Server cluster is assembling suitable hardware. Microsoft recommends that you build a cluster using only hardware from the clustering categories of the Hardware Compatibility List (HCL—http://www.microsoft.com/hwtest/hcl). The HCL lists all the systems that Microsoft has tested and certified to run MSCS, and Microsoft constantly updates the list to keep it current. Screen 1 shows a sample of the HCL's clustering-compatible systems.
Coming up with the right equipment to build an Exchange Server cluster is sometimes difficult. After looking around my department for HCL hardware, I built my cluster from hardware that was close to the list's requirements but not quite compliant. (Justifying hardware purchases is tough when you have noncompliant servers available.) However, I recommend sticking to the HCL in all production environments.
I call the cluster DBOIST-MSXCL. The cluster's primary node is DBOIST-CL0, a Digital Prioris MX 6200 that has two 200MHz Pentium Pro processors and 256MB of RAM. The cluster's secondary node is DBOIST-CL1, a Digital Prioris XL 5200DP that has two 200MHz Pentium processors and 256MB of RAM. Each node has an internal hard disk that runs NT, and the servers share a SCSI array that I based on a Compaq StorageWorks RA310 controller. The array holds a RAID 5 set that stores the Exchange Information Store (IS) and Directory store and a RAID 1 set that stores the cluster's transaction logs. A separate disk holds the Exchange binary files. At any time, the array is available only to the active node.
This configuration would never pass Microsoft's formal hardware-compatibility tests, because the two nodes aren't identical. Many applications store in the Registry parameters that vary depending on a system's hardware configuration. For example, the Exchange Performance Optimizer wizard analyzes a system's hardware and calculates parameters that it bases on the hardware analysis. The wizard writes these parameters to Registry keys (e.g., HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\MSExchangeIS\ParametersSystem). The Registry values that Performance Optimizer calculates for a particular hardware configuration don't necessarily work for other systems. If the servers in a cluster are identical, the parameters for applications such as Performance Optimizer are appropriate for both of the cluster's nodes.
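One way to spot the drift that mismatched nodes cause is to compare the Performance Optimizer values from each node. The sketch below is a minimal illustration: the Registry key path is the one cited above, but the parameter names and values are invented for the example, and on a live NT system you would read the real values with a Registry export or the winreg module rather than hard-code them.

```python
# A minimal sketch of comparing Performance Optimizer parameters between
# two cluster nodes. The key path is real; the parameter names and values
# below are hypothetical placeholders for whatever the wizard calculated
# on each node's (different) hardware.

IS_PARAMS_KEY = (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet"
                 r"\Services\MSExchangeIS\ParametersSystem")

def registry_mismatches(primary, secondary):
    """Return the parameter names whose values differ between the nodes."""
    names = set(primary) | set(secondary)
    return sorted(n for n in names if primary.get(n) != secondary.get(n))

# Hypothetical values calculated by Performance Optimizer on each node.
primary_node = {"Buffer Threshold High": 2048, "Max Recovery Buffers": 1350}
secondary_node = {"Buffer Threshold High": 1024, "Max Recovery Buffers": 1350}

print(registry_mismatches(primary_node, secondary_node))
```

On identical hardware the list comes back empty, which is exactly why Microsoft's compatibility tests insist on matched nodes.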
Easy Exchange Management
Before you install Exchange Server on a cluster, you must create a resource group (i.e., a collection of dependent or interlinked resources that form a management unit) in MSCS's Cluster Administrator program. Then, when you install Exchange Server, the setup program adds components such as the System Attendant, the IS, the Directory Service (DS), and the Message Transfer Agent (MTA) to the resource group. The setup program also adds to the resource group the network share that holds message tracking logs, and the resource group specifies the IP address and network name that represent the cluster's virtual server. Without these components, Exchange Server can't function in a cluster. Because the server that Exchange recognizes is virtual, the application can move between the two physical computers during cluster transitions.
Before I installed Exchange Server, I created a resource group that I named Exchange. Screen 2 shows Cluster Administrator's view of the resource group after I ran the Exchange Server installation program. You can see in Screen 2 that all the Exchange resource group's resources are active on the server DBOIST-CL0.
Installing Exchange Server on a cluster is different from a typical Exchange installation. Exchange Server's setup program detects whether a server is part of a cluster and installs the Exchange binary files in the right place. However, you must install Exchange Server separately on each node. First, you must install the messaging system on the cluster's primary node. The second installation updates the secondary node with details about the Exchange Server configuration, such as the Registry settings that point to files and set values for other parameters.
You don't use the names and IP addresses of the two physical computers that form the Exchange Server cluster during regular administrative activities. You perform all administrative activities using the IP address and network name you allocate to the cluster. Screen 3 shows the Microsoft Exchange Administrator program's view of the two servers in my Exchange organization's Dublin site: DBO-EXCHANGEIST and DBOIST-MSXCL. DBO-EXCHANGEIST is a standard physical server, and DBOIST-MSXCL is my cluster. Exchange Administrator displays the servers identically; the software doesn't differentiate between physical and virtual servers. You can carry out the same operations on either type of server, and when you administer a virtual server in Exchange Administrator, you can't tell that the server is virtual.
The day-to-day management tasks for an Exchange Server cluster are similar to those for a single-server system. Nevertheless, managing a cluster is different from managing one server. For example, you can start and stop a virtual server's services only through Cluster Administrator. The only problem I've encountered using standard Exchange Server administration procedures on my cluster is that changing passwords for the Exchange service account is confusing. Exchange Server's clustering documentation didn't tell me not to use the standard Exchange Server method for changing the password, so I changed the password on the Service Account Password tab of the property sheet for the Site Configuration object. The change was ineffective, and I ended up having to change the password manually.
Exchange Connectors That Don't Cluster
Microsoft's efforts to add clustering support to Exchange Server didn't reach all the product's connectors. Some Exchange Server connectors, such as the Dynamic RAS connector, simply can't run on a cluster. Other Exchange Server connectors, including the Microsoft Mail and Lotus cc:Mail connectors, work fine on an Exchange Server cluster's active node but don't support cluster transitions, so you must manually reconfigure them after Exchange switches to the secondary node. Usually, this manual reconfiguration involves updating the connector properties to include the name of the computer on which the connector is active.
The Microsoft articles "XCON: MTA Associations Fail On MSCS Installation" (http://support.microsoft.com/support/kb/articles/q186/7/56.asp) and "XCON: Using X.400 Connector with TCP/IP in Cluster Environment" (http://support.microsoft.com/support/kb/articles/q169/1/13.asp) explain problems that the original Exchange 5.5/E has with clustering X.400 connectors. Exchange Server 5.5 Service Pack 1 (SP1) solves these problems. The cluster I use supports X.400 connectors to other Exchange sites worldwide. However, when you configure an X.400 connector for an Exchange Server cluster, you need to make sure to use the cluster's Fully Qualified Domain Name (FQDN) or IP address. Screen 4 shows the configuration properties of an X.400 connector that points to DBOIST-MSXCL. As you can see, the other machine's stack points to the cluster's FQDN in the same way that it would point to a single-node server. Clusters don't support the X.400 connector over X.25 links.
Slow Cluster Transitions
Exchange Server performs a cluster transition when it determines that a resource needs to move from the active node to the standby node. In the early days of Exchange Server clustering (on systems running Exchange 5.5/E without SP1 and NTS/E 4.0 with SP3 and some hotfixes), transitions were often slow. My cluster's transitions were sometimes so slow that I powered off the primary node server to force the cluster to detect that something was wrong. Transitions took place service by service, so moving all the services to the secondary node took a long time (more than 10 minutes on most systems). And the larger the IS, the longer Exchange Server took to fail over.
Exchange Server is a complex application, and Microsoft worked extensively to address slow transitions. Some failover delays are unavoidable; for example, the Exchange Server databases must replay transaction logs to ensure that they're consistent when the services restart. Microsoft has concentrated on reducing remote procedure call (RPC) timeouts so that the services restart faster. This work has led to a noticeable increase in the speed of cluster transitions in NTS/E 4.0 with SP4 and Exchange 5.5/E with SP2.
An Exchange Server cluster transition forces the IS to restart. As the IS restarts and before it becomes fully functional, it checks the transaction logs for details of transactions that Exchange Server hasn't yet committed to the database. Users notice this delay because they can't access their mailboxes until the IS fully restarts. SP2 speeds up the way that Exchange Server processes transaction logs, and this change reduces the length of the IS's restart. If you're running a cluster, I suggest that you upgrade to SP2.
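The replay step that delays an IS restart follows the standard write-ahead-logging pattern: any transaction recorded in the logs but not yet committed to the database files must be re-applied before users can reconnect. The sketch below is a toy illustration of that idea only; the record shapes are invented, and Exchange's actual ESE log files are binary and far more complex.

```python
# Toy illustration of the transaction-log replay an IS restart performs.
# The checkpoint marks the newest transaction already written to the
# database files; anything newer in the log must be re-applied.

def replay(database, log):
    """Re-apply logged transactions newer than the database's checkpoint."""
    for record in log:
        if record["txn"] > database["checkpoint"]:
            database["data"][record["key"]] = record["value"]
            database["checkpoint"] = record["txn"]
    return database

# Checkpoint 2: transactions 1 and 2 are already in the database files.
db = {"checkpoint": 2, "data": {"msg1": "stored", "msg2": "stored"}}
log = [
    {"txn": 2, "key": "msg2", "value": "stored"},  # already committed, skipped
    {"txn": 3, "key": "msg3", "value": "stored"},  # uncommitted, replayed
]
replay(db, log)
print(db["checkpoint"], sorted(db["data"]))  # 3 ['msg1', 'msg2', 'msg3']
```

The cost of this step grows with the amount of uncommitted work in the logs, which is why a busy IS takes longer to come back after a transition.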
The Sad Tale of Third-Party Software
Few Exchange Server systems run only Exchange. Administrators use add-on products to complete their Exchange messaging environment. Products that complement Exchange Server include backup software such as Computer Associates' ARCserve and Seagate's Backup Exec, virus checkers such as Trend Micro's ScanMail for Microsoft Exchange and Computer Associates' Inoculan for Windows NT, document management systems such as Compaq Work Expeditor, and workflow products such as Eastman Software's Work Management for Exchange. However, I've found that many third-party products either won't install, don't work, or run poorly on an Exchange Server cluster.
I don't blame vendors for not giving clusters the attention they deserve. After all, clustered Exchange Server systems probably represent less than 1 percent of the total number of servers that run Exchange. I don't see many clusters running in production environments.
When you consider clustering your Exchange Server system, you can't assume that your favorite third-party product will work in a cluster. Ask the vendor to certify that its product works, then test the product before adding it to your production environment. The purpose of clusters is to provide redundancy and resilience to a messaging system, so don't try to make a third-party product work on an Exchange Server cluster if the product will reduce the overall robustness of the system. Most vendors will no doubt update their products to support clusters, but today trying to add third-party products to an Exchange Server cluster is frustrating.
Not a Panacea
Clustering won't fix all your Exchange Server system's problems. If a system runs out of disk space, clustering will keep the system's Exchange Server users from experiencing lengthy delays in service, but it won't solve the underlying problem. If the hard disk that your primary node uses for transaction logs or for MTA work files has less than 10MB of free space, Exchange Server shuts down the affected Exchange service and performs a cluster transition. But when the service starts again, Exchange Server will face the same situation (lack of space) and terminate the service again. Clusters don't save you from performing systems management tasks such as checking for sufficient disk space.
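The disk-space check, in other words, remains your job. A minimal sketch of such a check appears below; the 10MB threshold comes from the behavior described above, while the drive path and the idea of scripting the check at all are illustrative assumptions, not anything Exchange provides.

```python
# A minimal sketch of the disk-space monitoring a cluster doesn't do for
# you. The 10MB threshold matches the point at which Exchange shuts down
# the affected service; the volume path is a hypothetical example.

import shutil

LOW_SPACE_THRESHOLD = 10 * 1024 * 1024  # bytes

def is_low_on_space(free_bytes, threshold=LOW_SPACE_THRESHOLD):
    """Return True if free space has fallen below the danger threshold."""
    return free_bytes < threshold

def check_volume(path):
    """Return True if the volume holding `path` is dangerously low on space."""
    usage = shutil.disk_usage(path)
    return is_low_on_space(usage.free)

# e.g., check_volume("L:\\") against the transaction-log volume each hour
print(is_low_on_space(5 * 1024 * 1024))   # 5MB free: below the threshold
```

Run a check like this against the transaction-log and MTA work-file volumes; catching the shortage before Exchange does prevents the failover loop described above.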
In addition, clustering can't prevent hard disk failures. A set of shared disks is at the heart of any cluster; in an Exchange Server cluster, the shared array holds the binary files, transaction logs, and databases. A problem with a shared disk affects both nodes in the cluster, so shared disks create a potential single point of failure for a cluster. My Exchange Server cluster reminded me of this in a big way.
The disk that holds my Exchange Server cluster's transaction logs and binary files failed. This failure wouldn't have been a problem for me if I had mirrored the disk. However, I broke two cardinal rules of Exchange Server cluster management: Always mirror the transaction logs, and keep transaction logs away from the binary files and databases. Resolving this problem didn't cause me half as much pain as the embarrassment of getting myself into the situation.
The Decision to Cluster
Before you implement a cluster, think about whether clustering Exchange Server will provide what you're looking for. You're probably interested in clusters because you want to create a highly resilient system, but some administrators argue that clustering Exchange Server adds little resilience to the messaging system. Controller or disk failures account for most hardware problems that Exchange Server systems encounter, and clustering can't prevent these problems. Clustering increases a system's availability in the event of a software failure, but such failures are rare.
In addition, clustering is expensive. You must purchase two licenses for both NTS/E 4.0 and Exchange 5.5/E, two heavy-duty servers, and an expensive disk array to create one virtual server. And when you cluster Exchange Server, one of your servers is idle at all times. My department uses our cluster's standby node for tasks such as online backups and file and print sharing, but the clustering setup is inherently wasteful.
You might better spend your money buying two servers and dividing your user community between the two machines in the same site. This configuration would ensure that any server or hard disk problem would affect only half of your users. Although this configuration might require more disks than a cluster would require, you could buy the standard edition of the NT license for both servers. (You'd probably still want Exchange 5.5/E for that version's nonclustering benefits, such as the unlimited store and, perhaps, the X.400 connector.) However, this configuration would leave you with two servers to back up and maintain, potentially two sets of licensing fees for third-party products, and two servers to upgrade when new versions and service packs appear.
Some companies find that clustering Exchange Server mailbox servers is worth the price because of the increased availability that clustering provides an Exchange Server system. Mailbox servers need resilience, because people notice when they can't get their mail.
However, I don't think that using a cluster as a connector server between Exchange sites is often cost-effective. Administrators can usually afford to take a connector server offline to install a service pack or hotfix during working hours without affecting users, so clustering a connector server doesn't provide as much return on the NTS/E 4.0 and Exchange 5.5/E investment as clustering a mailbox server.
My department has had a positive experience with its Exchange Server cluster, which acts as both a mailbox server and a connector server. Despite the fact that we built the cluster from non-HCL-compliant parts, it has provided excellent service since we installed MSCS. The shortage of third-party products that run on Exchange Server clusters is frustrating at times. But since we set up the hardware, mastered the extensions to NT that MSCS introduces, and understood how Exchange Server functions in a cluster, the server has run smoothly. Clustering Exchange Server won't do you any good if you don't manage your server correctly. But if you put the necessary effort into building and maintaining your server, I think you'll enjoy the Exchange Server clustering experience.