Clusters for Everyone
Examine some situations that call for clustering and choose the right clustering product to save the day.
May 31, 1997
The other day, a hardware failure brought down our Exchange server. This failure created a panic in our user community because we consider email availability as important as a dial tone. Had we been using a Windows NT cluster, we users would never have noticed the problem. By providing continuous availability through replication, an NT cluster could have saved us a lot of frustration and prevented the loss of productivity.
Today's NT clustering solutions solve one business computing problem: availability. By replicating data, applications, and even entire systems, clustering lets two or more systems watch each other's back and take over the workload (user connections, applications, and services) in case one system fails. This article reviews and categorizes the clustering solutions currently available and illustrates what types of business computing problems clustering can help solve now.
So What's a Cluster Anyway?
A cluster is a group of whole, standard computers that work together as a unified computing resource and that can create the illusion of being one machine, a single system image. (With NT clusters, the term whole computer, which is synonymous with node, means a system that can run on its own, apart from the cluster. If you're not familiar with clustering terms, you can refer to "Clustering Terms and Technologies.") This unified computing resource ensures availability because any node can take on the workload of any other node that happens to fail.
Clusters come in three configuration types: active/active, active/standby, and fault tolerant. Let's examine each of the three types of cluster configurations:
Active/active: All nodes in the cluster perform meaningful work. If any node fails, the remaining node (or nodes) continues handling its workload and takes on the workload from the failed node. Failover time is between 15 seconds and 90 seconds.
Active/standby: One node (the primary node) performs work, and the other (the standby, or secondary node) stands by waiting for a failure in the primary node. If the primary node fails, the clustering solution transfers the primary node's workload to the standby node and terminates any users or workload on the standby node. Failover time is between 15 seconds and 90 seconds.
Fault tolerant: A fault-tolerant cluster is a completely redundant system (disk and CPU) whose goal is to be available 99.999 percent of the time. That goal translates to fewer than 6 minutes of downtime per year. Both nodes of a fault-tolerant cluster simultaneously perform identical tasks; the nodes' workloads are redundant. Failover time is less than 1 second.
To illustrate the definition of a cluster, let's say you have users doing file and print on Server A and another group of users accessing an Oracle database on Server B. Servers A and B are nodes in an active/active cluster. If Server A fails, Server B continues handling its workload and picks up Server A's workload. The users accessing the Oracle database do not notice any change in their service; the users doing file and print at most experience a short delay.
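To make the failover idea concrete, here is a minimal sketch, in C, of the heartbeat logic a surviving node might run: it periodically probes its partner and, after several missed heartbeats, starts a takeover routine. The function names (partner_is_alive, take_over_workload) and the timing values are illustrative assumptions, not the implementation of any product discussed in this article.

    #include <windows.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical probe: in a real cluster this would be a ping across the
       private interconnect or a read of a shared quorum disk. */
    static bool partner_is_alive(void)
    {
        return true;  /* stubbed out for illustration */
    }

    /* Hypothetical takeover: claim the partner's disks, restart its services,
       and take over its network names and addresses. */
    static void take_over_workload(void)
    {
        printf("Partner declared dead: taking over its workload\n");
    }

    int main(void)
    {
        const int interval_seconds = 5;  /* how often to probe the partner      */
        const int allowed_misses   = 3;  /* 3 x 5 s keeps failover inside the   */
        int misses = 0;                  /* 15- to 90-second window cited above */

        for (;;) {
            if (partner_is_alive()) {
                misses = 0;
            } else if (++misses >= allowed_misses) {
                take_over_workload();
                break;
            }
            Sleep(interval_seconds * 1000);
        }
        return 0;
    }

In an active/active configuration, both nodes run a loop like this about each other; in an active/standby configuration, only the standby node does.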
NT Clustering Solutions
As the need for availability becomes ever more crucial in the NT environment, many third-party vendors and Microsoft have introduced or are about to introduce clustering solutions for NT. To help you evaluate these clustering solutions, let me briefly explain Microsoft's clustering initiative, Wolfpack, and categorize its capabilities in comparison with those of some prominent third-party clustering solutions. (For reviews of several individual clustering products, including Wolfpack, see Lab Reports.)
Wolfpack
Wolfpack is Microsoft's two-node, active/active clustering solution and set of APIs for NT. Wolfpack's purpose is to provide high availability to your NT Server environment.
Wolfpack will have an effect in several significant areas. First, you can expect all server manufacturers who want to reach NT customers to offer Wolfpack-based clustering support this year. Even a year before its release, Wolfpack had the backing of Digital Equipment, Compaq Computer, Tandem, Intel, Hewlett-Packard, NCR, and IBM.
Theoretically, Wolfpack will work on any two Intel-based or any two Alpha-based servers, but you can't mix Intel and Alpha. However, in practical terms, the number of supported systems will be very restricted because to get on the Wolfpack Hardware Compatibility List (WHCL), each manufacturer must test complete configurations (system, disk subsystem, and SCSI adapter) for compatibility. This approach stands in contrast to NT's existing Hardware Compatibility List (HCL), which lets manufacturers list individual system components. For the WHCL's first release, Microsoft will let each manufacturer list only two configurations. Microsoft will support Wolfpack only for systems on the WHCL, so don't try to build your own Wolfpack clustering solution. Although these requirements will initially limit the selection of Wolfpack-compliant configurations, the WHCL will grow over time.
The second area that Wolfpack will affect is storage. In a Wolfpack-based solution, you need only enough storage in your servers to run NT Server and Wolfpack. A disk subsystem that both servers share will provide the bulk of your storage. As a result of this approach, server manufacturers will want to differentiate themselves by improving their storage performance. Those manufacturers that don't have their own subsystems will have to obtain them from storage providers such as CMD Technology, Data General, and BoxHill Systems. Some manufacturers, such as Compaq, will use clusters as a way to promote fibre-channel based storage solutions because fibre-channel storage has significant advantages over SCSI, in both throughput and cable length.
Third, Wolfpack will affect server applications. Wolfpack is not only a clustering solution, but a set of APIs. These APIs let developers make their server applications "cluster aware." Such awareness could mean easier installation in a clustering environment, better failover capabilities, and the ability to scale an application beyond one node. For example, Microsoft plans to use the Wolfpack APIs with its Transaction Server to let two nodes work on the same SQL Server database query. This technology combination is fundamental to Microsoft's plans to provide enterprise scalability.
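As a rough illustration of what "cluster aware" means in practice, the sketch below models the set of entry points a cluster-aware application or resource typically exposes so the cluster software can start it, stop it, and poll its health. This is a self-contained mock-up in C; the structure and function names are my own illustration, not the actual Wolfpack API, which was still in beta as this article went to press.

    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical illustration of the kinds of callbacks a cluster-aware
       resource exposes. The real Wolfpack API defines its own names and
       calling conventions; this is only a conceptual mock-up. */
    typedef struct {
        bool (*online)(void);        /* bring the application online on this node */
        void (*offline)(void);       /* shut it down cleanly before failover      */
        bool (*looks_alive)(void);   /* cheap, frequent health check              */
        bool (*is_alive)(void);      /* thorough, less frequent health check      */
    } ClusterResourceOps;

    static bool mail_online(void)      { puts("starting mail service"); return true; }
    static void mail_offline(void)     { puts("stopping mail service"); }
    static bool mail_looks_alive(void) { return true; }  /* e.g., process still exists        */
    static bool mail_is_alive(void)    { return true; }  /* e.g., service answers a test call */

    /* What the cluster software might do with those callbacks on one polling pass. */
    static void poll_resource(const ClusterResourceOps *ops)
    {
        if (!ops->looks_alive() || !ops->is_alive()) {
            ops->offline();  /* stop locally ... */
            puts("resource failed: fail it over to the surviving node");
        }
    }

    int main(void)
    {
        ClusterResourceOps mail = { mail_online, mail_offline,
                                    mail_looks_alive, mail_is_alive };
        if (mail.online())
            poll_resource(&mail);
        return 0;
    }

The point of the design is that the cluster software never needs to understand the application; it only needs the application to answer "are you healthy?" and to start and stop on command.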
The Wolfpack APIs have been available to developers for only a short time, so only a few applications will initially be available. However, as the adoption of clusters becomes more commonplace, the demand for cluster-aware applications will increase as well. Expect Microsoft's BackOffice applications to become cluster aware during 1997 and 1998.
Fourth, Wolfpack will have an impact on other NT clustering solutions. Many competing NT clustering solutions have already declared support for the Wolfpack APIs. This API support will let Microsoft's competitors support Wolfpack cluster-aware applications and still provide enhanced functionality over the Wolfpack solution.
Finally, the price and availability of Wolfpack-based solutions will drive NT cluster solutions into the mid-to-low end of the server market. The price of Wolfpack-based solutions is about 20 percent of the price of solutions available for UNIX. This pricing alone will make companies that have never considered clustering take a look at it. In addition, the availability of Wolfpack-based solutions from many vendors will create competition, improve awareness in the market, and help stimulate demand in the mid-to-low markets that they serve.
Third-Party NT Clustering Solutions
Wolfpack isn't the only game in town. In fact, several solutions are more mature than Wolfpack, offer additional functionality, and solve different problems. Table 1 lists some prominent solutions (including Wolfpack) and categorizes the type of clustering solution they offer, their data-handling strategy, their hardware interconnect, and their flexibility in hardware choices. (For a summary of information about the clustering solutions reviewed in this issue, see "Clustering Solutions Feature Summary," and for information about other clustering solutions, see "Buyer's Guide to Clustering Solutions.") Let's look at some of the categories in Table 1, and then we can apply our knowledge of clustering solutions to some real-life scenarios to determine what solution is best for a given situation.
Data handling. NT clusters use one of three data-handling methods: mirroring, switching, and redundancy. In mirroring, one node replicates another node's data. Octopus, NSI, and Vinca rely on this technique. With switching, each node has its own disk source, which may be RAID or just a bunch of disks (JBOD). Both nodes share a SCSI bus, which lets the surviving node take over the failing node's disk. Finally, with redundancy, the clustering solution writes data to both nodes simultaneously.
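To show the difference the data-handling method makes, here is a toy sketch of synchronous mirroring in C: every write is applied to a local copy and immediately repeated to a replica, so the standby node always holds current data. Real products such as Octopus replicate at the disk or file-system level across the network; the file names and write path below are purely illustrative assumptions.

    #include <stdio.h>
    #include <stdlib.h>

    /* Toy synchronous mirroring: a write is not considered complete until it
       has reached both the primary copy and the replica. A switching
       (shared-SCSI) cluster skips this step entirely, because both nodes can
       reach the same physical disk. */
    static void mirrored_write(FILE *primary, FILE *replica,
                               const void *buf, size_t len)
    {
        if (fwrite(buf, 1, len, primary) != len ||
            fwrite(buf, 1, len, replica) != len) {
            perror("mirrored write failed");
            exit(EXIT_FAILURE);
        }
        fflush(primary);
        fflush(replica);
    }

    int main(void)
    {
        /* Illustrative paths only; a real product mirrors across the network. */
        FILE *primary = fopen("primary.dat", "ab");
        FILE *replica = fopen("replica.dat", "ab");
        if (!primary || !replica) {
            perror("open");
            return EXIT_FAILURE;
        }

        const char record[] = "order #1001: 3 widgets\n";
        mirrored_write(primary, replica, record, sizeof record - 1);

        fclose(primary);
        fclose(replica);
        return 0;
    }

The trade-off is visible even in this toy: mirroring spends bandwidth and latency on every write, whereas switching spends nothing until the moment of failover.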
Hardware interconnect. The hardware interconnect is the required physical link between the nodes in the cluster. Several solutions require proprietary connection devices. Other solutions use any type of TCP/IP-supported connection, such as Ethernet.
Hardware flexibility. The hardware flexibility column in Table 1 rates available choices for nodes. For example, Stratus' solution works on only Stratus hardware and is therefore rated poor in the flexibility column. Wolfpack requires manufacturers to list complete configurations--not components--on the WHCL and therefore receives a rating of fair. Octopus will work with any NT-based servers (Intel, Alpha, MIPS, PowerPC) and therefore is rated excellent. Vinca will work with any two NT-based servers (Intel only) and therefore is rated good.
Scenarios
A variety of clustering solutions can solve availability problems in an NT environment. The purpose of the following scenarios is to show how you can apply clustering solutions to solve specific problems.
SITUATION 1
Expanding Your File and Print Server
Problem: Your company has a single-processor Pentium-based NT Server that you use for file and print, and it is running out of steam. Your applications include a heavily used multi-user Access 97 database and Office 97. You have to reduce downtime, especially with the Access 97 database, which has become critical.
Solution: If you buy an additional server, you can use a mirror-based solution such as Octopus to connect the two servers into a cluster. Now you can ease your capacity crunch by putting your Office 97 files on one server and Access 97 on the other server. At the same time, you can replicate critical data between the servers and create a fault-resilient environment.
Could you use Wolfpack in this situation? You could, only if your new configuration is on the WHCL, which is highly unlikely right now. Also, Wolfpack requires a SCSI-based disk subsystem, which is an extra purchase.
SITUATION 2
Setting Up a Web-based Storefront Using Merchant Server
Problem: Your company has decided to take orders and payments over the Internet. For optimum performance, you decide to run Merchant Server and Internet Information Server (IIS) on one server and SQL Server on another. Because both servers will have active users, you need an active/active clustering solution. A 30-second delay is acceptable during failover. You have 30 days to deliver.
Solution: Wolfpack isn't shipping yet, so you can go with either LifeKeeper or FirstWatch. Because you have no existing equipment, you can buy a SCSI-based solution (two servers and one disk subsystem) from a single vendor. One possible solution is Data General's NT Cluster-in-a-Box, which comes to you with everything preconfigured from the manufacturer. (For a review of this solution, see "NT Cluster-in-a-Box.") If you can wait until Wolfpack ships, it will also solve your problem.
SITUATION 3
Credit Card Verification Service
Problem: You've decided to cash in on the electronic commerce craze and provide real-time verification for credit card transactions on the Internet. Even a few seconds of failure could result in the loss of millions of dollars of transactions.
Solution: If you're brave enough to try this service on NT, your only solution today is from Marathon Technologies because it's the only solution that offers subsecond failover times and eliminates the need to restart user transactions. Its configuration duplicates both memory (redundant compute nodes) and disk (redundant data nodes).
Marathon Technologies' solution takes four off-the-shelf computers working together to create a cluster. (For details about this solution, see the sidebar, "Marathon Technologies' Endurance 4000.") You do not need to make any software changes.
SITUATION 4
Hot-Site Backup
Problem: As part of your disaster recovery plan, you want to maintain a hot site in case your primary site is destroyed. This plan requires the ability to mirror a server to a location 20 miles from the primary site.
Solution: Most clustering solutions today assume that the cluster nodes are within two miles of each other. Therefore, you need a solution that can provide mirroring across a WAN. Currently, only Octopus, NSI, and Vinca can provide this functionality. (For reviews of these solutions, see "Octopus SASO 2.0," "Double-Take 1.3 Beta," and "Vinca StandbyServer for NT.")
SITUATION 5
Remote Application Access
Problem: You need to provide fault-tolerant remote access to your 500-member sales force. They need 24 x 7 remote access to your company's applications.
Solution: A Citrix server will solve the remote application access problem. Cubix offers a fault-tolerant solution for Citrix servers by providing load balancing and failover for multiple Citrix servers in a manageable communications cluster. (For a review of the Cubix solution, see "RemoteServ/IS.")
SITUATION 6
OS/2 Users Need Access to Lotus Notes 4.0
Problem: Your OS/2 client users need immediate access to Lotus Notes 4.0 for NT. Lotus Notes is a critical application, so if users lose access for longer than 90 seconds, you're fired.
Solution: Vinca's StandbyServer for NT is one of the few solutions that support OS/2 clients. IBM is one of Vinca's key distributors and provides OS/2 support. Purchase a new server to run Lotus Notes for NT, and use the old server as a standby server.
SITUATION 7
Schedule Upgrades to Your System
Problem: You would rather not spend all your nights and weekends upgrading your systems.
Solution: By putting your servers into a cluster group, you can manually fail over a node during working hours. Remember, the users are still working on the remaining node. Now you can apply a service pack, test it, and pray.
Once you are satisfied that the service pack changes are working, you can manually fail back the node and the workload. Any NT clustering solution currently available will work in this scenario.
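For clusters that expose a programmable interface, a small maintenance utility can drive this manual failover. The fragment below is a minimal sketch in C against the Cluster API (clusapi.h) that Wolfpack-based systems are expected to expose; the group name, the choice to let the cluster pick the destination node, and the error handling are assumptions for illustration, so check your vendor's documentation before relying on any of it.

    #include <windows.h>
    #include <clusapi.h>   /* link with clusapi.lib */
    #include <stdio.h>

    int main(void)
    {
        /* NULL opens the cluster this machine belongs to. */
        HCLUSTER cluster = OpenCluster(NULL);
        if (!cluster) {
            fprintf(stderr, "OpenCluster failed: %lu\n", GetLastError());
            return 1;
        }

        /* "File and Print" is an illustrative group name; use your own. */
        HGROUP group = OpenClusterGroup(cluster, L"File and Print");
        if (!group) {
            fprintf(stderr, "OpenClusterGroup failed: %lu\n", GetLastError());
            CloseCluster(cluster);
            return 1;
        }

        /* Passing NULL as the destination lets the cluster pick the other node,
           freeing this one for the service pack. Run the same call again after
           the upgrade to fail the group back. */
        DWORD status = MoveClusterGroup(group, NULL);
        if (status != ERROR_SUCCESS && status != ERROR_IO_PENDING)
            fprintf(stderr, "MoveClusterGroup failed: %lu\n", status);

        CloseClusterGroup(group);
        CloseCluster(cluster);
        return (status == ERROR_SUCCESS || status == ERROR_IO_PENDING) ? 0 : 1;
    }

Solutions without such an API accomplish the same thing through their management console, but the principle is identical: move the workload, patch the idle node, move it back.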
SITUATION 8
Manually Load Balancing Your System
Problem: You have too many applications running on one server while another server is barely used.
Solution: Ordinarily, you have to take down both servers, change their configuration, and restart. If the servers are part of an active/active cluster group, you can manually fail over a single application without taking down an entire node. This approach effectively moves the application from one server to another.
You must make sure the solution supports application failover (as opposed to system failover), which lets you move a single application while the rest of the node keeps running. For example, even though Octopus is active/active, it supports only system failover today, which requires taking down the node. However, soon after you read this article, Octopus SASO 3.0 will be shipping, and it supports application-level failover.
SITUATION 9
Two SQL Servers
Problem: You need high availability for users accessing two independent SQL Server databases, each running on a separate server.
Solution: You need an active/active application clustering solution so that both nodes can be running SQL Server simultaneously. This requirement eliminates Wolfpack from your list of choices, because it can run only one instance of SQL Server per cluster. However, Digital Equipment's Wolfpack clustering add-on pack and NCR's LifeKeeper let you run two copies of SQL Server in the same cluster, allowing each server to be the fallback for the other and thus increasing availability.
SITUATION 10
Scaling Exchange
Problem: You want to scale Exchange to run faster and have high availability. You have a dual Pentium Pro server.
Solution: Adding two CPUs to your server configuration would be nice, but unfortunately, Exchange scales effectively to only two CPUs (for more information about Exchange's ability to scale, see Joel Sloss, "Optimizing Exchange to Scale on NT," November 1996). In fact, the next release of Exchange (version 6.0) has been dubbed the "performance release" and will address this scalability problem. Wolfpack won't address scalability until phase 2, which isn't due until 1998. So are we stuck?
Valence Research's Convoy Cluster claims to add availability and scalability for TCP/IP applications and to provide load balancing among nodes in a cluster. This product is primarily aimed at intranet applications. Convoy Cluster was not available when we tested solutions for this issue. If this solution can scale, it will leapfrog Wolfpack by a year.
Future Trends
As these scenarios demonstrate, Wolfpack is not the appropriate solution in every case. Even so, Wolfpack is having a huge effect on hardware and software vendors.
When Wolfpack phase 2 starts shipping in 1998, developers can use the Wolfpack APIs to create applications that will let cluster nodes work in parallel. The issue of scalability will start a heated debate among system vendors: Is a cluster of 4-way SMP systems better than 8-, 12-, and 16-way SMP systems? If the answer is yes, NT will never have to scale beyond four CPUs in a single system. As long as you can cluster 4-way systems and scale performance, NT will have a price and performance unrivaled in the marketplace.
In the early adoption phase, companies will want to buy complete cluster-in-a-box configurations, hoping to eliminate as many problems as possible. However, as clustering moves mainstream, users will demand the ability to mix and match components. Keeping up with NT's HCL is hard enough, and keeping up with the WHCL will be even harder. Octopus has been on the leading edge for more than two years by letting users mix and match components easily. Other vendors will need to do the same.
As more system vendors support Wolfpack, additional features will provide a competitive advantage. For example, Digital supports Wolfpack but also offers a cluster add-on package that lets both nodes of a cluster run SQL Server and gives existing users of Digital NT Cluster a migration wizard. Compaq, Tandem, and Dell will enhance their Wolfpack offerings by supporting ServerNet, a high-speed interconnect. NCR supports Wolfpack but also supports LifeKeeper, which allows three-node clusters, compared with Wolfpack's two-node limitation.
Finally, look for other vendors to solve the scalability problem before Wolfpack does. For example, Oracle Parallel Server lets two or more Oracle database server nodes work on the same database, running queries in parallel on multiple nodes. Oracle will try to one-up Microsoft by shipping this level of scalability on NT before Microsoft can release the parallel version of SQL Server (version 8.0).
Corrections to this Article:
In Mark Smith's article, "Clusters for Everyone," we incorrectly reported that Stratus uses a proprietary interconnect. In fact, Stratus uses standard, redundant 100Base-T connections. Though Isis Availability Manager (cluster software) runs on only Stratus, Stratus hardware can run multiple clustering software solutions, including Microsoft's Wolfpack clustering software. Finally, Stratus uses mirroring technology rather than SCSI-switching as was originally reported. For more information, visit the Stratus Web site at http://www.stratus.com.