Closing In on Clusters

The time is ripe for clustering to move into high-volume server markets such as the one Windows NT serves. This level-setting, comprehensive analysis clarifies the issues and reveals why companies with more than 50 employees will soon have NT clusters.

Mark Smith

July 31, 1996


Will Wolfpack dominate the high-volume Windows NT cluster market?

Let's say you decide to build your intranet using Microsoft SQL Server on one Intel-based server and Internet Information Server (IIS) on another. Suppose the IIS server dies and, within a few seconds--no user downtime--all the IIS users and processing automatically switch to another server. What if Windows NT Server had that failover capability tightly integrated, and this capability required no proprietary hardware? Interested?

Or, suppose you convince your CEO that NT really can scale. You put a 100GB SQL Server database on your four-way Pentium Pro system to serve 5000 users, and it runs out of gas. Where do you go from there? Do you need to look at an eight-way parallel system, or can you just add another four-way Pentium Pro server and have it work on the same database?

Imagine you can keep your system running while you're applying updates to the operating system or key application software. For example, you can wait until an off-peak time, move an application easily from one server to another, and then apply a new version or service pack to the offline server and run some tests. When you're comfortable with the change, you can move the application back, testing as you go. If a problem occurs, you can easily move the application over again. All this while, users are up and running the very application you're updating. Is this an administrator's dream?

Okay, now that you've seen a few scenarios, let me formally define clusters, the technology that allows such solutions. A cluster is a group of whole, standard computers working together as a unified computing resource that can create the illusion of being one machine--a single-system image. The term whole computer, which is synonymous with node, means a system that can run on its own, apart from the cluster.

Clusters have been addressing the problems of availability, performance, and scaleability for years. Until now, however, cluster vendors have focused on serving high-end customers, ignoring high-volume server markets. Gregory Pfister, a senior technical staff member at IBM's server group in Austin, Texas, wrote In Search of Clusters: The Coming Battle in Lowly Parallel Computing. In this book, Pfister identifies three keys to making clusters a high-volume solution: the speed of microprocessors, the availability of standard high-speed communications, and the existence of standard tools for distributed computing.

Pfister believes all three requirements, especially the speed of microprocessors, are now met. The time is ripe to move clusters into the high-volume market, like the one NT serves. The current crop of NT microprocessors--Intel, Alpha, MIPS, and PowerPC--are as fast as the CPUs in the largest computers. One example is Intel's new standard, high-volume (SHV) system, a four-way Pentium Pro-based motherboard that Intel developed to take advantage of NT's symmetrical multiprocessing (SMP) capabilities.

Intel is planning to create complete SHV systems for some OEMs, who will change only the machine's faceplate. With Intel in the system business, you can expect these high-end machines to roll out from more than 20 vendors within a few months, creating a ready-made SHV market for clusters.

Pfister predicts that vendors will greatly profit from developing a high-volume cluster market if they can take advantage of the timing and solve a few fundamental problems, such as creating a single-system image, forming a standard so that software vendors aren't locked into one hardware vendor's solutions, and pricing cluster software licenses so that the cost doesn't exceed that of large parallel systems. Solving these problems could drive a tenfold increase in the quantity of cluster solutions shipped each year.

To appreciate the significance and implications of clusters in the NT world, you can look at the work Microsoft and its partners are doing on NT clusters. You need to know about Microsoft's emerging Wolfpack cluster standard, get a perspective on what various cluster solutions can do, and understand what various vendors are presenting to the NT market. For some background on the book that predicted the direction of clusters, see my review of In Search of Clusters and my interview with Pfister.

What Is Wolfpack?
Several leading NT Server systems vendors, including Compaq, Digital Equipment, HP, NCR, and Tandem, have been independently working on clustering solutions for a few years. These vendors agreed to pool their expertise with Microsoft in an initiative to produce a cross-vendor standard for NT Server clusters. This group wanted to give NT Server customers the greater choice and flexibility they wanted. So in October 1995, Microsoft announced its intent to develop strategic partnerships to fashion a new clustering standard with the code name Wolfpack.

This name and many of its technology goals derive from Pfister's book. In Chapter 4, Pfister describes a cluster as a "pack of dogs." While searching for a code name for the API, Microsoft came across this book and decided to describe clusters with the name Wolfpack, which sounds a lot cooler than Dogpack.

Wolfpack is an alias for clusters, and the six core vendors in Microsoft's clustering project consider themselves members of the Wolfpack. These members are Digital, Compaq, Tandem, Intel, HP, and NCR. Each partner contributes key components of its existing technology. Other vendors, including Amdahl, IBM, Octopus, Vinca, Marathon, Stratus, and Cheyenne, have agreed to support the Wolfpack API. These vendors are part of Microsoft's Open Process, which includes about 60 vendors and customers who are part of design previews during various stages of Wolfpack development.

Wolfpack describes a set of cluster-aware APIs, NT cluster support, and a clustering solution (which means a vendor can claim to be Wolfpack compliant while competing with the Wolfpack solution on a different level--so if a vendor claims to support Wolfpack, you need to ask how). Here's a detailed explanation of each Wolfpack component.

TABLE 1: Clustering Levels

Availability Level    Recovery Time         Failback    Both nodes used for work
Standby               40 to 200 seconds     No          No
Active                15 to 90 seconds      Yes         Yes
Fault Tolerant        Less than 1 second    Yes         No

Wolfpack: The API
You can make applications cluster aware by calling the Wolfpack API. The services the API accesses can speed recovery; let you take additional actions, such as proactively notifying users on failover; let you restart and reacquire nonstandard resources; and let you monitor and detect more subtle application faults than a simple crash or lock-up. Potentially, Wolfpack API services will let applications achieve higher scaleability and do dynamic load-balancing on a cluster. Microsoft has not yet announced details of how the BackOffice applications will exploit the Wolfpack API services to become cluster aware.
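Because Microsoft has not published the API, any concrete call names are guesswork. Still, a sketch can show the shape of cluster awareness described above: an application identifies its resources to the cluster, then reports its own health. Every identifier below is invented for illustration; the stubs stand in for whatever the real services turn out to be.

```c
/* Hypothetical sketch only -- the real Wolfpack API is unpublished, so
 * every name here is invented. The point is the shape: register with the
 * cluster, then report health, so the cluster can catch faults subtler
 * than a crash or lock-up. */
#include <stdio.h>

typedef int CLUSTER_RESOURCE;  /* invented handle type */

/* Invented stand-ins for the eventual cluster services. */
static CLUSTER_RESOURCE cluster_register(const char *name)
{
    printf("cluster: registered %s\n", name);
    return 1;
}

static void cluster_report(CLUSTER_RESOURCE r, int healthy)
{
    printf("cluster: resource %d is %s\n", r, healthy ? "healthy" : "failing");
}

int main(void)
{
    CLUSTER_RESOURCE db = cluster_register("order-entry database");

    /* A cluster-aware app can flag a subtle fault -- a hung worker
     * queue, say -- long before the whole process crashes. */
    int queue_hung = 1;  /* detected by the app's own monitoring */
    cluster_report(db, !queue_hung);
    return 0;
}
```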

Wolfpack: The Cluster Support
Cluster support will make all NT Server applications Wolfpack compliant in the sense that they will run exactly the same on a server that has Wolfpack as on a non-Wolfpack server, and Wolfpack will be able to do basic failover recovery of any NT Server application, without any need for you to modify the application. Wolfpack handles failover of an unmodified application by executing it through a provided wrapper dynamic link library (DLL) that notifies the cluster manager of the application's existence and creates a basic heartbeat (a check-in with the other cluster machine, and its answer, at regular intervals) so Wolfpack can tell whether the application goes down or locks up. The pricing and packaging of Wolfpack is not set, but I can imagine Microsoft adding cluster support to NT in the same way that NT includes SMP support today.
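The heartbeat itself is a simple idea, and a few lines of C make it concrete. This is a minimal simulation of the general technique, not Microsoft's unpublished wrapper DLL; the tick interval and miss limit are invented for illustration.

```c
/* A minimal heartbeat sketch -- a simulation of the general technique,
 * not Microsoft's wrapper DLL. One side checks in every tick; the
 * watcher declares a failure after MISSED_LIMIT consecutive silent
 * ticks and would then hand the work to the surviving node. */
#include <stdio.h>

#define MISSED_LIMIT 3  /* consecutive missed beats before failover */

/* Stand-in for the real check-in over the cluster interconnect:
 * pretend the monitored application locks up at tick 5. */
static int beat_received(int tick)
{
    return tick < 5;
}

int main(void)
{
    int missed = 0;

    for (int tick = 0; tick < 12; tick++) {
        if (beat_received(tick)) {
            missed = 0;  /* healthy: reset the counter */
        } else if (++missed >= MISSED_LIMIT) {
            printf("tick %d: %d beats missed -- initiating failover\n",
                   tick, missed);
            break;       /* promote the standby / restart elsewhere */
        }
    }
    return 0;
}
```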

Wolfpack: The Solution
Microsoft will deliver Wolfpack, the solution, in two phases. Phase 1 is two-node availability and scaling clusters (a new version of SQL Server will let you work on the same database from two servers at once). Phase 2 will allow more than two nodes in a cluster.

Reread the first paragraph in this article. That scenario describes a June 1996 demonstration of a Wolfpack availability cluster solution at PC Expo in New York City. This two-node failover capability is the basis for Phase 1 of Wolfpack (early 1997 is the estimate for delivery). The price for Wolfpack's Phase 1 release is not set, but one rumor is that NT Server will include Wolfpack at no additional cost. As I write this article, Compaq, Digital, HP, NCR, Amdahl, Stratus, and Tandem have all announced plans to OEM the Wolfpack-based cluster solution.

The next step in Phase 1 (set for the second quarter of 1997) will be an open certification program with the goal of expanding the market for two-node cluster solutions and giving NT Server customers a greater selection to choose from. Microsoft is also committed to making Wolfpack available on Intel, Alpha, PowerPC, and MIPS chips.

Reread the second paragraph in this article. It illustrates the need for scaling clusters, which allow more than one node in a cluster to work on the same problem. This capability, application striping, is analogous to data striping in RAID (redundant arrays of inexpensive disks), where several disks work together on one set of data. Scaling clusters would handle the performance and scaling requirements of large applications and databases.
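To see the analogy in miniature, the toy sketch below (mine, not any vendor's) deals the pieces of one big query across nodes the way RAID deals stripes across disks; the node count and work units are invented.

```c
/* A toy picture of application striping: chunks of one big query are
 * dealt round-robin across cluster nodes the way RAID deals successive
 * data stripes across disks. */
#include <stdio.h>

#define NODES 2  /* a Wolfpack Phase 1-sized cluster */

int main(void)
{
    const char *chunk[] = { "scan rows 0-24M",  "scan rows 25M-49M",
                            "scan rows 50M-74M", "scan rows 75M-99M" };
    int chunks = sizeof(chunk) / sizeof(chunk[0]);

    /* Each chunk goes to the next node in turn, so both nodes work on
     * the same database problem at once. */
    for (int i = 0; i < chunks; i++)
        printf("node %d <- %s\n", i % NODES, chunk[i]);

    return 0;
}
```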

Phase 2, which will go into beta in 1998, will support clusters that have more than two nodes. Increasing the number of nodes in clusters can provide significantly more application scaleability and flexibility in availability than is possible in a two-node cluster.

The objective in Phase 2 is to certify large clusters that have 16 or more nodes, each of which can be any NT Server machine. NT Server's architecture supports up to a 32-way SMP machine.

Microsoft has not yet determined what additional capabilities, if any, Wolfpack will need in order to exploit large, Phase 2 clusters, but many Wolfpack members have solved such problems before and are the industry leaders in scaling cluster technology. A few of these vendors tell me that NT already has many necessary hooks to support clusters. This situation is no accident, because Dave Cutler, the architect of NT, was also the architect of Digital's VMS, the first OS to deliver commercially available clusters.

Availability Clusters
The first phase of Wolfpack, availability clustering, has a wide range of capabilities, depending on the amount of up-time you need. For the sake of discussion, I have grouped these solutions into three levels: standby, active, and fault tolerant. The amount of recovery time, whether a solution provides failback, and whether you can use both nodes for work are the criteria that differentiate the levels as currently available products implement them.

Table 1 illustrates how each level differs from the others. Each clustering level offers a different type of solution, and each solution has implications about the future of clustering on NT. Various vendors fit into each category, and understanding their solutions, their Wolfpack strategy, and their future direction is important for anyone considering clusters.

Standby Clusters
Figure 1 illustrates the cluster configuration for the standby category. In this configuration, the primary server does all the work and mirrors any data to the other standby server in the cluster. Standby clusters require a full second copy of all data. The standby node checks the status of the primary node several times a minute to make sure it's up and running. For most solutions, the standby node sends a heartbeat--pings the primary server. If the primary server doesn't respond, the standby server changes its status from standby to primary and takes over the application load from the failed primary server. This solution automatically switches users of NT, Windows 95, and Windows for Workgroups (WFW) 3.11 to the new active server. Mac, OS/2, and Windows 3.1 users have to manually log on to the new system. When the standby server becomes the primary server, standby clustering logs off users who were logged on to the standby server before failure.

The vendors that support standby clustering are Vinca, IBM, Octopus, and Compaq. Following is an overview of the products these vendors offer.

Vinca: Ray Noorda, now of NFT Ventures and former CEO of Novell, is chairman of the board of Vinca. This company's product, StandbyServer, is configured as a disk controller device driver and appears as another disk drive. Vinca uses NT's native disk mirroring technology to keep the primary and standby servers in synch. By relying on NT's native facilities, Vinca achieves a high degree of application compatibility. Whereas other systems check for hardware failure (the heartbeat method) alone, Vinca can also check for software failures. You can configure StandbyServer to monitor application and OS processes down to the thread level.

Vinca is committed to open architecture. StandbyServer works with any Intel-based servers and any SCSI controller. The servers do not have to be identical. A dedicated high-speed link connects the primary and standby servers. This link can be an Intel EtherExpress PCI 100Mbit (Mb) or Vinca's EISA 100Mb card. Other drivers are in development. StandbyServer works on NetWare, OS/2, and NT.

To get a glimpse of where StandbyServer for NT is headed, you just need to look at StandbyServer for NetWare, which has enhanced features over Vinca's NT product. The NetWare version supports fiber technology for direct connection. In addition, you can use any node on the network as the standby server. SnapshotServer is an add-on product that facilitates backing up live files without negatively affecting network performance. Because the standby server has a live copy of files at all times, you can back up from the standby machine without affecting the network. This capability does not replace backup, but enhances regular backup software. Vinca plans to support an active-active availability cluster in the near future.

StandbyServer for OS/2 is the only availability solution for OS/2, which is why IBM fully endorses and distributes StandbyServer for its customers who need availability clusters today. StandbyServer for NT includes the software and a 25-foot dedicated link, but not the hardware interface.

IBM: IBM recently announced the IBM PC Server High Availability Solution, which bundles Vinca's StandbyServer with IBM's PC Server hardware. IBM will distribute this product through its business partners to customers who need an NT cluster solution today.

IBM will bring more than 20 years of experience in high-end cluster solutions when it moves to the NT market. One likely approach is to port the IBM Scalable POWERparallel (SP) cluster solution, which offers scaleability and performance, to NT. The objective is a cluster solution that runs on industry-standard hardware and fully supports the Wolfpack APIs. Also, IBM plans to make its Software Servers suite cluster aware. The first two products will be Notes Cluster and DB2 Cluster. The rest of the suite will follow in 1997.

Octopus: Octopus Technologies is shipping Octopus Automatic Switch Over (ASO) for NT. This unique solution can mirror data anywhere on a LAN or WAN. Other products require the clustered nodes to be near each other. The ASO feature already allows N-way failover conditions: You can connect more than two nodes to a designated server, which can assume the work of any connected node. Another unique feature of Octopus is its ability to mirror files on any hardware that supports NT. Octopus (unlike Digital's solution) can create a failover cluster between an Alpha and an Intel server.

Octopus inserts itself into NT's file system and has a proprietary replication engine. One benefit is the speed of replication. Unlike other replication technologies that mirror at the file or disk level, Octopus replicates changes only. Over slow connections, however, you can lose data that isn't committed at the time of failure.
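The speed win of replicating changes only is easy to see in a sketch. This is a generic illustration of change-only mirroring, not Octopus's proprietary engine; the block count and contents are invented.

```c
/* Change-only mirroring in miniature -- a generic sketch, not Octopus's
 * engine. Only blocks that differ from the mirror's copy cross the
 * (possibly slow) link. */
#include <stdio.h>
#include <string.h>

#define BLOCKS 4
#define BLKSZ  8

int main(void)
{
    char primary[BLOCKS][BLKSZ] = { "alpha", "beta", "gamma", "delta" };
    char mirror [BLOCKS][BLKSZ] = { "alpha", "beta", "GAMMA", "delta" };
    int sent = 0;

    for (int b = 0; b < BLOCKS; b++) {
        if (memcmp(primary[b], mirror[b], BLKSZ) != 0) {
            memcpy(mirror[b], primary[b], BLKSZ);  /* "send" only the delta */
            sent++;
        }
    }

    /* Prints "replicated 1 of 4 blocks": unchanged blocks never move. */
    printf("replicated %d of %d blocks\n", sent, BLOCKS);
    return 0;
}
```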

Compaq: Compaq offers a Recovery Server Option Kit that you can configure in two modes, Standby Recovery Server and On-Line Recovery Server. The kit includes software, cables, and a switch for a two-node configuration. In addition to the kit, you need any two Compaq servers, Compaq's external storage box, Compaq's SCSI cards, and a Compaq hardware interconnect card. Compaq implements failover primarily through proprietary hardware, rather than software.

The Standby Recovery Server option requires a manual element in its failover process. After a failover, users must log in to the standby server manually. The On-Line Recovery Server is an active availability clustering solution.

Active Clusters
Figure 2 shows the second type of availability cluster, active availability. Whereas standby clusters require a full second copy of all data, active clusters don't. In this configuration, both servers are primary and doing meaningful work, as the scenario in the first paragraph of this article illustrated. When one node fails, the users and the applications fail over to the available node in the cluster. The users experience a delay in their processing and, in some cases, can lose data that was not saved. All users and applications on the available node continue to work unchanged, although both sets of users are now running on one server instead of two. This combination of users will slow both sets of users. Manual load balancing can minimize the impact of a failover condition. Another feature is automatic failback, which lets the processing and users return to the failed node once it has recovered.

The vendors that have active availability cluster solutions are Microsoft, Digital, Compaq, Tandem (Unix only), NCR, HP (Unix only), and Amdahl. Here's a brief summary of each of these offerings.

Microsoft: You've already read an overview of Wolfpack, so let's look at two prerelease screen shots to get the flavor of how Wolfpack clusters are configured. Screens 1 and 2 are from the Wolfpack version that Microsoft demonstrated at PC Expo.

Screen 1 is a view of the cluster administrator's console. This view shows how Wolfpack lets you manage an application and all its related resources as one group. Here, the SQL Server group includes the SQL database, a disk drive, and an IP address. With one mouse click, the administrator can move this entire group to another machine in the cluster. This capability makes load-balancing or taking a server off line for routine maintenance easy to do without bringing down important business applications.

Screen 2 shows Wolfpack's Resource Dependencies window. Traditionally, one of the tough administrative jobs with clusters has been figuring out how to prioritize all the various applications and resources so that they fail over and restart in the right order. With Wolfpack, the administrator uses this point-and-click window to establish the dependencies for each resource in an application group. Wolfpack then automatically figures out the correct restart priority for all the resources that a server or application failure affects.
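Deriving a restart order from declared dependencies is a classic dependency-sort problem. The sketch below is my illustration of the general technique, not Wolfpack code; it uses the SQL Server group from Screen 1 and assumes, for the example, that the database depends on its disk and its IP address.

```c
/* Restart priority as a dependency sort -- an illustration of the
 * general technique, not Wolfpack code. A resource starts only when
 * everything it depends on is already up. */
#include <stdio.h>

#define N 3  /* 0 = disk drive, 1 = IP address, 2 = SQL database */

int main(void)
{
    const char *name[N] = { "disk drive", "IP address", "SQL database" };
    int depends[N][N] = { {0} };
    int started[N]    = { 0 };

    depends[2][0] = 1;  /* assumed: database needs the disk */
    depends[2][1] = 1;  /* assumed: database needs the IP address */

    /* Keep starting any resource whose dependencies are all up. */
    for (int pass = 0; pass < N; pass++) {
        for (int r = 0; r < N; r++) {
            if (started[r])
                continue;
            int ready = 1;
            for (int d = 0; d < N; d++)
                if (depends[r][d] && !started[d])
                    ready = 0;
            if (ready) {
                printf("start: %s\n", name[r]);  /* disk, IP, then database */
                started[r] = 1;
            }
        }
    }
    return 0;
}
```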

Digital: If you're looking for a solution that is probably close to the Wolfpack solution, check out Digital Clusters for Windows NT. Digital launched its product before Wolfpack's release because of NT market demand for clusters. Once Wolfpack ships, Digital will provide a migration wizard to help Digital's NT cluster customers move to Wolfpack. If any functionality that exists in Digital's product doesn't make it into Wolfpack, that functionality will be available as a low-cost add-on called the NT Cluster Plus Pack.

Digital's solution supports failover of the NTFS file system, Microsoft SQL Server 6.5, and Oracle7 Workgroup Server 7.1 and 7.2, and scripting allows generic application failover. Digital also supports failover between two Intel-based servers and between two Alpha-based servers, but not between an Alpha and an Intel server. According to Digital, the problem is with NT, and Microsoft needs to address it: The log page size differs between RISC systems (such as Alpha) and Intel systems. At press time, Microsoft had no plans to remedy this situation.

Compaq: Compaq's On-Line Recovery Server meets the active cluster criteria except that it does not provide automatic failback. That capability automatically reroutes applications and users to their primary server once the failed node has recovered.

Compaq plans to upgrade its products with Wolfpack-compliant products when they become available. In addition, Compaq will migrate its SCSI switching technology to Tandem's ServerNet Interconnect Technology. Current customers potentially face a two-step migration--Wolfpack compliance and ServerNet implementation.

Compaq will make sure that all its hardware can participate in all Wolfpack-compliant configurations. "We want to be absolutely compatible, but also differentiate our products from other solutions," said Tim Golden, Compaq's cluster manager. "One way we will differentiate ourselves is through our alliance with Tandem, which includes their ServerNet technology. It delivers redundancy at all component levels. ServerNet has higher availability, scaleability, and throughput than other cluster interconnect devices we've seen."

Tandem: Tandem has no NT cluster solution but will fully support the Wolfpack solution when it becomes available. Tandem plans to support Wolfpack on the low end and provide its Himalaya servers for situations that call for scaling beyond the limits of its Intel SMP-based systems.

On May 7, 1996, Tandem joined the Wolfpack core team by announcing that Microsoft had funded an effort to port Tandem's high-end availability products to NT. Tandem has built its reputation by providing high-end availability and scaleability servers for the last 20 years. The company sees NT as its ticket to move beyond the high-end market into the high-volume market. In fact, during the announcement, Tandem declared that NT really means New Tandem.

The Tandem/Microsoft alliance has several key points. First, Microsoft will fund ($30 million) the port of Tandem's NonStop ServerWare Solutions to NT Server. These solutions include Tandem's parallel, scaleable SQL database; Tandem's clustered transaction-processing environment supporting the TUXEDO and CICS transactional APIs; and Tandem's distributed messaging and object management environment. This technology will let NT Server users take advantage of Tandem's Independent Software Vendor (ISV) portfolio of more than 1000 business-critical solutions for the finance, telecommunications, retail, healthcare, and transport markets, including online transaction processing, electronic commerce, Internet/World Wide Web, data warehousing and decision support, and online analytical processing.

In addition, Tandem will port its ServerNet technology to NT this fall. Developed for Tandem's large Himalaya machines, ServerNet allows very high-speed communications between nodes in a cluster. ServerNet potentially provides capabilities that current open definitions do not. For example, if all the I/O devices connected to the cluster are on ServerNet, you can fail over not just disks, but printers, tape drives, and any other I/O device. As Wolfpack gets closer to delivering large scaling clusters, this high-speed I/O will be very important. The ServerNet drivers will ship with Wolfpack, and Compaq, Tandem, and their partners will sell the complete solution.

NCR: Wolfpack from NCR will be a part of an overall high-availability story that includes LifeKeeper for Windows NT. LifeKeeper offers many features of Wolfpack Phase 1, including automatic failback and automatic reconnection for all client types. NCR will position LifeKeeper as a value-add clustering product with support for the Wolfpack APIs. This support will let LifeKeeper run all Wolfpack-compliant applications on Intel-based servers.

Although NCR will sell the Wolfpack solution, this company is also committed to keeping LifeKeeper one step ahead of Wolfpack. For example, the company plans to introduce a three-node cluster for LifeKeeper by the first quarter of 1997. In this configuration, all three nodes are active and can fail over to each other. In addition, a future release of LifeKeeper will support Oracle7 Parallel Server. Oracle has not announced support for Parallel Server on Wolfpack, the solution.

LifeKeeper for NT includes three recovery kits, one each for TCP/IP, NetBEUI, and SQL Server. In addition, recovery kits for Oracle, Lotus Notes, Sybase, and Exchange are available.

HP: HP's clustering roadmap includes its MC/ServiceGuard, Wolfpack, and Oracle Parallel Server. HP will port MC/ServiceGuard, now available on HP 9000 Unix, to HP NetServer application servers running NT Server. This approach will give MC/ServiceGuard customers an NT cluster solution that will not require learning a new paradigm.

Amdahl: A company long associated with high-end computing, Amdahl is offering an active cluster that scales to eight EnVista servers, which are based on Intel SHV systems. This solution is already beyond the two-node cluster of Wolfpack Phase 1.

The key to this level of scaleability is the EnVista Availability Manager, which is really the Isis Availability Manager licensed from Stratus. The Isis Availability Manager runs on each node, and a majority voting mechanism, not a heartbeat, determines when a node has failed. Once the cluster participants vote out a failed node, one of the remaining nodes picks up the load, according to rules-based logic in the cluster configuration. Isis can recognize hardware, software, and performance failure and already provides N-node failover.
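Majority voting differs from a heartbeat in that no single observer can evict a node. The sketch below is my generic illustration of the idea, not the Isis protocol itself; the node count and votes are invented.

```c
/* Majority voting in miniature -- a generic sketch, not the licensed
 * Isis protocol. Every node reports whether it can still reach the
 * suspect; only a majority verdict evicts it, so one node's flaky
 * network card can't eject a healthy peer. */
#include <stdio.h>

#define NODES 5

int main(void)
{
    /* votes[i] = 1 means node i can still reach the suspect node. */
    int votes[NODES] = { 0, 0, 1, 0, 0 };
    int reachable = 0;

    for (int i = 0; i < NODES; i++)
        reachable += votes[i];

    if (reachable <= NODES / 2)
        printf("majority verdict: evict the node and fail its load over\n");
    else
        printf("majority verdict: the node stays in the cluster\n");

    return 0;
}
```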

The node interconnect uses a switched, full-duplex, 100Mb Ethernet. By first quarter 1997, Amdahl will offer an interconnect rated at 40MB per second (MBps), probably from Fujitsu. For disk access, Amdahl uses its LVS 4500 storage solution instead of shared SCSI, providing data availability with dual-ported node failover capability.

Amdahl views compatibility with industry standards as critical. Once Wolfpack becomes available, Amdahl will add support for it and will recommend it as the preferred cluster technology for new customers.

Stratus: Stratus provides N-node availability in a pre-assembled configuration called a RADIO Cluster. A RADIO Cluster has six nodes: two compute, two storage, and two network nodes. Every component is redundant.

Unfortunately, Stratus calls every component a node, so figuring out how this system fits together took me a while. Once I got over that hurdle, I was amazed at the engineering that went into these units. The compute module, for example, has a two-way Pentium processor, a 1GB IDE drive for booting the system, and redundant 100Base-T hubs. The storage modules will support up to four 2GB PCI Fast and Wide SCSI-2 disk drives that you can custom-partition to support various access and recovery schemes. The redundant network modules are high-speed internetworking hubs that interconnect compute modules and storage modules and route all messages and data necessary for application execution in the RADIO cluster. RADIO requires 1"-high drives in the storage nodes.

Stratus owns the Isis Availability Manager, which is loaded with Isis Active Replication Technology in each node. This approach allows all the active availability features in a clustered environment with no single point of hardware, software, or network failure. Stratus plans to support the Wolfpack API, so any Wolfpack-compliant applications will be able to run on this system. Stratus has a lot to offer besides supporting the Wolfpack API: Up to 24 compute and storage nodes can be in one cluster, all nodes are hot-swappable, and zero downtime for NT database applications is available through optional Isis for Database software.

Fault-Tolerant Clusters
On the high end of availability clusters is the third level, complete redundancy by means of fault-tolerant clusters. As Figure 3 shows, every part of the cluster is active and redundant with another component. Failover times are within one second. The goal is 99.999% up-time, or roughly five minutes of downtime per year. This capability is characteristic of the solutions from Marathon.

Marathon: Marathon provides fault tolerance with off-the-shelf components. MIAL 1, Marathon's first product, focuses on realtime data protection. A basic configuration requires three computers: a compute server and two data servers. The network cards, CPU, and disks are completely redundant. This system writes information from the compute server to both data servers simultaneously. If one data server fails, the users and processing will continue on the main server.

MIAL 1 uses a proprietary interconnect between the two data servers. This card offers full-duplexed, hardware-assisted data integrity checks and 32MBps throughput. If one data server fails, the system cluster will remain operational. When the failed server is repaired, the available data server will automatically resynchronize the recovered server by replicating the entire disk storage to the recovered server. This replication happens in the background at the rate of about five to 10 minutes per GB of storage. MIAL 1 replicates the entire disk and assumes everything on the recovered server was bad.
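At five to 10 minutes per GB, a whole-disk resynchronization stretches from minutes into hours as disks grow, so it is worth running the numbers before sizing storage. A quick check, using arbitrary example sizes:

```c
/* The quoted resync rate of 5-10 minutes per GB, applied to a few
 * arbitrary disk sizes. */
#include <stdio.h>

int main(void)
{
    int sizes_gb[] = { 2, 10, 50 };
    int count = sizeof(sizes_gb) / sizeof(sizes_gb[0]);

    for (int i = 0; i < count; i++)
        printf("%3d GB: %3d to %3d minutes to resynchronize\n",
               sizes_gb[i], sizes_gb[i] * 5, sizes_gb[i] * 10);
    return 0;
}
```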

MIAL 1 has one point of failure: If the compute server fails, the cluster will go down. Marathon says it will fix this failure point in the next version, MIAL 2.

Marathon's solution does not require any application to be cluster aware, nor does it require recovery scripts. Marathon's position is that this ease of use lets administrators easily implement fault-tolerant systems. The company wants to emphasize total cost of ownership--Marathon believes its solutions let you keep up with the power curve in the industry, the latest CPUs. You need no APIs, no scripts, no special applications, no special version of NT. So why does Marathon need Wolfpack? The company wants to let its cluster solution participate as a node in a Wolfpack scaling cluster.

MIAL 1 includes the three proprietary Marathon Interface Cards (MICs), software, and a SplitSite Data Link, which lets you configure the system and assists in system management. With this interconnect device, you can plug in fiber optic or copper cable and connect between buildings. You can also configure this device to activate an alarm if a component fails.

Making Clusters Commonplace
Windows NT has taken a lot of criticism for not scaling or being as fault tolerant as large systems. Clusters let Microsoft address these concerns in a way that fits with its high-volume channel strategy. The traditional enterprise vendors are handing over some of their most prized solutions to participate in this next wave of enterprise computing. Once again, NT is the bridge between the high-end and high-volume solutions market.

The high-volume cluster market is only beginning. Will clusters of four-way SMP systems have better price/performance than 8-, 12-, and 16-way SMP systems? Can Microsoft encourage ISVs to write Wolfpack cluster-aware applications that provide fault tolerance and scaleability? If the answers are yes, Microsoft is in a good position to make clusters commonplace.

Wolfpack-Compliant Applications
The flavor of each cluster solution is interesting, but the success of the standard is critical. As with its other standards, Microsoft will declare the Wolfpack API a standard when more Wolfpack-compliant applications are shipping than the sum of all solutions based on competing technologies--when Wolfpack has more than 50% market share. During design previews, more than 20 software vendors resolved to deliver Wolfpack-compliant applications by March 1997. These applications will be cluster aware and will provide capabilities, including scaleability, beyond basic failover. I expect Microsoft to have SQL Server cluster aware by the same time and the rest of the BackOffice suite by the end of 1997.

As Wolfpack-compliant applications become available, Microsoft will probably develop a new logo, something like "Windows NT Cluster Enabled," to show that a solution is cluster aware. What does cluster awareness buy you? In the event of a failure, a cluster-aware application can restart each user right where he or she left off. In a cluster-aware database, a cluster-aware application can start the database server, log the user in to the database, and restart an existing query. A non-cluster-aware application can return a message such as "drive not available," and you have to manually return to the previous state.

Microsoft has just released the preliminary Wolfpack API set to the Open Process participants under nondisclosure agreements, so vendors are only now beginning to develop Wolfpack-compliant applications. The theory is that such applications will run on any Wolfpack-compliant cluster solution. Unfortunately, the cluster solutions are very different, so making this goal a reality is challenging.

To get a vendor's perspective on this challenge, I spoke with Cheyenne, a company that is working on a Wolfpack-compliant add-on to its backup solution, ARCserve 6.0.

Cheyenne
Cheyenne views Wolfpack as a way to satisfy increasing demand from enterprise-level customers, who want Cheyenne to support clusters. Cheyenne already supports availability features such as RAID and recovery, so the addition of support for availability clusters is a logical next step. In ARCserve's RAID 5 implementation, three or more tape devices together can perform one backup. Screen 3 illustrates Cheyenne's approach. If one tape device fails, the backup continues without interruption. In addition, you can put each tape device on a separate SCSI bus to provide bus fault tolerance. Finally, you can restart tape backups after a failure.
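Surviving one failed tape without interrupting the backup is the standard RAID 5 parity trick. The sketch below is my illustration of the general technique, not Cheyenne's implementation: data blocks go to two tapes and their XOR parity to a third, so any one tape's block can be rebuilt from the other two.

```c
/* The RAID 5 parity trick behind losing one tape without losing the
 * backup -- a sketch of the general technique, not Cheyenne's code. */
#include <stdio.h>

int main(void)
{
    unsigned char tape0  = 0x5A;           /* data block on tape 0 */
    unsigned char tape1  = 0xC3;           /* data block on tape 1 */
    unsigned char parity = tape0 ^ tape1;  /* parity block on tape 2 */

    /* Tape 1 fails: rebuild its block from tape 0 and the parity. */
    unsigned char rebuilt = tape0 ^ parity;

    printf("tape1=0x%02X rebuilt=0x%02X %s\n", tape1, rebuilt,
           tape1 == rebuilt ? "(match)" : "(mismatch)");
    return 0;
}
```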

Cheyenne believes it can support standby and fault-tolerance clusters today by attaching the tape devices to the mirrored server. The Wolfpack solution (active availability) is much more challenging than that approach. First, Cheyenne needs to determine whether including the tape devices on the same SCSI bus as the disk drives is possible. That way, both nodes in the cluster can share the tape device. If this method is possible, Cheyenne needs a way to switch from one node to the other. If the tape devices are attached to both nodes, you have some interesting tape-management problems to solve.

The challenges to a Wolfpack-style solution are not trivial. In the past, such challenges meant a vendor had to create a different version for each clustering solution it supported. This necessity caused application vendors to support only the cluster solutions with the highest market share. If application vendors such as Cheyenne can tackle these problems by implementing the Wolfpack API, we will see many cluster-aware applications in the next 18 months.

Contact Info

Cheyenne * 516-465-4000, Web: http://www.cheyenne.com
HP * 301-670-4300 or 800-752-0900, Web: http://www.hp.com
IBM * 520-574-4600 or 800-426-3333, Web: http://www.ibm.com
Intel * Web: http://www.intel.com
Tandem * 408-285-6000 or 800-538-3107, Web: http://www.tandem.com, Email: [email protected]
