Optimizing Exchange to Scale on NT

The Windows NT Magazine Lab tests Microsoft Exchange Server's ability to scale on an NT platform based on CPU and memory use. The conclusions might surprise you.

Joel Sloss

October 31, 1996


Does NT Scale?

NT scaleability is one of the biggest questions in the PC industry today. As PC business desktops and PC servers move into spaces formerly reserved for large-scale mainframes and UNIX boxes, answering whether or not NT can fill those extremely large shoes gains paramount importance.

Not only does Microsoft need to know whether its brainchild fits the bill for enterprise deployment, but its customers need to know, too. In this issue of Windows NT Magazine you will find the first answers to the big question: Does NT scale?

Our initial data says: Yes, NT scales. What does this data mean? It means that as you add resources to an NT client/server system, such as CPUs, memory, faster network components, and more disk space, system performance increases. It means that as you add users and application load to your systems, the operating system doesn't choke.

Yes, your system configuration should match your user load. Now, you might say, "You just said that I can only get better performance if I throw a bigger (more expensive) machine at my users!" But isn't that the definition of scaleability? If you need better performance, or need to support more users or heavier user loads, you can build a bigger system and the OS will handle it.

NT is on the road to big things. It isn't all the way there yet-our tests show that it doesn't scale completely linearly, and it is bounded (response times do increase with user load, and you can, to a point, combat this by adding system resources)-but NT is moving up fast. So climb on board for the first big ride of the 21st century as we look at the operating system and the machines that will make it all happen.

The burning question of the day is, "Does NT scale?" This is one of the most difficult questions to answer, because it calls a huge variety of issues into play, not the least of which is the nature of client/server computing itself.

Windows NT Magazine is going to answer these questions for you in a series of articles over the next six months (at least), in which we will review client/server issues such as networking, disk performance, system configuration, application configuration, user load, end-to-end performance testing, and more. We will fold in reviews of servers from companies such as Compaq, HP, IBM, NEC, Tricord, and many others to bring you performance data about server hardware scaleability, upgradeability, overall performance, and even clustering. Along the way we will discuss the tests we use, the metrics we record, our goals, and our findings, and then relate this information in a real-world fashion to what you can expect from the same boxes in your environment. You can then use this information to aid in your buying process, to tune your existing setups, or to make fundamental decisions about migrating to NT from other operating systems and what applications to use in your enterprise. The obvious answer is to buy a system that matches your needs, but when you need to scale and grow, knowing what to buy is a far more complicated issue.

Each of the above issues in client/server computing contains a myriad of others, each of which can affect your server's performance, network throughput, and user "happiness." Networking involves the wire you use, the NICs installed in the client systems and the servers, the protocols you run, the configuration you lay out (domains, workgroups, multiple segments, connection hardware, etc.), and the I/O capabilities of the system/NIC relationship. Disk performance covers how many and what kind of drives you use, what your data set size is, what the disk transaction mix is (reads vs. writes, random vs. sequential), what controllers are installed and how many there are, disk caching, RAID configurations, and system/disk subsystem relationships. The system configuration includes things such as the number of CPUs and amount of memory, PCI and system bus architectures, amounts and types of CPU cache (Level 1, Level 2, and Level 3), types of components used, and so forth-plus, you need to find a happy medium for your situation, because what is optimal for one application is not necessarily applicable to another, even on the same physical box. Application configuration encompasses everything from SQL Server optimization to application serving to what and how many applications can be run from the same system. User load may seem obvious-how many users can be supported on a server-but it also involves what the users are doing, how the client systems are configured, and what type of network they are on. The last piece we'll look at-end-to-end performance testing-is perhaps the most complicated of them all. It calls into question everything I've listed here, and adds complexities such as: What is a real-world test? What numbers do people care about? It also demands reality checks on user simulation, system/network configurations, transaction mixes, and much, much more.

Client/server computing is the most complicated environment ever to exist in the PC world. You can't look at just one component in your enterprise architecture and assume that optimizing it will improve everything. Not only do you have to remember that your system is only as fast as its slowest component, but also that the highly complex nature of this new paradigm called client/server computing means there are emergent properties causing problems in places you may not even be aware of. In other words, it is almost completely unpredictable-knowing the input to your system doesn't necessarily mean that you'll know the output.

Exchange
For our first round, we decided to use Microsoft Exchange 4.0 for testing systems as messaging platforms. We also took the opportunity to test Exchange itself, and see how well it scaled with varying levels of CPU and memory configurations-we'll look at disk, network, and other factors at a later date. We wanted to answer the question of resource utilization: does throwing more hardware at Exchange make it perform better? The results were pretty startling, as you'll see in a bit.

Windows NT 4.0 claims to offer scaleability enhancements over NT 3.51 in overall performance and multiprocessing, and Exchange purports to scale up to 10,000 users on a single server system. While we have no question that-for the most part-these claims are true, what price do you have to pay for them? Yes, Exchange can support 10,000 users, but if it takes you ten minutes to download your mail and connections are lost left and right, it may not be worth the effort.

To find this out, we used LoadSim, a tool that Microsoft provides with Exchange Server, for testing system response times under varying user loads (see the sidebar, "LoadSim Revealed"). It is an end-to-end testing tool, meaning that it measures total system response time, from the moment the user request leaves the client to when the result returns (the simple explanation in this case is how long it takes for an email message to be sent and an acknowledgment to come back from the server, allowing you to move to your next task).
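
To make "end-to-end" concrete, here is a minimal sketch of that style of measurement in Python (our illustration, not LoadSim code; the FakeServer class is a stand-in we invented):

    import time

    class FakeServer:
        """Stand-in for the mail server; invented for illustration only."""
        def deliver(self, message):
            time.sleep(0.05)                     # pretend the round trip takes ~50ms
            return "ack"

    def timed_transaction(server, message):
        start = time.perf_counter()              # clock starts as the request leaves the client
        ack = server.deliver(message)            # network transit + server processing + reply
        elapsed_ms = (time.perf_counter() - start) * 1000
        return ack, elapsed_ms                   # end-to-end response time in milliseconds

    ack, ms = timed_transaction(FakeServer(), "test message")
    print(f"response time: {ms:.0f} ms")

Because the clock only stops when the acknowledgment returns, every layer in between-client, network, and server-counts toward the score.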

We did our first run of tests on a Tricord Systems PowerFrame (see the review, "Tricord: A Mainframe's Little Sibling," on page 55 of the magazine) with four 166MHz Pentiums (2MB of Level 2 cache for each), 1GB of RAM, and seventeen 2.1GB Seagate fast and wide differential SCSI-2 drives. Our intent was to remove as many bottlenecks as possible so that we were testing CPU and memory performance, rather than choking on other factors. Our client testing environment consisted of fifteen 60MHz Pentiums with 32MB of RAM each, running NT 3.51 Server, all connected to the PowerFrame server via two 100 Base TX Ethernet segments (with 100Mbit going to each client, and two NICs in the server). Each physical client system simulated 100 users, giving us a total of 1500 users.

We set up the environment as a domain, since Exchange is a domain-oriented enterprise messaging platform and not a workgroup email system (although Exchange functions just fine in small workgroups-the workgroups just have to be set up as domains). The target server (the Tricord) acted as the Primary Domain Controller (PDC), as well as the Exchange server and performance monitoring station. At the same time, an auxiliary workstation attached to the server's management card passively recorded hardware-level data on CPU, bus, memory, and disk utilization.

In the Thick of It
I'm going to delve into a number of aspects of this test, laying the groundwork for future articles that you'll see in the months to come. There will be brief discussions of each aspect as it relates to our findings on the Tricord box, along with more general thoughts about benchmarking strategies, errors, etc.

We watched many different system/performance factors while running the LoadSim tests. These factors included CPU utilization, memory utilization, and disk performance, along with the metrics collected by LoadSim itself: average response time and transactions per day. With this data, we can analyze a number of performance characteristics of Exchange on NT, as well as the performance of the Tricord hardware platform.

The Environment
The first step in testing any platform is determining a hardware/software configuration that will best bring out the characteristics that you are trying to measure and that simulates a real-world environment (system price, amount of resources, etc.). At the same time, you don't want to introduce new problems by setting the system up improperly to run the application/OS you are testing-you need to know where the bottlenecks are.

To test scaleability of an Exchange/NT platform, we decided to look at CPU and memory. This decision required removing all other system aspects from the equation by throwing as many resources at the test as possible, such as disks and network bandwidth. This setup accomplished a number of things.

First, we set the system up with enough disk space that the memory-to-disk ratio would more closely emulate a real client environment, the disk I/O bottleneck would be removed from the system, and the configuration would be optimal for the application under test. In a client environment, an ideal setup is to have far more disk than memory, so that the slower disk access can be spread out over multiple controllers and multiple drives to keep up with fast memory accesses. This setup also has the effect of mimicking IS shops where large data sets are in use.

Also, most database applications eat up as much memory and disk as you can throw at them. Since Exchange is based on a client/server database architecture (although it is optimized for email and semistructured groupware data storage, and not for structured transactional processes like SQL), we assumed its behavior would be fairly similar to an application like SQL Server and run well in a database-optimized environment. While it is true that the maximum configuration we used was pricey (more than $100,000), we decided that in this first go-around, we would test Exchange itself, and remove the hardware factors as much as we could.

To that end, the Tricord system was configured with 17 drives on a single multichannel fast and wide differential SCSI-2 RAID controller, as seen in Figure 1. This controller has four independent buses, capable of supporting up to 60 devices total. We set up a single OS volume on one drive, with just NT 4.0 Server and the Exchange executables on it, which resided on bus 1. Since the system was running with 1GB of RAM, NT needed a large pagefile; also, NT operates best if the pagefile is split across multiple physical drives, so we put it by itself on a RAID 10 volume made of four drives, two on bus 1 and two on bus 2 (a RAID 10 volume is a mirrored set of RAID 0 disks-see the sidebar, "RAID Performance and NT," for a discussion of why we chose this). The logfiles we were creating from both LoadSim and NT Performance Monitor (Perfmon) were recorded to another RAID 10 volume, also on four drives and split between buses 1 and 2. All of Exchange's data files were stored on a completely separate RAID 10 data volume composed of eight drives, with four on bus 3 and four on bus 4. Splitting the disk accesses across more than one SCSI channel or bus further improves performance.
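
For readers who want the layout at a glance, here is the same configuration expressed as a small Python data structure (our summary of the description above; the volume names are ours, not Tricord's or Microsoft's):

    volumes = {
        "os":       {"raid": "none", "drives": 1, "buses": [1],    "contents": "NT 4.0 Server and Exchange executables"},
        "pagefile": {"raid": "10",   "drives": 4, "buses": [1, 2], "contents": "NT pagefile"},
        "logs":     {"raid": "10",   "drives": 4, "buses": [1, 2], "contents": "LoadSim and Perfmon logfiles"},
        "data":     {"raid": "10",   "drives": 8, "buses": [3, 4], "contents": "Exchange data files"},
    }

    # Sanity check: the four volumes account for all 17 drives in the array.
    assert sum(v["drives"] for v in volumes.values()) == 17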

Next, we didn't want network I/O to be a significant limiting factor on our testbed. Ideally, 1Gbit fiber optic links between client workstations and the server would give the best possible performance, but it isn't terribly realistic. A reasonable compromise was to use two full-duplex 100 Base TX (100Mbit) Ethernet segments, with eight client systems on one, and seven on the other. Each network segment had a 12-port 3Com LinkBuilder 100 Base TX hub (a 3C250-TX/1). To maximize server throughput, both segments ran through a 100TX switch to minimize network collisions and provide the fastest (and fattest) possible data pipe from the clients to the server.

On your network you might only have a single 10Mbit NIC in your server, and in future articles we will be covering the impacts of changing your network configuration (upping the data transfer speed with faster cards, adding NICs, changing network layout, tweaking protocol settings on the server, etc.) on overall system performance. However, it was important for us to know that any changes in measured response times were not due to network timeouts and lost packets while the server chewed on user authentication and other operations. A machine this big has more than enough processing power to handle a vast number of users, but it will only work properly if the server is not wasting time waiting on the network-this is the only way to test true server throughput. We still experienced network problems at the server (see the sidebar, "LoadSim Revealed," and "Final Analysis" later in this article), which could have been due to the LoadSim tool itself, Exchange, or any number of factors-we were apparently unable to eliminate all network dependencies.

Ideally, you want to simulate only one user per physical client system, but that gets a trifle unwieldy when you are emulating 1500 users. This is another reason for using a 100Mbit instead of a 10Mbit network: when a single system is running operations from 100 simulated users, the network traffic on that system is obviously far higher than if only one normal person is using it. We needed to provide enough bandwidth so that this was not a problem.

The Client
Client dependencies in LoadSim are actually pretty significant, so our testing strategy required the use of some fairly heavy-duty systems. A fat client can simulate a lot of users without bottlenecking; a thin client can't do as many. Also, we didn't want to encounter the same bottlenecks at the client side that we were measuring at the server, so we needed a combination of system and user load that behaved as linearly as possible. In other words (and this is somewhat oversimplified), we couldn't simulate only one user per system, but to simulate 100 real-world users on the same box, the client system needs to be roughly 100 times as powerful as the one a normal end user would have. This way, it can run more than 100 user operations at a time and still be just as fast as if it were running only one (within a certain tolerance, of course). Remember, too, that most average users are not fully taxing their computers, so there is a lot of unused headroom. Your users may have 200MHz Pentiums on their desks, but they may be using only a tenth (or less) of their total throughput capabilities.

To accomplish this test, we used 15 Tricord clone servers (the PowerFrame DS1000, built on an Intel Extended Express motherboard) as the client/user simulation machines, each with a 60MHz Pentium, 32MB of parity RAM, an Adaptec 2940 SCSI controller, a 425MB (or 1GB) Fujitsu drive, a 3Com EtherLink III (3C597) EISA 100TX NIC, and integrated video. Each system was running NT 3.51 Server (with Service Pack 4).

After some capacity testing against the maximum server configuration (four CPUs, 1GB of RAM) with a single client, we settled on 100 simulated users per physical system. At this load, response times leveled off to an acceptable value on an unloaded server (about one second on average), and still represented a good target for actual users. At the same time, we decided that 100 users per system was a good break point for the simulation, without going beyond what is a reasonable representation of the real world.

At 100 users per physical client system, CPU utilization hovered around 30%, with occasional spikes to 100%. This means that LoadSim is far more memory constrained than CPU constrained-response times didn't become unreasonable until higher user counts were measured. Even then, the CPU was still not fully utilized, while memory usage had peaked.

Server Optimization Factors
We tested the Tricord in a number of different CPU and memory configurations, as seen in Table 1A, to find out how Exchange/NT scaled with resource allocation, where Exchange breaks down, and how it is optimized (at a set load of 1500 medium-usage users).

Table 1A: Tricord Server and Exchange Configurations

# CPUs        Total System RAM    IS Buffers    Directory Buffers    Available RAM
1, 2, 4, 8    128MB               8612          2289                 10MB
1, 2, 4, 8    256MB               23796         6325                 90MB
1, 2, 4, 8    512MB               58403         10000                200MB
1, 2, 4, 8    1024MB              134967        10000                600MB

The Tricord box was optimized for network applications (which you set with the Control Panel Network applet, under the Services tab, in the properties of the Server service) and for background applications (by turning down foreground optimization in the Control Panel System applet, under the Performance tab) in order to maximize the server's throughput-at least, as far as NT was concerned.

Each time we changed the configuration of the server (varying the number of CPUs or memory using the /NUMPROC=N or /MAXMEM=N switches in the BOOT.INI file, rather than physically adding or removing hardware), we reoptimized the system with the Exchange Optimizer tool that ships with Exchange Server (see Table 1A). You tell the optimizer what your expected user load is, and it analyzes your hardware configuration to find the best combination of memory, disk, and CPU usage, and adjusts software settings to match. As we changed available memory, the optimizer changed how much memory it used by increasing or reducing the number of Information Store buffers (it still left considerable memory unused, as indicated by the Perfmon memory counters-see Table 1B-even when there was much to spare. Exchange is obviously designed to run as a background application on a server that is doing other things at the same time, so Exchange doesn't hog all of the available resources). On the plus side, this meant that Exchange was operating totally within available memory, so paging (swapping) activity was negligible. Changing the number of CPUs didn't affect any of the Exchange settings.
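
For example, a BOOT.INI entry that limits the same installation to two processors and 256MB of RAM might look like the following (the ARC path shown is illustrative; yours will differ):

    [operating systems]
    multi(0)disk(0)rdisk(0)partition(1)\WINNT="Windows NT Server 4.0 [2 CPUs, 256MB]" /NUMPROC=2 /MAXMEM=256

This approach lets you dial resources up and down between test runs with nothing more than a reboot.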

Table 1B: Exchange Server Configuration

Setting                                                       Value
Microsoft Exchange Server information store log files         Logfile drive
Microsoft Exchange Server Private information store file      Data drive
Microsoft Exchange Server Public information store file       Data drive
Microsoft Exchange Server Directory service database file     Data drive
Microsoft Exchange Server Directory service log files         Logfile drive
Microsoft Exchange Server Message Transfer Agent              System drive
Microsoft Exchange Server Internet Mail Connector Files       System drive
Minimum # of information store threads                        10
Maximum # of information store threads                        100
# of directory threads                                        50
Maximum # of cached categorizations                           200
Maximum # of cached restrictions                              200
# of private information store send threads                   2
# of public information store send threads                    2
# of information store gateway in threads                     2
# of information store gateway out threads                    2
# of information store users                                  1000
# of XAPI MT threads                                          2
# of XAPI MT queue threads                                    2
# of dispatcher threads                                       2
# of transfer threads                                         2
# of kernel threads                                           3
# of database data buffers per object                         6
# of RTS threads                                              3

Disk configuration was a major issue in this test, because the Tricord server's disk subsystem is almost as fast as its memory. The four-bus fast and wide SCSI-2 RAID controller (fast and wide delivers a data transfer rate of about 20MB per second) with 17 drives offered outstanding performance (see the sidebar, "RAID Performance and NT"), and disk utilization never maxed out. Not only were the data sets smaller than the available space, but according to bus utilization data gathered by the monitoring console, there was plenty of disk controller bus bandwidth to spare. Disk writes far outweighed disk read activity, so the controller cache was configured accordingly. The cache module was assigned to all RAID volumes equally: the sequential log devices benefited from cache because all writes were immediately acknowledged upon receipt in the cache. The random I/O devices, such as the data volume, did not directly benefit from extra cache. Since the test was write intensive, potential read data was flushed from cache before it was of any use.

Metrics
There was a great deal of data we could have gathered during test runs-in fact, we did a couple of runs with every Perfmon variable turned on as a control data set, but doing so seriously impacts system performance. We opted for only CPU and memory utilization data from Perfmon, and all other data (bus utilization, disk activity, etc.) was gathered passively by the Intelligent Management Subsystem monitoring console (see the review, "Tricord: A Mainframe's Little Sibling," on page 55 of the magazine).

LoadSim records response times for all user transactions, and you run a statistics utility called lslog, which calculates a final score for the whole test run. We weren't collecting data on transactions per second (TPS), because this is only meaningful if you are looking for the absolute maximum scaleability of a server or total environment-which we weren't. You can, if you want, calculate these values by looking at transactions per user per day (in our test, about 67 messages per medium-load user per day plus other activities-around 80 transactions total-for 1500 users on a two-day run, compressed into an actual four-hour period, so around 17 transactions per second), but this doesn't tell us anything, since we never varied the user load. When finding the breakpoint of a server, and trying to compare the performance of one server against another, knowing the maximum number of TPS the machines are capable of can be a useful metric. However, since we never completely maxed out the server (except at one CPU with 128MB of RAM), the TPS values for every run would be the same regardless of the number of CPUs or available memory, because the same number of users were simulated doing the same things during the same amount of time. The Windows NT Magazine Lab will be looking at these values in the future, since they are important for knowing how a system will scale under varying user loads.
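
For anyone who wants to reproduce that back-of-the-envelope figure, here is the arithmetic as a short Python sketch (the inputs are the numbers quoted above; the calculation is ours):

    users = 1500                        # simulated users
    transactions_per_user_per_day = 80  # ~67 messages plus other activities
    days = 2                            # two-day run
    active_seconds = 4 * 3600           # activity compressed into a four-hour period

    tps = users * transactions_per_user_per_day * days / active_seconds
    print(f"~{tps:.0f} transactions per second")   # prints ~17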

Load and Scaleability Results
Graph 1 shows our final response time data. Based on comparable reports by Microsoft and Compaq, who used the LoadSim tool for similar capacity testing, we believe that these values are realistic and that they reveal some interesting points about Exchange scaleability.

However, there are a couple of mitigating factors I should bring up before continuing. First, there were errors during all test runs (timeouts, etc.), but the error pattern was completely consistent across test runs: as memory went up, errors went down, no matter how many CPUs were in use. Second, the one data point for the one-CPU/512MB run would seem to be aberrant-however, a straight-line interpolation between the 1024MB and 256MB runs shows that it is within 10% (or 300 milliseconds) of the expected value. The error is just exaggerated on our graph. Third, because we turned the think time way down (by using a very short simulated day in order to make faster test runs), the simulated load is artificially high, and probably represents a user load closer to 6000 users instead of 1500 (we used all of the default LoadSim settings for a Medium Usage user, except that we limited the test run to four hours, with a two-hour day. This meant that all of the simulated users' work was compressed into a two-hour period per day, rather than letting it take a full eight hours). The CPU and memory utilization for test runs with 8-hour days and a 10-hour run were more than 20% lower than those of a 2-hour day and 4-hour run, with no lingering message queue after the test completed. So, what is the ideal test density and think time? That depends on how long you have to run your tests and what you are trying to prove. We were looking for scaleability information about CPU and memory resources, so dumping as much load on the server as we could was the only way to get resource utilization high enough to believe the numbers we came up with-otherwise, we weren't testing the CPU/memory engine.
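
The 6000-user estimate follows directly from the compression of the simulated day; a quick sketch of that arithmetic (our calculation, using the parameters above):

    normal_day_hours = 8       # a Medium Usage user's standard working day
    simulated_day_hours = 2    # the shortened day we used for faster runs
    simulated_users = 1500

    compression = normal_day_hours / simulated_day_hours        # 4x more work per hour
    effective_users = simulated_users * compression
    print(f"effective load: ~{effective_users:.0f} users")      # ~6000 users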

But even with these few considerations, we can draw some conclusions about Exchange/NT as a messaging platform for your enterprise. We found that more than 128MB of system RAM isn't necessarily a good thing, because response times went up with each step up in memory on the two- and four-CPU runs-response time did drop with additional memory on the one-CPU runs.

We can also conclude that throwing more CPUs at Exchange is a worthwhile expenditure-more so than memory, because response times dropped significantly with the extra processors. There is a point of diminishing returns-the jump in performance from two to four CPUs was not as great as from one to two CPUs-but it depends on how much you are willing to pay for that little performance boost.

Looking at the big picture from the tests we ran, an optimal configuration for a single-function application server (in other words, your server is only running Exchange, or SQL Server, or another BackOffice application) is two CPUs with 256MB of RAM. This configuration offers the best combination of performance (almost twice that of one CPU) and price. However, remember that much of your server's capacity will be taken up by Exchange if you are running this many users, so if you want to add users or add an application, more CPUs will probably be necessary. More memory can also improve server capacity for additional applications.

Memory and CPU Utilization
Our test results bring up questions about how Exchange is optimized, and what is actually going on behind the scenes on your server. Does Exchange take advantage of all available memory? Are all of your CPU resources being used, or is there hidden capacity that you can take advantage of?

The LoadSim tool is not a complete answer when analyzing client/server performance. You also need to look at what your server is doing, rather than just at transaction processing times. NT's Perfmon is an ideal tool for looking into specific aspects of your network, client, and server performance. While our tests were running, we used Perfmon to record CPU and memory utilization information to find clues about why the server was performing the way it was (see Table 2, and Screens 1 and 2).

As you may know, NT does not scale linearly-3.51 didn't, and neither does 4.0. Now, this isn't NT's fault: If you are running a uniprocessor system at 90% CPU utilization, adding another CPU doesn't double system performance-you'll have two CPUs running at 50-60%, not two CPUs running at 90% under the same load. Additional CPUs add capacity, and can improve performance, but it is not a geometric relationship (you can't multiply system performance by the number of CPUs you add), and it is bounded, as you can see in Graph 1.
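
A toy calculation makes the point (our numbers; the overhead figure is an assumption for illustration, not a measurement):

    single_cpu_load = 0.90    # utilization of the workload on a one-CPU system
    cpus = 2
    smp_overhead = 0.10       # assumed synchronization/scheduling cost per extra CPU

    # The same fixed workload spread across two CPUs, plus a little SMP overhead.
    per_cpu = single_cpu_load / cpus * (1 + smp_overhead * (cpus - 1))
    print(f"~{per_cpu:.1%} per CPU")   # ~49.5% per CPU, in line with the 50-60% described above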

Table 2: Total CPU Utilization (averaged over all CPUs in use)

# CPUs    128MB of RAM
1         100%
2         92%
4         69%

Memory does not have a huge effect on CPU performance in Exchange, and only caused a minor variance in utilization levels. One possible reason for this variance (and for the increasing response times with additional memory), other than optimization characteristics of Exchange, is that as you throw more memory at NT, the system needs more CPU resources to manage it-although SQL Server has not exhibited this in other tests we have run so far. Also, there seems to be a direct correlation between CPU utilization and the response times-as response time goes up, so does utilization-but CPU use behaves unpredictably relative to total system memory (Table 2 shows how the utilization values jump around at different memory levels, but always within a close range-processor interrupts also increase as memory decreases).

Memory usage is a different story. Exchange does not release memory once it has used it, probably because that memory has been assigned permanently (until a system restart) to the Exchange buffer pool (4KB buffers)-but, strangely, Exchange does not make profitable use of large amounts of extra memory when it's available (such as when we had 1GB of RAM on the system). I would expect that on a system with only 128MB of RAM, turning the number of assigned buffers up to the level used on a 1GB system would offer very similar performance, but would leave very little headroom for adding users or other applications.
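
To put the buffer counts from Table 1A into perspective, here is the buffer-pool arithmetic as a short sketch (our calculation, using the 4KB buffer size noted above):

    buffer_kb = 4
    is_buffers = {128: 8612, 256: 23796, 512: 58403, 1024: 134967}  # from Table 1A

    for ram_mb, buffers in is_buffers.items():
        pool_mb = buffers * buffer_kb / 1024
        print(f"{ram_mb}MB system: ~{pool_mb:.0f}MB of information store buffers")
    # 128MB: ~34MB, 256MB: ~93MB, 512MB: ~228MB, 1024MB: ~527MB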

Disk Utilization and Performance
The Tricord PowerFrame has a very fast disk I/O subsystem, but it can't change the way NT and Exchange behave. You can minimize certain performance-degrading effects by choosing the right disk configuration, such as an appropriate RAID level, number of drives, and so forth (see the sidebar, "RAID Performance and NT"), but you also have to know what is going on to deal with the problems and tune around them.

Once again, Perfmon (in combination with the Diskperf counters, which you enable with the command DISKPERF -Y from a command prompt) can be invaluable. The Tricord management software helped us settle on the right disk configuration without affecting system performance, because it records data passively off the system bus and passes the data via a serial link to a secondary workstation.

The mix of disk I/O activity changed according to the amount of system memory installed (see Table 3), and less so according to available CPUs. As memory increased, the proportion of disk writes (vs. reads) changed from 60% (at 128MB of RAM) to 97% (at 1024MB of RAM). Looking at the activity as system memory is reduced, we find that the remaining message queue at the end of the test reaches unreasonable levels. This result means that if your Exchange server doesn't have enough RAM, you end up with many messages not being sent until after the current load decreases (a process that can take up to an hour on a heavily loaded server!), which can affect people who rely on quick delivery. Also, if you don't have enough memory, a high-usage server will end up queueing more and more messages to disk, and rereading them into memory-instead of merely logging the activity to disk-when they come to the top of the queue again. So, without a fast disk subsystem, as this read activity goes up, system performance will drop, whereas with more memory, more of this activity is served directly from memory. One mitigating factor to all of this is that disk activity is not really that heavy-only on the order of 200KB per second, not MB per second as you would expect on a heavily loaded database server. The Tricord system had plenty of disk bandwidth left over for heavier I/O activity, such as public folders and groupware applications inside Exchange.

Paging activity on all tests was low, and so did not significantly affect system performance, though it increased slightly as memory decreased.

Table 3: Resource/Performance Relationships

Factor                      Increases                     Decreases
More total system memory    Response time                 Processor interrupts
                            Exchange buffers              Disk I/O read activity
                            Disk I/O write activity       EOT* message queue length
Additional CPUs             --                            Response time
                                                          CPU utilization

* EOT = end of test

Conclusions
We've opened up a big can of worms here, and I think I may have inadvertently stepped on a couple of them. As far as Exchange Server is concerned, our test results raise the questions: What is the Exchange marketplace? Will it displace existing enterprise UNIX and mainframe messaging systems, or is it aimed merely at migration from MS Mail or other smaller-scale applications? There are corporations currently engaged in rolling out 250,000 Exchange clients-this rollout would seem to indicate that Exchange is indeed an enterprise-scale application. However, the scaleability issues we uncovered, such as memory usage, ultimate user response time, and user authentication (logon) considerations, show that work remains to be done on the server-side software components. NT and Exchange do scale-as do the hardware platforms they run on-but it comes at a price.

The moral of the story: Know your users, know your workload, know your hardware. Analyze your system performance with all available tools, such as LoadSim, Perfmon, Network Monitor, and any others you can find.

The issues of client/server computing are fabulously complex and frequently expensive to deal with, and they aren't going to get any simpler. User-side issues may get easier, but the administration, the capacity planning, the performance testing, and other MIS issues are only going to become more difficult to grasp as time goes on. As technologies such as clustering, faster server hardware, larger computers, new operating systems, and stuff we haven't even thought of yet come into play, testing will become more complex.

Bear with us. It's gonna be a wild ride.

For more information, read:
Performance: Concurrent Users Per Server (www.microsoft.com/exchange/evalgd/upswpfnl.doc)

LoadSim: Tool usage documentation (on the MS Exchange Server CD)

Compaq: Performance of MS Exchange Server 4.0 on Compaq ProLiant Servers (www.compaq.com/support/techpubs/whitepapers/444a0696.html)

Microsoft: MS Exchange Deployment Conference Notes 1996 (contact Microsoft at 206-882-8080 or on the Web, www.microsoft.com)
