LoadSim Revealed: Scientific Method to the Rescue

LoadSim can tell you a great deal about your server's performance, what an optimal configuration is, and what load your server can support.

Joel Sloss

October 31, 1996


Microsoft provides a handy little tool with Exchange Server 4.0 called LoadSim (as seen in Screen A), which functions as a load generator and user simulator for capacity testing a messaging platform, specifically Exchange. It runs on one or more client machines in tandem, sending and receiving messages, accessing public or private folders, etc., as it emulates the activities of a normal Exchange user.

While LoadSim was intended to be a capacity planning tool (to find out how many users you can support on a system, with what kind of response times), it also makes an excellent performance testing tool if used properly. However, LoadSim is not without problems. If you are aware of them, such as client dependencies, a quirky user interface, and sometimes unpredictable behavior, you can use it to test existing systems, or find out what a new one will do for you, by planning your testing strategy around these holes.

In the Windows NT Magazine Lab, we decided that LoadSim would make an excellent first step in testing server hardware as messaging platforms: we can tune the system configuration (number of CPUs, amount of memory, disk and network layouts, etc.) and change the user load (number of users, transaction mix) to come up with curves that tell a more complete story about a particular machine. Instead of a single number characterizing the performance of an entire client/server system, we can use these curves to find trends and breakpoints of various types of systems.

Know Your Enemy
First, let's look at the problems we know about. Client dependencies in LoadSim are fairly significant: the horsepower of the client system has a large bearing on measured response times. LoadSim is more memory constrained than CPU constrained, but even with a large amount of memory, the client falls down on high user counts. Besides, you have to keep the test realistic: you can't simulate 1000 users on a single physical client system, because it introduces new dependencies at the client level that you are trying to avoid; in fact, it introduces dependencies that you are trying to measure on the server! With too high a user count, whether the CPU is fully taxed and memory is optimal or not, the I/O capabilities of the client system get in the way. With an appropriately fat client, you can simulate a certain number of users and attain the same throughput for each one (within an acceptable tolerance) as you would by having a separate physical machine for each client. If you go too far, you hit bottlenecks in the client, such as network bandwidth, memory, CPU, and disk utilization, that warp your results.

When we set up our testing environment for the Tricord review, we ran tests using a maximum configuration on the server (four CPUs, 1GB of RAM), while varying the number of users simulated on a single physical client system. We found that the response time didn't start degenerating noticeably until we went above 100 users (that is, the response time at 10 users was within 10%-15% of that for 100 users). Also, other vendors such as Compaq, and even Microsoft, have performed similar tests in a comparable environment to the one we used, and came up with the same results for client load. We also tuned the user load and think times (how long the pause is between user operations) to values between the absolute "real world," which is an eight-hour day with long breaks between actions, and a livable testing environment that wouldn't take 24 hours to get a single data point. We ended up with a two-hour day and a four-hour test run, which neither overwhelmed the client system nor represented an unrealistic environment. We took data points from the two middle hours (the last half of the first day and the first half of the second day), so that the ramp-up time (the first hour) for the test to reach steady state did not influence the results, nor did the ramp-down as the users log off.
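To make the windowing idea concrete, here is a minimal sketch in Python of that trimming step. It assumes a hypothetical log of per-transaction samples (a simple CSV of elapsed seconds and response times); LoadSim and its log utility use their own formats, so treat the field names here as invented for illustration only.

# steady_state.py: illustrative only; LoadSim's real logs use their own format.
# Assumes a hypothetical CSV of "seconds_since_test_start,response_ms" samples.
import csv

RAMP_UP_S = 60 * 60            # discard the first hour (users logging on)
RAMP_DOWN_S = 3 * 60 * 60      # discard everything after hour three (logoff begins)

def steady_state_samples(path):
    """Return the response times (ms) recorded during the steady-state window."""
    samples = []
    with open(path, newline="") as f:
        for offset_s, response_ms in csv.reader(f):
            if RAMP_UP_S <= float(offset_s) < RAMP_DOWN_S:
                samples.append(float(response_ms))
    return samples

if __name__ == "__main__":
    kept = steady_state_samples("client01.csv")
    print(len(kept), "samples fall inside the two-hour steady-state window")

The log utility that ships with LoadSim (described below) performs this kind of truncation on the real log files for you.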

Since we could operate within a reasonable range of real world results, and keep the test believable and repeatable, we determined that LoadSim was a good starting point for messaging tests. But what about the other problems I mentioned, like the inconsistent interface and unpredictability, which would seem to contradict using this tool at all?

The interface is a resolvable issue; it just takes a little babysitting of the test runs. The utility itself follows the typical Microsoft GUI guidelines (rather than being a command-line interface), but the error trapping is a little weak, so restarting the tool or reloading a set of test parameters can change test settings. Before each run, we had to double-check every system to make sure that it was going to run the test we intended.

Unpredictability is a little more difficult to deal with, and it is a twofold problem. First is the unpredictability of the interface, which I just explained. Second is the unpredictability of the test results. On the one hand, LoadSim is a fine end-to-end testing environment; on the other hand, you don't really know what it is measuring, and can only infer certain things by analyzing the results against server operations (such as disk and CPU utilization). There is a narrow band of settings in the test, as well as a specific hardware configuration on the server, that seems to give relatively error-free logs (see the section on load and scalability in the main article). A test run isn't necessarily invalid if there are errors; they just point to bottlenecks in the system.

I say that you don't know what LoadSim is really measuring, because response times behave in an odd way when compared to server configuration. On a server with lots of memory, the response time goes up (which is a bad thing). With less memory, the response time drops (which is a good thing), but the message queue at the server is incredibly long, and doesn't finish processing messages until long after the test has actually completed. So, are you measuring user response time (i.e., how long it takes for the interface to return control of the system to the user so that he/she can send another message), or are you measuring total message processing times (server latency)?
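To make that distinction concrete, here is a small sketch with invented timestamps (not values LoadSim exposes directly): one number covers only the time until the client interface returns control to the user, and the other covers the full trip until the server has actually processed the message.

# response_vs_latency.py: illustrative only; the timestamps here are hypothetical.
from dataclasses import dataclass

@dataclass
class MessageTiming:
    sent_at: float         # client issues the send (seconds)
    ui_returned_at: float  # client regains control of the interface
    delivered_at: float    # server finishes processing the message

def user_response_time(m):
    """What the user perceives: how long before he/she can act again."""
    return m.ui_returned_at - m.sent_at

def server_latency(m):
    """Total processing time: how long until the message is really handled."""
    return m.delivered_at - m.sent_at

# A low-memory server may hand control back quickly (good response time)
# while quietly queuing the message, so actual delivery finishes much later.
m = MessageTiming(sent_at=0.0, ui_returned_at=0.4, delivered_at=95.0)
print("user response time:", user_response_time(m), "s")
print("server latency:", server_latency(m), "s")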

Exchange seems to behave such that if the resources are available, it uses them to the best of its abilities. If they aren't, Exchange holds things back (it queues them up), such as outgoing messages, until the proper resources are again available. In our tests, we saw as much as an hour of post-processing after a run with one or two CPUs and 128MB of RAM.
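A rough back-of-the-envelope illustration of that queuing behavior (the rates below are invented, not measured Exchange figures): if clients submit messages slightly faster than a memory-starved server can commit them, the leftover backlog keeps draining long after the last user logs off.

# backlog.py: toy arithmetic with invented rates, not measured Exchange numbers.
arrival_rate = 5.0           # messages per second submitted by the clients
service_rate = 4.0           # messages per second the server can actually process
run_length_s = 4 * 60 * 60   # a four-hour test run

backlog = max(0.0, (arrival_rate - service_rate) * run_length_s)  # messages queued
drain_time_min = backlog / service_rate / 60                      # post-processing time

print("backlog at end of run:", int(backlog), "messages")
print("time to drain the queue:", int(drain_time_min), "minutes")

With these made-up rates, the queue takes a full hour to drain after the run ends, the same order of tail we saw in our low-memory runs.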

After a few test runs, we knew some of what was going on behind the scenes, and could account for certain values in the results. For your tests, now that you know what some of the issues are, you can deal with the problems at the start: The errors will make more sense, or you can eliminate them entirely.

The Tool
The LoadSim tool itself is easy to use once you know what you are looking for. However, don't go to Microsoft asking for support or waste time searching for extensive documentation, because there is none. Microsoft provides it as a "use it at your own risk" utility, and will not support your efforts with it. There is meager support documentation on the Exchange Server distribution CD, but that's about it.

LoadSim lets you tune a test run in a variety of ways. You can use any number of physical clients, and simulate any number of users performing a wide range of operations; simply install the tool on each system you intend to use. When you enter all of the names of the client systems in the Configuration/Client Machines dialog, a user list is generated based on all available systems, which you then import into the Exchange server.

You can set test parameters for user level (high, medium, or low usage, representing the number of transactions in a day), what the users will do (send, receive, access folders, etc.), how long a test is (think time), what length day and night are, and a great deal more. These settings are saved to .SIM files for later recall. We used the default settings for everything but the length of the day and the overall test, so you should be able to reproduce our tests almost exactly.

While a test is running, all statistics and messages are displayed in a console window on the client systems. You can see current response time (shown as "score"), message types, activities, and errors, and current test status (total time, current user count, etc.). This data is logged to a file for use by the LSLOG utility, which actually calculates test results.

The LSLOG.EXE program can truncate data to eliminate ramp-up and ramp-down periods in the test, concatenate log data sets from multiple client systems, and determine the 95th percentile response time from the steady-state period of the test run (the response time that represents 95% of all transactions while all users were logged on and the activity was at its peak). You use this data to produce plots like the one you see in Graph 1.
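The 95th-percentile figure itself is simple to reproduce once you have the steady-state response times; the sketch below shows the common nearest-rank calculation (it illustrates the statistic LSLOG reports, not LSLOG's actual implementation).

# percentile.py: shows the statistic, not LSLOG's actual code.
import math

def percentile_95(response_times_ms):
    """Nearest-rank method: the smallest sample >= 95% of all samples."""
    ordered = sorted(response_times_ms)
    rank = math.ceil(0.95 * len(ordered))   # 1-based rank
    return ordered[rank - 1]

# Made-up response times in milliseconds; note the single slow outlier.
samples = [180, 190, 200, 210, 215, 220, 225, 230, 235, 240,
           245, 250, 255, 260, 265, 270, 280, 290, 310, 900]
print("95th percentile response time:", percentile_95(samples), "ms")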

Errors
Our tests were not error-free (as you can see by the one bogus data point in Graph 1). We found network dependencies rooted in how the server was handling authentication requests (since it was the domain controller, too) and/or processing network packets, as well as possible problems in the network configuration. Either the server, the network hardware, or Exchange itself couldn't support 1500 simultaneous client logons, so Exchange choked, giving Messaging API (MAPI) errors and connection timeouts.

Possible reasons for this behavior could be network collisions, packets arriving or being processed out of order, or the fact that the network was simply overloaded. MAPI errors and such only occurred with all of the clients going at the same time with a low memory configuration on the server. Adding more memory to the server minimized these effects, and reducing the number of simultaneous logons by staggering the client startup procedure helped a great deal (although too much of a delay between client initializations caused problems with LoadSim because it couldn't find the other email accounts). Lowering the total user count and turning up the think time (increasing the length of the simulated day) also reduced the error frequency, message queue length at the end of the test run, and memory and CPU utilization.

Is this a network hardware problem or a memory vs. network I/O processing problem? The errors, both their symptoms and their solutions, point to server and NT limitations rather than solely to a physical network bottleneck, since most errors occurred at test initialization rather than during the test run.

The Big Question
Can LoadSim help you capacity test your systems, and does it answer the burning question of NT scalability? Without waffling too much, I can say yes. Properly used, LoadSim can tell you a great deal about your server's performance, what an optimal configuration is, and what load it can support.

We will continue to use LoadSim for performance testing servers in the Windows NT Magazine Lab, and we'll be able to analyze a number of factors according to the load we use: from absolute capacity to best hardware for messaging. Stay tuned for the latest data and stress tests.
