Load Balancing and Scaling Your WCF Services

WCF Design and Configuration Recommendations for Distributed Environments


In previous installments of this column for asp.netPRO I've discussed topics related to this discussion, such as "Concurrency and Throttling Configurations for WCF Services," "Proxies and Exception Handling," and "WCF Proxies: To Cache or Not to Cache?" This month's column explores the impact of these WCF features on load balancing and scalability scenarios.

Deploying WCF services in a distributed environment requires developers to be aware of certain design issues and configuration settings that can impact performance and scalability. IT staff are not likely to be educated in all things WCF, so it is up to the developer to bridge the gap between IT and development, probing into the logistics of production deployment. In this article I'll review WCF considerations for distributed scenarios, specifically related to load balancing and scalability. I'll start by providing some guidance on the service-level goals you should be looking for, then dive into WCF specifics, including the effect of WCF sessions, binding considerations, channel creation overhead, how approaches differ for client-server compared to server-server scenarios, counting service hops, and throttling.

Performance, Throughput, and Scalability

Before diving into the WCF specifics, it helps to understand the difference between performance, throughput, and scalability: three important features of a distributed environment.

Performance refers to the time it takes to complete a request; it often is measured in two ways:

  • Execution time refers to the time it takes to complete a request from the time it reaches the server machine to the time the last byte is returned to the client. This measurement does not take into account latency between the end-user machine and the system.

  • Response time refers to the time it takes to complete a request from the user's viewpoint, or perceived performance. This measures from the time the request is issued to the time the first byte is received.

Clearly you have more control over execution time, which is influenced by available server resources (CPU, memory, disk IO) and communication overhead between moving parts of the application (for example, crossing process or machine boundaries).

Throughput refers to the number of requests per unit of time, usually measured in requests per second. Once again, server resources and communication overhead can influence this number, in addition to any throttling configurations that limit concurrent calls. Throttling is important because it prevents server machines from being maxed out, which can cause catastrophic crashes that prevent new requests from being processed altogether.

Scalability refers to the system's ability to provision new resources without impact to the application. This can mean adding new server machines that the application seamlessly uses (horizontal scaling), or adding resources to a particular machine to handle more load (vertical scaling). Typically, vertical scaling has no impact on application design and configuration, but horizontal scaling can be a problem if applications have server affinity (as with sessions), depending on how the system is configured to manage that affinity.

The table in Figure 1 summarizes some common Service Level Agreement (SLA) goals for performance, throughput, and scalability. Of course, real-time applications may have stricter requirements for performance specifically, but these benchmarks are numbers that are usually satisfactory for a service-oriented application.

 

Measurement | Target SLA Goals
Performance | < 2 seconds average request execution time
Throughput | 150-500 requests/second on a single server with 4 CPUs, depending on the overhead of each request
Scalability | Ability to add new servers with minimal configuration effort

Figure 1: SLA goals for performance, throughput, and scalability.

Now I'll take a look at the WCF features that can influence these measurements.

Load Balancing

Horizontal scaling implies distributing load across multiple server machines in a load-balanced environment. When multiple concurrent requests are received, they are typically distributed among the available machines, usually with a round-robin approach, or (better) by an algorithm that determines which machine has the fewest active requests being processed. Software load balancers (such as Windows Network Load Balancing, or NLB) or hardware load balancers (appliances such as a Cisco router) are usually responsible for the algorithm used to distribute load.

In theory, the best situation is for requests to be freely distributed to the most available machine, but sessions usually get in the way of this freedom. For WCF services, transport sessions such as TCP require load balancers to be configured for "sticky IP", while application sessions, reliable sessions, and secure sessions require sticky session configuration. Failover, a situation where if one machine fails another machine can pick up the session where it left off, is not built in for WCF services, although application sessions can fail over if the service is a durable service.

Load Balancing and Bindings

There are several binding features that influence the ability to load-balance services. Here is a short list of standard bindings and the features that require consideration in a load-balanced scenario:

  • NetTcpBinding. This binding requires sticky IP behavior so that clients are returned to the same machine where the socket is. Aside from the socket, WCF also depends on clients returning to the same communication channel.

  • BasicHttpBinding, WebHttpBinding. These bindings result in stateless communication channels by default; however, the default behavior is to enable HTTP Keep-Alive, which can result in server affinity. To disable Keep-Alive, a custom binding must be created and the KeepAliveEnabled property set to false (see the downloadable code sample for an example of this).

  • WS[2007]HttpBinding, WS[2007]FederationHttpBinding, WSDualHttpBinding. These bindings all have secure sessions enabled by default. They also all support reliable sessions, with the last of them (WSDualHttpBinding) requiring reliable sessions to be enabled. Both of these features require sticky sessions to ensure that requests get back to the same machine, with the same server channel. These bindings also enable HTTP Keep-Alive by default, but there is no point in disabling Keep-Alive unless sessions are disabled for the binding.

  • NetTcpContextBinding, BasicHttpContextBinding, WSHttpContextBinding. These bindings are context-aware equivalents of a few of the bindings already discussed, so everything said above about their counterparts still applies. Context-aware bindings support durable services and workflow services, both of which rely on a database to store state and rehydrate instances if a message is received by an alternate machine. These bindings still require sticky sessions to maintain the same channel between client and service, but if a new client channel is created, it can pass an existing context and successfully construct the service in its current state on another machine.
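As a rough illustration of the Keep-Alive point in the list above, the following configuration sketch shows one way a custom binding equivalent to BasicHttpBinding might turn Keep-Alive off. It is not the downloadable sample mentioned earlier; the binding name BasicHttpNoKeepAlive is a hypothetical placeholder, and the sketch assumes plain SOAP 1.1 text messages over HTTP with no transport security.

```xml
<bindings>
  <customBinding>
    <!-- "BasicHttpNoKeepAlive" is a hypothetical name chosen for this sketch -->
    <binding name="BasicHttpNoKeepAlive">
      <!-- SOAP 1.1 text encoding, matching what BasicHttpBinding uses by default -->
      <textMessageEncoding messageVersion="Soap11" />
      <!-- Each request opens its own connection, so the load balancer can route it anywhere -->
      <httpTransport keepAliveEnabled="false" />
    </binding>
  </customBinding>
</bindings>
```

An endpoint would then reference this with binding="customBinding" and bindingConfiguration="BasicHttpNoKeepAlive". The trade-off is an extra connection setup per call, so it is worth benchmarking before adopting.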

In summary, HTTP bindings that do not have HTTP Keep-Alive, secure sessions, and reliable sessions enabled can be effectively load balanced within the context of the same client proxy. The remaining bindings have server affinity for the lifetime of the channel. Is this bad? Not necessarily. Although greater scalability can be achieved if new requests are always passed to a server with the most resources, load balancers also can look at the number of sessions living on a particular server and distribute new sessions to those servers with fewer sessions. For the benefits that sessions bring, this is usually an acceptable cost.
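For the WS-based bindings, a configuration along the following lines is one possible way to switch off the session features named above. This is a sketch under assumptions: the binding name WsHttpSessionless is hypothetical, message security is assumed, and the sketch does not address whether credential negotiation also needs adjusting for a given credential type.

```xml
<bindings>
  <wsHttpBinding>
    <!-- "WsHttpSessionless" is a hypothetical name chosen for this sketch -->
    <binding name="WsHttpSessionless">
      <!-- Reliable sessions are off by default; stated explicitly here for clarity -->
      <reliableSession enabled="false" />
      <security mode="Message">
        <!-- Turns off the secure session (security context) that is on by default -->
        <message establishSecurityContext="false" />
      </security>
    </binding>
  </wsHttpBinding>
</bindings>
```

HTTP Keep-Alive is still enabled for this binding, so removing that last source of affinity would again call for a custom binding.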

Load Balancing and Sessions

Allow me to further elaborate on the impact of various types of sessions on load-balancing scenarios. To begin, Figure 2 illustrates a scenario where an HTTP binding without session or Keep-Alive settings enabled is in use. Each call from the same proxy will be sent to any available machine according to the load balancer's algorithm. Each operation is designed to manage its own state, and the service is designed as a PerCall service (no state).

 


Figure 2: Load balancing without sessions.

 

When the service is a PerSession service (an application session is present), the in-memory state of the service relies on each call from the same proxy returning to the same machine (sticky session), as shown in Figure 3. If, on the other hand, the service is a durable service, service state is stored in a database between calls (see Figure 4). That means that subsequent calls can be received by a different machine and can properly initialize the service to its current state.

 

Note: I avoid using sessions in WCF services and prefer the model where each method independently manages its own state in a custom database for the application.

 


Figure 3: Load balancing and application sessions.

 


Figure 4: Load balancing and durable services.

 

WCF services that support transport sessions (TCP) or that simulate transport sessions with other protocols (reliable sessions or secure sessions) also require sticky IP or sticky session configuration (see Figure 5). Once again, for the lifetime of the client channel, requests must be directed to the same server channel (where the session lives). If the channel fails on either side, a new session must be created, but unlike application sessions, a new transport session can be established without impact to the client application because no application state is lost. An exception to this might be if reliable sessions are used to send a large message in smaller parts, in which case the entire message will likely need to be re-sent.

 


Figure 5: Load balancing and transport sessions.

 

Proxy Lifetime

I've described proxy lifetime issues in a past column; however, it is important to revisit this topic in the context of this discussion. There are two key scenarios to consider: client-server and server-server.

In a client-server scenario, a Windows client application uses a proxy to call WCF services. The proxy usually has a lifetime for as long as the client application is running. Calls from the same proxy instance will have server affinity if a session is present. In terms of scalability, the system would still be able to distribute calls from different clients (proxies) among load-balanced servers.

In a client-server scenario, the presence of a session raises two important considerations that can impact performance: channel creation overhead in the event the application is multithreaded, and exception management when something happens to tear down the underlying client or server channel.

Because channel creation is expensive, a multithreaded client application can seriously hurt perceived performance if each thread creates its own proxy to call a service. Even though .NET 3.5 introduced automatic channel factory caching features to optimize channel creation, it is better to cache the actual channel (the proxy reference) in a client-server scenario, and share that among threads.

Note: I wrote about channel caching options and .NET 3.5 features in the July 2008 column.

If the channel has a transport session (not an application session), it is best if the service allows multiple concurrent requests to the same channel. For this the service must be configured to support multiple concurrent calls even if it is a PerCall service:

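// PerCall gives each call its own service instance; Multiple lets
// concurrent calls from a shared client channel be processed in parallel.
[ServiceBehavior(InstanceContextMode = InstanceContextMode.PerCall,
  ConcurrencyMode = ConcurrencyMode.Multiple)]
public class PerCallService : IPerCallService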

For services with InstanceContextMode PerSession or Single it may be best to leave ConcurrencyMode as its default value, Single. This way, only a single thread can access the shared service instance. In the case of PerCall, each thread always gets its own service instance, which means you are really only allowing multiple threads access to the server channel, not to the same service instance.

As for exception management, in the presence of sessions one must remember that an uncaught exception or timeout can put the channel into a faulted state, rendering the proxy useless. If the channel did not have an application session, it is likely that the user doesn't care about the exception and you should create a new channel in "stealth mode". If an application session was in progress and the service is not durable, the user should probably be notified of the failure before constructing a new channel.

Note: I wrote about exception handling techniques for this scenario in the January 2008 column.

In a server-server scenario you may have an ASP.NET application or another WCF service living in the DMZ calling downstream services. In this scenario, proxy lifetime management should be handled differently. You should never cache the channel and share it among threads, as this would create potential server affinity when calling downstream services. Though initially this may give the illusion of throughput, as the number of users and concurrent threads increases you quickly see a cap on throughput. You may be able to cache the channel factory (something that .NET 3.5 can handle for you) if the same credentials are used for all callers; for example, if a certificate is used to authenticate to downstream services. This doesn't work for scenarios where you must attach a supporting token for each call, such as one that represents the initial caller and their roles. In that case a new channel factory and channel (proxy) must be created for each call.

Because the channel will not be cached, session expiry and exception handling for faulted channels are not a concern in this scenario.

Limiting Service Hops

In a distributed environment it is likely there are at least two service hops in the context of a single request thread. Because the server-server hops are likely to include the overhead of constructing a new channel for each call, this can quickly add too much overhead to the call chain of the request thread. Generally speaking, it is a good idea to stick to two or three service hops for a single request thread. Anything beyond this should be carefully benchmarked to make sure it yields good enough performance to meet SLA requirements. Remember that your goal should be to achieve the necessary performance for the application while still benefiting from a service-oriented application design.

Instance Throttling

It is important to allow the right number of concurrent requests and sessions for your WCF services. Allowing too many requests and sessions on a single machine can cause it to fail, but allowing too few prevents the server from realizing its potential throughput. WCF provides a default ServiceThrottlingBehavior for each service to control the number of concurrent requests, sessions, and service instances, as follows:

<serviceBehaviors>
  <behavior>
    <serviceThrottling maxConcurrentCalls="16"
      maxConcurrentInstances="2147483647"
      maxConcurrentSessions="10" />
  </behavior>
</serviceBehaviors>

The setting for maxConcurrentCalls controls how many concurrent threads will be allowed for the service type. A good number to start with for this is 30 concurrent calls, which is similar to the default number of thread pool threads allocated by the ASP.NET runtime.

The setting for maxConcurrentSessions controls how many concurrent transport sessions can be created for the service type. This number should definitely be increased so that more than 10 clients can connect to the service within a particular host process. Limiting to 10 means that only 10 TCP sessions, reliable sessions, or secure sessions are allowed, which effectively limits the number of clients that can initialize a proxy to communicate with the service. This number should be estimated based on the usage patterns of application users. For example, if all users will log in to the application every morning in a corporate environment, you can expect that number to be distributed across load-balanced machines. On the other hand, if only a percentage of users are usually online concurrently, the collective number across load-balanced machines can be significantly reduced below the number of application users.

The setting for maxConcurrentInstances will naturally be throttled by the other two settings, so under most circumstances you can leave this value alone.
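Putting the three settings together, an adjusted throttle might look something like the sketch below. The behavior name ThrottledBehavior and the figure of 500 sessions are hypothetical, chosen only for illustration; the right session count depends on the usage patterns and the number of load-balanced machines discussed above.

```xml
<serviceBehaviors>
  <!-- "ThrottledBehavior" is a hypothetical name; a service element would
       reference it through its behaviorConfiguration attribute -->
  <behavior name="ThrottledBehavior">
    <!-- 30 calls follows the starting point suggested above;
         500 sessions is an illustrative figure, not a recommendation;
         instances are left at the maximum so the other two settings do the throttling -->
    <serviceThrottling maxConcurrentCalls="30"
                       maxConcurrentSessions="500"
                       maxConcurrentInstances="2147483647" />
  </behavior>
</serviceBehaviors>
```

Whatever values are chosen, they are best validated under load against the SLA targets rather than taken as fixed.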

Conclusion

Developers should consider the impact of load balancing and scalability in their service design by doing the following:

  • Disable HTTP Keep-Alive to remove server affinity for simple HTTP bindings.

  • Make services durable if application sessions are supported.

  • Cache proxies in multithreaded client-server scenarios and silently recreate proxies as needed when channels are faulted.

  • Cache the channel factory if possible for server-server scenarios.

  • Allow multiple threads access to PerCall services to support multithreaded clients.

  • Try to keep service hops to two or three per request thread and benchmark as hops are added to verify good enough performance can be achieved.

IT should consider the following:

  • Configure load balancers for sticky IP or sticky sessions as needed where sessions are supported.

  • Ensure throttling configuration is sufficient for application throughput.

  • Monitor performance counters to ensure that performance and throughput results are meeting SLA requirements, adjusting configurations as necessary.

Another feature of load balancing worth discussing is how to configure WCF to work effectively with Big IP/F5 servers that process SSL and forward unencrypted messages to the service. This is an advanced subject that deserves an article of its own, so I will address this in next month's column.

Download the samples for this article at http://www.dasblonde.net/downloads/aspprodec08.zip.

 

Michele Leroux Bustamante is Chief Architect of IDesign Inc., Microsoft Regional Director for San Diego, and Microsoft MVP for Connected Systems. At IDesign Michele provides training, mentoring, and high-end architecture consulting services focusing on Web services, scalable and secure architecture design for .NET, federated security scenarios, Web services, interoperability, and globalization architecture. She is a member of the International .NET Speakers Association (INETA), a frequent conference presenter, conference chair for SD West, and is frequently published in several major technology journals. Michele also is on the board of directors for IASA (International Association of Software Architects), and a Program Advisor to UCSD Extension. Her latest book is Learning WCF (O'Reilly, 2007); visit her book blog at http://www.thatindigogirl.com. Visit http://www.idesign.net and her main blog at http://www.dasblonde.net.
