Concurrency and Throttling Configurations for WCF Services
Control the Number of Concurrent Requests to Each Service
October 30, 2009
RELATED: "Load Balancing and Scaling Your WCF Services" and "Proxies and Exception Handling"
In my last column, "WCF Service Instancing," I explained instancing modes for WCF services, which control the lifetime of each individual service instance allocated to a request thread. Although PerSession instancing is the default, PerCall instancing is the preferred setting for server deployments that must support a large number of client requests. This month I'll discuss other settings that also influence overall throughput to your WCF Web services.
Scalability and Throughput Features
Scalability and throughput requirements of services hosted on a client machine, versus those deployed to server environments, are not equal. Services hosted in-process are initialized and invoked on demand; those hosted on client machines at best may be consumed by multiple client threads. Services deployed to server machines, whether Web servers exposed to the Internet or servers behind the firewall that satisfy intranet clients, can expect to serve a significantly higher number of concurrent requests. The number of requests may be predictable if the number of clients is controlled, or may increase in exponential proportions due to a much wider client base with potential for continued growth.
Ideally, your services will always be ready to process incoming requests and juggle the expected load, while not maxing out host machine resources and crippling the system. WCF features that support this need include instancing mode, concurrency mode, and throttling behaviors. As I discussed in "WCF Service Instancing," instancing mode controls the lifetime of each service instance, letting you allocate an instance per call, per session, or a single instance for all clients. Concurrency mode controls how and whether each individual service instance allows concurrent calls, which can affect throughput. Throttling behaviors allow you to control the request load to each service, restricting the number of concurrent calls, the number of sessions allocated, and the number of service instances.
Concurrency Mode
Concurrency issues arise when multiple threads attempt to access the same resources at run time. When requests arrive at a service, the service model dispatches each message on a thread from the thread pool. Certainly, if multiple clients call the same service, multiple concurrent request threads can arrive for a service. The particular service object handling each request is based on the instancing mode for the service. For PerCall services, a new service object is granted for each request. For PerSession services, the same service object receives requests from the same client (or, proxy). For Single instancing mode, all client requests are sent to the same singleton service object. Based on this alone, PerSession services are at risk of concurrent access when the client is multithreaded, and Single services are perpetually at risk.
The concurrency setting for a service is controlled by the ConcurrencyMode property of the ServiceBehaviorAttribute. By default, only one request thread is granted access to any service object, regardless of the instancing mode; this is because the default setting for ConcurrencyMode is Single, as shown here:
[ServiceBehavior(ConcurrencyMode=ConcurrencyMode.Single)]
public class MessagingService : IMessagingService
This property can be set to any of the following ConcurrencyMode enumeration values:
Single. A single request thread has access to the service object at a given time.
Reentrant. A single request thread has access to the service object, but the thread can exit the service to call another service (or client callback) and reenter without deadlock.
Multiple. Multiple request threads have access to the service object and shared resources must be manually protected from concurrent access.
The following sections briefly describe each mode and discuss its relevance to Web service deployments.
Single Concurrency Mode
By default, services are configured for Single concurrency mode. This means that a lock is acquired for the service object while a request is being processed by that object. Other calls to the same object are queued in order of receipt at the service, subject to the client's send timeout or the service's session timeout, if applicable. When the request that owns the lock has completed, and thus released the lock, the next request in the queue can acquire the lock and begin processing. This configuration reduces the potential throughput at the service, when sessions or singletons are involved, but it also yields the least risk for concurrency issues.
Configuring services for Single access doesn't impact PerCall services because a new service instance is allocated for each request, as shown in Figure 1.
Figure 1: PerCall instancing mode with Single concurrency.
For PerSession services, Single concurrency disallows multiple concurrent calls from the same (multithreaded) client, while not impacting throughput of multiple clients (see Figure 2); for Single instancing mode, only one request can be processed across all clients (see Figure 3).
Figure 2: PerSession instancing mode with Single concurrency.
Figure 3: Single instancing mode with Single concurrency.
As I've said, when you expose WCF services over HTTP as Web services, chances are you'll be using PerCall configuration. Sessions for WCF Web services are usually better facilitated by persisting data between calls to a database, rather than using an application session (which is not durable). That means the default concurrency mode setting of Single will not reduce the potential throughput of requests to your application.
Reentrant Concurrency Mode
Reentrant mode is necessary when a service issues callbacks to clients, unless the callback is a one-way operation. That's because the outgoing call from service to client would not be able to return to the service instance without causing a deadlock. This mode is also necessary when a service calls out to downstream services in a way that results in a call back into the same service instance.
Services configured for Reentrant concurrency mode behave similarly to Single mode, in that concurrent calls are not supported from clients; however, if an outgoing call is made to a downstream service or to a client callback, the lock on the service instance is released so that another call is allowed to acquire it. When the outgoing call returns, it is queued to acquire the lock to complete its work. Figure 4 illustrates how PerCall services would behave with and without reentrancy for non-one-way callbacks. In this case, the only thread that might need to reenter the service is likely an outgoing callback. Likewise, if the service were to call services downstream that later attempted to call back into the top-level service, reentrancy would allow it (however, it is poor design to have circular service references).
Figure 4: Comparing PerCall instancing mode with Single or Reentrant concurrency on non-one-way calls.
Because each request thread gets its own service instance, callbacks are the primary scenario that applies to your PerCall Web services. Thus, if you are using WSDualHttpBinding and your callbacks aren't one-way, you'll set the concurrency mode to Reentrant. You should also pay close attention to calls to downstream services that may need to call back to upstream services.
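To make the callback scenario concrete, here is a minimal sketch of a Reentrant duplex service. It extends the MessagingService example shown earlier; the callback contract and operation names are illustrative assumptions, not from the article:

```csharp
using System.ServiceModel;

// Hypothetical callback contract; the operation is NOT one-way,
// so the service must be Reentrant to avoid a deadlock.
public interface IMessagingCallback
{
    [OperationContract]
    void OnMessageReceived(string text);
}

[ServiceContract(CallbackContract = typeof(IMessagingCallback))]
public interface IMessagingService
{
    [OperationContract]
    void SendMessage(string text);
}

[ServiceBehavior(ConcurrencyMode = ConcurrencyMode.Reentrant)]
public class MessagingService : IMessagingService
{
    public void SendMessage(string text)
    {
        // During this outgoing call the lock on the service instance
        // is released, so the callback's reply can reenter the
        // instance without deadlocking.
        IMessagingCallback callback =
            OperationContext.Current.GetCallbackChannel<IMessagingCallback>();
        callback.OnMessageReceived(text);
    }
}
```

With ConcurrencyMode.Single instead, the non-one-way callback would block waiting to return to an instance whose lock is still held by the original request.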
Multiple Concurrency Mode
Services configured for Multiple concurrency mode allow multiple threads to access the same service instance. In this case, no locks are acquired on the service instance, and all shared state and resources must be protected with manual synchronization techniques. This setting is useful for increasing throughput to services configured for PerSession and Single instancing modes.
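A brief sketch of what manual synchronization looks like for a singleton configured for Multiple concurrency; the CounterService class and its Increment operation are illustrative assumptions:

```csharp
using System.ServiceModel;

[ServiceBehavior(InstanceContextMode = InstanceContextMode.Single,
                 ConcurrencyMode = ConcurrencyMode.Multiple)]
public class CounterService : ICounterService
{
    // Shared state on a singleton is reachable by every request thread.
    private readonly object _sync = new object();
    private int _count;

    public int Increment()
    {
        // WCF acquires no lock in Multiple mode, so the increment
        // must be protected manually to avoid lost updates.
        lock (_sync)
        {
            return ++_count;
        }
    }
}
```

The trade-off is yours to manage: the coarser the lock, the closer you drift back toward Single-mode throughput; the finer the locking, the greater the risk of subtle concurrency bugs.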
Instance Throttling
To increase throughput at the service, multiple concurrent calls must be allowed to process. PerCall services can support multiple concurrent calls by default because each call is allocated its own service instance. PerSession and Single mode services can allow multiple concurrent requests when configured for Multiple concurrency mode. However, regardless of the concurrency mode, server resources are not generally capable of servicing an unlimited number of concurrent requests. Each request may require a certain amount of processing, memory allocation, hard disk access, network access, and other overhead.
WCF provides a throttling behavior to manage server load and resource consumption, with the following properties:
MaxConcurrentCalls. Limits the number of concurrent requests that can be processed by all service instances. The default value is 16.
MaxConcurrentInstances. Limits the number of service instances that can be allocated at a given time. For PerCall services, this setting matches the number of concurrent calls. For PerSession services, this setting matches the number of active session instances. This setting doesn't matter for Single instancing mode, because only one instance is ever created. The default value for this setting is 2,147,483,647.
MaxConcurrentSessions. Limits the number of active sessions allowed for the service. This includes application sessions, transport sessions (for TCP and named pipes, for example), reliable sessions, and secure sessions. The default value is 10.
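When you self-host, these throttles can also be set programmatically through the ServiceThrottlingBehavior class before the host is opened. A sketch, assuming a self-hosted CounterService (the service type and the chosen limits are illustrative):

```csharp
using System;
using System.ServiceModel;
using System.ServiceModel.Description;

class Program
{
    static void Main()
    {
        ServiceHost host = new ServiceHost(typeof(CounterService),
            new Uri("http://localhost:8000/counter"));

        // Reuse the throttling behavior if one was added via config;
        // otherwise attach a new one.
        ServiceThrottlingBehavior throttle =
            host.Description.Behaviors.Find<ServiceThrottlingBehavior>();
        if (throttle == null)
        {
            throttle = new ServiceThrottlingBehavior();
            host.Description.Behaviors.Add(throttle);
        }

        // Illustrative limits; tune these against measured load.
        throttle.MaxConcurrentCalls = 30;
        throttle.MaxConcurrentSessions = 10;
        throttle.MaxConcurrentInstances = 30;

        host.Open();   // throttles must be set before Open()
    }
}
```

For IIS- or WAS-hosted services, the declarative configuration discussed next is the usual approach, since you don't construct the ServiceHost yourself.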
Each of these settings is applied to a particular service configured through its ServiceHost instance (associated with the .svc file when hosting with IIS or WAS). To set these values declaratively, you associate a service behavior with the service and add the <serviceThrottling> section. Figure 5 shows a service behavior with the default throttling values.
<services>
  <service name="Counters.CounterService" behaviorConfiguration="serviceBehavior">
    <endpoint address="" binding="basicHttpBinding"
      contract="Counters.ICounterService" />
  </service>
</services>
<behaviors>
  <serviceBehaviors>
    <behavior name="serviceBehavior">
      <serviceThrottling maxConcurrentCalls="16"
        maxConcurrentInstances="2147483647" maxConcurrentSessions="10" />
    </behavior>
  </serviceBehaviors>
</behaviors>

Figure 5: Default service throttling values.

The appropriate settings for throttling behavior depend on a number of factors, including the instancing mode for the service, the number of services exposed by the application, and the desired outcome of throttling. In the next sections I'll discuss throttling in the context of these different factors.

MaxConcurrentCalls

The throttle for MaxConcurrentCalls affects the number of concurrent request threads the service can process to any of its exposed endpoints. Regardless of whether the instancing mode is PerCall, PerSession, or Single, this setting should be approached with the idea of limiting the number of active threads to a particular service, which allows you to do the math and estimate the number of requests that can be processed per second. For example, if a PerCall service with one or more endpoints allows 30 concurrent requests, and each request averages 0.2 seconds, roughly 150 requests per second can be processed by a particular worker process (assuming IIS hosting over HTTP). Multiply the number of worker processes and that number increases for a single machine in your Web server tier.

If you host two services in the same application, each allowing 30 concurrent requests, at full capacity 60 concurrent requests can execute. As you increase the number of services, this can eventually have a negative effect on throughput, as an increasing number of threads increases the context switching required to execute them concurrently. For this reason you'll want to consider the potential use of each service alongside the total number of concurrent threads that is optimal. By the same token, you don't want to limit the number of concurrent requests to a particular service such that queued requests begin to time out.
Now, what I just said about the increased number of concurrent requests as you add services to the application applies only to WCF services that are NOT hosted by IIS or WAS over HTTP. With IIS and WAS hosting, ASP.NET is engaged in the processing of requests, at least to forward the request from the ASP.NET request thread to the WCF thread. If the call is one-way, the ASP.NET thread is released and the WCF threads will be allocated according to the throttle setting. If the call is request-reply, WCF blocks the ASP.NET thread while processing the request on the WCF thread. That means that the ASP.NET processing model is responsible for request throttling for non-one-way calls. Ideally, you want to reach somewhere between 350 and 500 requests per second on a single CPU. You should be able to achieve this by allocating 30 request threads across all services, but this is not a guarantee, as many factors can influence this outcome, including request-processing overhead and server-machine horsepower.

MaxConcurrentSessions

Some creativity may be involved in setting the correct throttle value for MaxConcurrentSessions. That's because sessions have conflicting requirements: they live longer than requests, yet they also consume more resources. On the one hand, because a session lives longer than a request, you don't want to prevent users from connecting to the system if you can afford to accommodate them. On the other hand, if the nature of the session is to allocate a large amount of memory (or other resources), the server may only be able to accommodate so many. The number of active application sessions is traditionally low compared to the number of users in the system, but if you have one million users, at 5 percent online, that still means 50,000 sessions might be requested at a given time.

For BasicHttpBinding, and for WSHttpBinding without reliable sessions or secure sessions, this is a non-issue because sessions are not supported for these configurations.
Thus, the setting for concurrent sessions has no impact. In the case of outward-facing PerCall services that also support reliable sessions or secure sessions (via WSHttpBinding), the overhead of the session is minimal compared to application sessions that could maintain significant state. These sessions default to a 10-minute expiry, and if your service receives close to 300 requests per second, that could mean up to 180,000 requests in 10 minutes (some percentage of which are in the same session). Even at 5 percent, that's 9,000 concurrent sessions that might need to be supported to allow unique clients to get in the door. The bottom line is that you must be well aware of the usage patterns of your clients, and make sure you have the right balance to prevent request timeouts (waiting for a new session) while also preventing excessive use of server resources.

For application sessions or transport sessions used in a traditional client-server scenario, the number of active sessions allowed should be weighed against the amount of resources consumed by each session. Ultimately, the purpose of the throttle in this case is to prevent the server from maxing out its memory usage, or that of other limited resources consumed by each session. Similarly, downstream services exposed over NetNamedPipeBinding or NetTcpBinding require a transport session, which is another resource that has configurable limits on Windows systems.

MaxConcurrentInstances

The appropriate setting for MaxConcurrentInstances varies based on the instancing mode for the service. For PerCall services it should be equal to or greater than MaxConcurrentCalls. For PerSession services, MaxConcurrentInstances should meet or exceed MaxConcurrentSessions where application sessions are involved. That's because the value actually limits the number of concurrent service instances that can be kept active to support application sessions, which is much different than the number of concurrent, short-lived requests.
For singleton services, MaxConcurrentInstances is irrelevant, because only one instance of the singleton is ever created.

Conclusion

Because your Web services are typically configured as PerCall services over HTTP bindings, you should take from this discussion that the default concurrency mode (Single) is acceptable unless callbacks are involved. You should also have some idea how to assess the appropriate throttling behaviors for your Web services exposed over HTTP: for concurrent requests, by assessing expected load across all services; for concurrent sessions, based on use of reliable or secure sessions; and for concurrent instances, based on the setting for concurrent requests. In the rare case you employ application sessions for services, you must also consider resource allocation for those sessions. In addition, you should be mindful of appropriate configurations for downstream services invoked by your Web services.

NOTE: For examples of concurrency mode and throttling configurations discussed in this article, see the sample code for Chapter 5 of my book, Learning WCF (available at http://www.thatindigogirl.com).

Michele Leroux Bustamante is Chief Architect of IDesign Inc., Microsoft Regional Director for San Diego, Microsoft MVP for Connected Systems, and a BEA Technical Director. At IDesign Michele provides training, mentoring, and high-end architecture consulting services focusing on Web services, scalable and secure architecture design for .NET, federated security scenarios, interoperability, and globalization architecture. She is a member of the International .NET Speakers Association (INETA), a frequent conference presenter, conference chair for SD West, and is frequently published in several major technology journals. Michele is also on the board of directors for IASA (International Association of Software Architects), and a Program Advisor to UCSD Extension.
Her latest book is Learning WCF (O'Reilly, 2007); see her book blog at http://www.thatindigogirl.com. Reach her at mailto:[email protected] or visit http://www.idesign.net and her main blog at http://www.dasblonde.net.