Inside the Windows NT Scheduler, Part 2

Investigate how NT's multiprocessor scheduling algorithms work, why Microsoft chooses to employ these scheduling strategies, and how these choices can affect NT's scalability.

Mark Russinovich

August 1, 1997


Multiprocessor scheduling decisions can affect NT's scalability

In my July column, "Inside the Windows NT Scheduler, Part 1," I described the basics of threads, priorities, and uniprocessor scheduling on Windows NT 4.0. I presented definitions for threads and processes, and provided an overview of NT's thread priority scheme. This month, I'll finish my examination of NT scheduling by presenting the algorithms NT employs on multiprocessors.

The priority scheme and scheduling basics that I described last month apply to multiprocessor systems as well. However, in most cases on multiprocessor systems, the scheduler can choose among several CPUs to schedule a given thread. Deciding which CPU to pick so that NT uses the processors to their full potential is a complex problem, as you'll see.

In writing this article, I assumed that you are up to speed with the terminology I introduced last month. If you are not familiar with basic scheduling terms and concepts (e.g., you don't know what the Dispatcher Ready List is), I suggest you read last month's column before digging into this one.

NT and Multiprocessing
From the outset, Microsoft designed NT with multiprocessing in mind. The type of multiprocessing that NT supports is symmetric multiprocessing (SMP). In an SMP system, all the CPUs are identical, the operating system (NT) can run on any of the CPUs, and all the CPUs have equal access to main memory and peripheral devices. The opposite of an SMP system is a system in which the CPUs are different, the CPUs have private memory or devices, or the CPUs have priorities such that the operating system runs on only certain CPUs. Figures 1A and 1B contrast an SMP system with an example of a non-SMP (asymmetric multiprocessing--ASMP) system.

Figure 1A depicts an SMP system. The CPUs are identical, all the CPUs have private Level 1 (L1) and Level 2 (L2) caches (the L1 cache usually resides on the CPU chip), and all the CPUs have equal access to main memory and the I/O subsystem. The NT kernel can run on any of the CPUs; it can even run simultaneously on several CPUs. Figure 1B presents an example ASMP system with one master CPU. The master CPU, which runs the operating system and is the only CPU that can access the I/O subsystem, farms out work to the slave CPUs.

Microsoft designed NT's internal data structures and code to support up to 32 processors. However, NT Workstation enables a maximum of 2 processors, and NT Server enables a maximum of 4 processors, regardless of how many CPUs the system contains. Microsoft imposes these limitations with Registry values, which the NT kernel queries during processor initialization to determine the upper limit of CPUs to configure. This process involves two values that cross-reference each other, and the system monitors one value to prevent tampering. If a system has more than 4 processors, the vendor must license the OEM version of NT, which is nothing more than NT with modified setup files that initialize the Registry with values appropriate to the system during NT's installation. As NT enables processors, it assigns them numbers (starting with 0) to identify them internally.

To prevent corruption that might occur if multiple instances of the kernel modified the same data simultaneously, the NT kernel protects its shared data structures with locks. Typically, only one kernel instance (CPU) at a time can write to, or read from, core data structures. Before accessing a data structure, a CPU must acquire a lock that protects the data structure; after completing the access, the CPU must release the lock. If the lock is not available when the CPU requests it, the CPU waits.
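To illustrate the acquire-access-release discipline (and only the discipline--this is not NT's kernel code), here is a minimal user-mode sketch in C. It uses a Win32 critical section as a stand-in for the kernel's spinlocks, and the names g_readyListLock and ready_list_insert are hypothetical.

```c
/* A user-mode analogy of the acquire/access/release pattern, not NT's
 * kernel code: the real Dispatcher Ready List is guarded by a kernel
 * spinlock, not a Win32 critical section. All names here are hypothetical. */
#include <windows.h>

static CRITICAL_SECTION g_readyListLock;  /* guards the shared structure */
static int g_readyList[32];               /* stand-in for the shared data */
static int g_readyCount;

void ready_list_init(void)
{
    InitializeCriticalSection(&g_readyListLock);
}

void ready_list_insert(int threadId)
{
    EnterCriticalSection(&g_readyListLock);  /* wait if another CPU holds the lock */
    g_readyList[g_readyCount++] = threadId;  /* safe: only one holder at a time */
    LeaveCriticalSection(&g_readyListLock);  /* let the next waiter proceed */
}
```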

An example of a core data structure that NT protects with a lock is the Dispatcher Ready List, which I described last month. Locking ensures that even though more than one processor might require a scheduling operation, the scheduling code will never be running on more than one CPU at a time. Other processors that require scheduling must wait until they can acquire the lock protecting the Dispatcher Ready List before executing the scheduling algorithms.

Although SMP systems contain more than one processor, the situations on SMP systems that require scheduling operations are the same as the situations that require scheduling operations on uniprocessor systems. You'll recall from last month that several situations invoke the scheduler. If a thread's turn on the CPU expires, if a thread blocks (i.e., waits for an event to occur), or if a thread yields its turn on a processor because the thread has terminated, the scheduler uses the FindReadyThread routine to find a new thread to execute on the CPU.

A thread that becomes ready to execute also invokes the scheduler. One example of a ready-to-execute thread is a thread that wakes up after waiting for an event to occur. Another example is a thread that NT pulls off a CPU because the thread's turn has ended--the thread is still eligible to execute, so the scheduler might move it to another CPU. The algorithm the scheduler executes in ready-to-execute situations is ReadyThread.

The multiprocessor FindReadyThread and ReadyThread functions have the same names as the functions that perform these tasks in the uniprocessor case. To make NT run as efficiently as possible on uniprocessors, Microsoft builds a special version of NT with most of the multiprocessor support code removed. One source code file contains both the uniprocessor and multiprocessor implementations of FindReadyThread and ReadyThread; the source code conditions the cores of the routines on the version of NT that's being compiled.

Affinity and Ideal Processors
Before launching into the details of FindReadyThread and ReadyThread, I want to discuss a few scheduling goals of the multiprocessor NT scheduler. The scheduler will always try to schedule a thread using soft affinity; that is, the scheduler attempts to assign the thread to the processor it last executed on.

NT implements soft affinity to take advantage of the possibility that some of the thread's data from the last time it ran on a processor is still in that processor's cache. In most cases, the next thread's data fills the cache, and the optimization isn't effective. When a new thread with higher priority than a running thread becomes ready to execute, the scheduler makes a decision based on a trade-off. The scheduler weighs the potential benefit of scheduling the new thread on the processor against the overhead that the pulled thread incurs when it moves to a new processor and loses its cached data.

A thread can also have hard affinity, which the application designer typically defines. (You can also use system utilities to set the hard affinity of all a process' threads.) The hard affinity of a thread is essentially a list of processors that the thread can execute on--the scheduler will never schedule a thread on a nonlisted processor.

Why would anyone want to use hard affinity? In some cases, you can improve overall performance if you distribute the threads of different applications (or even different threads of the same application) among different processors. For example, you can assign threads from Internet Information Server (IIS) to one processor and give SQL Server threads to all other processors. This assignment prevents IIS threads from competing for processors with the SQL Server threads, and the number of processors you assign to each application can reflect the relative priority of these applications with respect to the goals of the system you're managing. Divvying up processors using hard affinity is known as hard partitioning.
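As an illustration only, here is a minimal sketch of how a management tool might hard partition two processes with the Win32 affinity calls (shown with the current SDK types). The process handles and the masks are assumptions for a hypothetical 4-CPU system, not a prescribed configuration.

```c
/* A hypothetical sketch of hard partitioning with the Win32 affinity APIs.
 * hIisProcess and hSqlProcess are handles assumed to have been obtained
 * elsewhere (e.g., via OpenProcess); the masks assume a 4-CPU system. */
#include <windows.h>

void hard_partition(HANDLE hIisProcess, HANDLE hSqlProcess)
{
    /* Bit n of an affinity mask corresponds to processor n. */
    DWORD_PTR iisMask = 0x1;  /* CPU 0 only       */
    DWORD_PTR sqlMask = 0xE;  /* CPUs 1, 2, and 3 */

    /* Every thread in each process is now restricted to its mask. */
    SetProcessAffinityMask(hIisProcess, iisMask);
    SetProcessAffinityMask(hSqlProcess, sqlMask);

    /* An individual thread can be pinned more narrowly with
       SetThreadAffinityMask(hThread, mask). */
}
```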

Because hard affinity is an all-or-nothing proposition, it can negatively affect performance in situations where the processors assigned to a particular application are busy and processors not in the hard-affinity list of the application's threads sit idle. NT 4.0 introduces a compromise: A programmer can assign an ideal processor to a thread.

The scheduler treats a thread's ideal processor much as it treats soft affinity: The scheduler tries to schedule a thread on its ideal CPU, but if that CPU is busy with a higher-priority thread, the scheduler looks at other processors in the thread's hard-affinity list. By using ideal processor settings, an application designer gives the scheduler hints about where threads need to run. Thus, the designer soft partitions the application.
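As a sketch of soft partitioning, the following hypothetical snippet spreads a set of worker threads across CPUs with SetThreadIdealProcessor (new in NT 4.0). The thread handles and the round-robin assignment policy are assumptions; the call is only a hint to the scheduler.

```c
/* A hypothetical sketch of soft partitioning: suggest an ideal CPU for each
 * worker thread. The handles and the round-robin policy are assumptions;
 * SetThreadIdealProcessor is a hint, not a requirement. */
#include <windows.h>

void suggest_ideal_cpus(HANDLE workerThreads[], DWORD count, DWORD numCpus)
{
    DWORD i;
    for (i = 0; i < count; i++) {
        /* The scheduler may still run the thread elsewhere if its ideal
           CPU is busy with higher-priority work. */
        SetThreadIdealProcessor(workerThreads[i], i % numCpus);
    }
}
```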

Picking the Next Thread to Execute
Now that we're past some terminology, let's look at the FindReadyThread algorithm that NT uses to choose a new thread to execute on a particular CPU. FindReadyThread always executes on the CPU that's searching for the next thread to run. After acquiring the Dispatcher Ready List lock, FindReadyThread scans the Dispatcher Ready List for the highest-priority nonempty queue. FindReadyThread marks the first thread in the queue that has hard affinity for the CPU as the primary candidate for execution.

If the primary candidate has soft affinity for the CPU, if the CPU is the primary candidate's ideal processor, or if more than 20 milliseconds (on x86 systems) have elapsed since the primary candidate last executed, FindReadyThread chooses the primary candidate as the next thread to execute on the CPU. If the primary candidate doesn't satisfy any of these conditions, FindReadyThread looks down the queue for the first thread that does. If the algorithm finds an appropriate thread, the scheduler assigns that thread instead of the primary candidate; otherwise, the scheduler assigns the primary candidate to execute on the CPU. After the scheduler finishes its manipulation of the Dispatcher Ready List, it releases the list's lock.
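The following C sketch captures the selection logic I just described. The types and helper names are hypothetical and do not correspond to real NT kernel structures, and the sketch omits locking, priority boosts, and other details of the real routine.

```c
/* A simplified, hypothetical sketch of the FindReadyThread selection logic
 * described above; not NT source code. */
#define STALE_MS 20   /* the roughly 20-millisecond threshold on x86 systems */

typedef struct THREAD {
    struct THREAD *next;
    unsigned       affinity;      /* hard-affinity bitmask: bit n = CPU n     */
    int            last_cpu;      /* soft affinity: CPU the thread last ran on */
    int            ideal_cpu;     /* -1 if no ideal processor is set           */
    unsigned       ms_since_run;  /* milliseconds since the thread last executed */
} THREAD;

static int is_good_fit(const THREAD *t, int cpu)
{
    return t->last_cpu == cpu ||             /* soft affinity          */
           t->ideal_cpu == cpu ||            /* ideal processor        */
           t->ms_since_run > STALE_MS;       /* waited long enough     */
}

/* queues[p] is the ready queue for priority p; higher index = higher priority. */
THREAD *find_ready_thread(THREAD *queues[], int num_priorities, int cpu)
{
    int p;
    for (p = num_priorities - 1; p >= 0; p--) {
        THREAD *primary = 0;
        THREAD *t;
        for (t = queues[p]; t != 0; t = t->next) {
            if (!(t->affinity & (1u << cpu)))
                continue;                    /* hard affinity excludes this CPU */
            if (primary == 0)
                primary = t;                 /* first eligible thread           */
            if (is_good_fit(t, cpu))
                return t;                    /* soft affinity/ideal/stale wins  */
        }
        if (primary != 0)
            return primary;                  /* settle for the primary candidate */
    }
    return 0;                                /* nothing ready for this CPU */
}
```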

As an example of FindReadyThread at work, consider the Dispatcher Ready List shown in Figure 2. The scheduler must find a thread to run on CPU 1. The highest-priority queue with threads in it is the priority 10 queue. Thread 1 is the first thread in the queue, and it has hard affinity for CPU 1; thus, FindReadyThread marks Thread 1 as the primary candidate. However, Thread 1 does not satisfy any other condition to make it the immediate choice: Thread 1 didn't last run on CPU 1, its ideal processor is not CPU 1 (it doesn't even have an ideal processor), and it last ran 10 milliseconds (ms) ago. Therefore, FindReadyThread proceeds to scan the rest of the queue and comes across Thread 2, which also has hard affinity for CPU 1. Thread 2 satisfies the condition that it last ran on CPU 1, so Thread 2 is the scheduler's choice. Thus, the scheduler's soft-affinity goal causes it to favor Thread 2 over Thread 1.
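To connect this walkthrough to the sketch above, a hypothetical test might look like the following; the field values come from the prose description of Figure 2, and the code reuses the THREAD type and find_ready_thread function from the previous listing.

```c
/* Hypothetical replay of the Figure 2 scenario using the earlier sketch.
 * Thread 1: hard affinity includes CPU 1, no ideal CPU, last ran 10 ms ago
 * on CPU 0. Thread 2: hard affinity includes CPU 1 and last ran on CPU 1. */
#include <assert.h>

int main(void)
{
    THREAD thread2 = { 0,        0x2, /*last_cpu*/ 1, /*ideal*/ -1, /*ms*/ 10 };
    THREAD thread1 = { &thread2, 0x3, /*last_cpu*/ 0, /*ideal*/ -1, /*ms*/ 10 };
    THREAD *queues[32] = { 0 };

    queues[10] = &thread1;    /* priority 10 queue: Thread 1, then Thread 2 */

    /* Thread 1 is the primary candidate, but Thread 2 last ran on CPU 1,
       so soft affinity makes Thread 2 the scheduler's choice. */
    assert(find_ready_thread(queues, 32, 1) == &thread2);
    return 0;
}
```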

Making a Thread Ready to Execute
When a thread becomes ready to execute, the scheduler must determine whether to place the thread in the Dispatcher Ready List or schedule it for immediate execution. On a uniprocessor system, this choice is straightforward: Given an executing thread and a thread to be scheduled, the scheduler sends the thread with the higher priority to the CPU and places the other thread in the Dispatcher Ready List.

On an SMP system, the decision is more complicated because the scheduler must consider multiple CPUs (unless the thread's hard affinity stipulates only one processor). ReadyThread, the algorithm responsible for this complex evaluation, uses soft affinity as its primary guide. After acquiring the Dispatcher Ready List lock, ReadyThread looks to see whether any idle processors (i.e., processors not executing instructions from an active thread) appear in the thread's hard-affinity list. If so, the scheduler's work is done; it signals the idle processor to execute the thread. (The scheduler picks the CPU it's running on if that CPU is a candidate; a CPU can be idle even though it's executing the scheduler because the scheduler executes in a nonthread-specific context.)

If no processors are idle or the thread cannot execute on any of the idle processors, ReadyThread looks at one other processor. If the thread has an ideal processor, the scheduler considers that CPU; otherwise, the scheduler examines the last processor that the thread executed on. If the thread has a priority that equals or exceeds the priority of the thread running on the CPU in question, the scheduler signals the CPU to stop executing its current thread and execute the new thread. In the case where the thread's priority is lower than the priority of the running thread, ReadyThread places the thread in the appropriate priority queue of the Dispatcher Ready List and releases the list's lock.
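Here is an analogous C sketch of the ReadyThread decision as described above. The names idle_cpu_mask, running_priority, signal_cpu, and insert_ready_list are stand-ins for kernel state and operations, not real NT interfaces, and the sketch ignores the Dispatcher Ready List locking.

```c
/* A simplified, hypothetical sketch of the ReadyThread decision; not NT
 * source code. All external names are stand-ins for kernel state. */
typedef struct {
    unsigned affinity;    /* hard-affinity bitmask: bit n = CPU n          */
    int      ideal_cpu;   /* -1 if no ideal processor is set               */
    int      last_cpu;    /* processor the thread last executed on         */
    int      priority;    /* priority of the ready-to-execute thread       */
} RTHREAD;

extern unsigned idle_cpu_mask;              /* bit n set => CPU n is idle   */
extern int      running_priority(int cpu);  /* priority of the thread on a CPU */
extern void     signal_cpu(int cpu, RTHREAD *t);
extern void     insert_ready_list(RTHREAD *t);

void ready_thread(RTHREAD *t)
{
    unsigned idle = idle_cpu_mask & t->affinity;
    int cpu;

    if (idle != 0) {
        /* An idle CPU appears in the hard-affinity list: run there now. */
        for (cpu = 0; (idle & (1u << cpu)) == 0; cpu++)
            ;
        signal_cpu(cpu, t);
        return;
    }

    /* Otherwise look at exactly one other CPU: the ideal processor if set,
       else the processor the thread last ran on (soft affinity). */
    cpu = (t->ideal_cpu >= 0) ? t->ideal_cpu : t->last_cpu;

    if (t->priority >= running_priority(cpu))
        signal_cpu(cpu, t);       /* preempt the lower-priority thread     */
    else
        insert_ready_list(t);     /* wait in the Dispatcher Ready List     */
}
```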

Unusual Consequences
If no processors are idle and the ready-to-execute thread has a lower priority than the thread running on the ready-to-execute thread's ideal or soft-affinity processor, ReadyThread can produce some unusual consequences because it doesn't evaluate other processors in the system. Consider the scenario in Figure 3. A priority 9 thread (Thread 0) has just become ready to run, and even though it has a higher priority than the threads running on three of the four CPUs in the system, the scheduler places Thread 0 in the Dispatcher Ready List. Why? Thread 0's priority is lower than the priority of the thread running on the CPU that Thread 0 last executed on (Thread 1 on CPU 0 has priority 10). Thus, until FindReadyThread picks Thread 0 from the Dispatcher Ready List, the system's second highest-priority thread will wait for a processor.

ReadyThread can also potentially lead to needless movement (migration) of threads from one processor to another. Consider the variation of Figure 3's scenario that Figure 4 depicts. In this example, Thread 1 executing on CPU 0 has priority 8 instead of 10. ReadyThread will pull Thread 1 off CPU 0 in favor of Thread 0 because Thread 0 has a higher priority (9). Because Thread 1 is still ready to execute, NT calls ReadyThread for Thread 1. ReadyThread sees that Thread 1 has a lower priority than the thread executing on Thread 1's last processor (CPU 0); therefore, ReadyThread places Thread 1 in the Dispatcher Ready List.

If the next event that triggers FindReadyThread is the expiration of Thread 4's quantum, FindReadyThread pulls Thread 1 off the Dispatcher Ready List to execute on CPU 3 and puts Thread 4 on the list. At this point, Thread 1 has migrated from CPU 0 to CPU 3. Thread 4 might then migrate to CPU 1 or CPU 2 when one of them next invokes FindReadyThread.

NT could avoid unnecessary thread shuffling if ReadyThread considered only the system's lowest-priority executing thread in its scheduling decision, rather than using a soft-affinity strategy. Compare the thread movements diagrammed in Figures 5a and 5b, in which threads with priorities 9, 8, and 7 are running on three CPUs. In the worst case for the soft-affinity strategy, a priority 10 thread that becomes ready to execute could set off all the shuffling shown in Figure 5a. Alternatively, checking only the lowest-priority executing thread causes just the few movements shown in Figure 5b.

Scheduling against the lowest-priority executing thread is a simpler strategy than the soft affinity-based ReadyThread algorithm, and it can eliminate many unnecessary thread migrations. The Solaris operating system uses this simpler algorithm, but Microsoft defends its use of soft affinity as a primary scheduling parameter by stating that SQL Server achieves higher throughput with this implementation.
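For comparison, a sketch of that simpler lowest-priority strategy might look like the following. It is not the actual Solaris (or NT) implementation, it ignores hard affinity, and all names are hypothetical.

```c
/* A hypothetical sketch of the alternative strategy: preempt only the
 * lowest-priority currently executing thread. Not real Solaris or NT code. */
#define NUM_CPUS 4

extern int  cpu_priority[NUM_CPUS];   /* priority of the thread on each CPU */
extern void preempt(int cpu);          /* displace the thread on that CPU    */
extern void enqueue_ready(int priority);

void ready_thread_lowest_first(int priority)
{
    int cpu, lowest = 0;

    for (cpu = 1; cpu < NUM_CPUS; cpu++)
        if (cpu_priority[cpu] < cpu_priority[lowest])
            lowest = cpu;              /* find the lowest-priority running thread */

    if (priority > cpu_priority[lowest])
        preempt(lowest);               /* displace it: at most one migration      */
    else
        enqueue_ready(priority);       /* not higher than anything running; queue */
}
```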

Scalability Ramifications
Microsoft bases NT's scheduling algorithms for SMP systems on the same principles it incorporates for NT's uniprocessor algorithms. However, efficiently using multiple processors complicates the scheduling problem.

Having more than one processor that the NT scheduler can assign a thread to leads to some interesting trade-offs in the ReadyThread algorithm--trade-offs that could potentially hinder NT's ability to effectively use extra processors that you add to a system (scalability). Because Microsoft has apparently decided that it will measure the scalability of NT by the performance of SQL Server, Microsoft has tuned NT's scheduling algorithms for this database application.
