Inside Memory Management, Part 1

Learn how NT implements hardware-supported virtual memory to allocate addresses and uses memory sharing and copy-on-write to manage memory.

Mark Russinovich

July 31, 1998

Understand NT's basic memory processes

Computers never seem to have enough memory, no matter how much memory is installed. One of the most complex and difficult tasks Windows NT faces is managing the limited physical memory present on a computer. This challenge is heightened by the fact that NT must divide physical memory among many processes that might be running simultaneously, giving each process an appropriate memory share. Further, NT must be able to adjust its behavior within a wide range of memory sizes, from as little as 16MB to as much as 1GB or more.

This month I'll introduce the concept of virtual memory, which is based on hardware-supported memory paging. This column will lay a foundation for understanding how NT defines process address spaces. I'll discuss how NT allocates virtual-memory addresses and the internal data structures that record allocation. I'll also describe two powerful features of NT memory management: memory sharing and copy-on-write. Next month, I'll describe more internal data structures, how NT implements shared memory, and working set tuning.

Virtual Memory
As do almost all modern operating systems (OSs), including Windows 3.x, Windows 95, Win98, and UNIX, NT relies on hardware support to provide processes with the illusion that a computer has more memory than it actually has. The mechanism that implements this illusion is known as paged virtual memory. The Memory Manager (or Virtual Memory Manager) executive subsystem is the NT component responsible for NT's paged virtual memory and for exporting functions that other subsystems and applications can use to interact with paged virtual memory. Processes access memory through their virtual memory address space. In NT, the size of a process' address space is defined by the number of bytes that can be described in 32 bits, which is 2^32, or 4GB (64-bit NT, which will be available on Alpha and Intel Merced processors, will have 2^64 bytes of addressability). Thus, barring other resource limitations, a process can theoretically allocate 4GB of code and data. However, most computers today have less than 128MB of physical memory. The 4GB address space of a process is therefore known as virtual memory, because it doesn't directly correspond to physical memory.

Before I describe how a computer's OS and hardware collude to trick processes into thinking they have 4GB of memory, I'll review memory address translation. The memory management unit (MMU) on Alpha and x86 processors manages physical and virtual memory in blocks called pages. On x86 processors, the size of a page is 4KB, whereas Alpha processors maintain 8KB pages. A computer's physical memory pages are known as page frames, which the processor numbers consecutively with page frame numbers (PFNs).

When a process references its virtual memory, it does so with a 32-bit pointer, and the MMU's job is to determine the location in physical memory to which the virtual address maps. The MMU determines this location by dividing the address into three separate indexes, as Figure 1 shows: page directory index, page table index, and page byte offset. The MMU uses each index in a three-step address resolution process.

The first step begins with the MMU locating a table known as the page directory. The MMU locates the page directory by reading the contents of a special hardware processor register. On the x86 this register is Control Register 3 (CR3), and on the Alpha the register is the Page Directory Register (PDR). Each process has its own private page directory, and the address of that directory is stored in the process' control block data structure. Whenever the scheduler switches from one process to another, NT updates the page directory register with the value stored in the control block of the process that takes over the CPU. The first step in the resolution process continues with the MMU using the page directory index from the virtual address to index into the page directory table. The MMU retrieves from the page directory the page frame number of the location of another table, the page table. Entries in a page directory table are called page directory entries (PDEs) and are 32 bits in size.

In the second step of the resolution process, the MMU uses the page table index from the virtual address to index into the page table it located in the first step. The entry the MMU finds at the page table index identifies a page frame in physical memory. Finally, in the third step, the MMU uses the page byte offset as an index into the physical page and isolates the data that the process wants to reference. Entries in a page table are called page table entries (PTEs). Similar to a PDE, a PTE is 32 bits wide.
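
To make the three steps concrete, here is a small C sketch of the x86 flavor of this translation. It assumes 4KB pages and 1,024-entry page directories and page tables, so the virtual address splits into a 10-bit page directory index, a 10-bit page table index, and a 12-bit byte offset, and it models physical memory as a flat array of 32-bit entries. This is a simplified illustration of the mechanism, not NT's or the hardware's actual code.

#include <stdint.h>

#define PAGE_SIZE   4096u
#define PTE_VALID   0x00000001u   /* bit 0: entry is valid ("present") */
#define FRAME_MASK  0xFFFFF000u   /* bits 12-31: page frame base address */

/* Translate a 32-bit virtual address to a physical address, or return 0
   to stand in for the page fault the MMU would raise on an invalid entry.
   phys_mem models physical memory; pd_base is the physical address of the
   page directory (what CR3 would hold on the x86). */
uint32_t translate(const uint32_t *phys_mem, uint32_t pd_base, uint32_t va)
{
    uint32_t pdi    = (va >> 22) & 0x3FF;  /* page directory index */
    uint32_t pti    = (va >> 12) & 0x3FF;  /* page table index */
    uint32_t offset =  va        & 0xFFF;  /* byte offset within the page */

    /* Step 1: index the page directory to find the page table's frame. */
    uint32_t pde = phys_mem[pd_base / 4 + pdi];
    if (!(pde & PTE_VALID))
        return 0;                          /* invalid PDE: page fault */

    /* Step 2: index the page table to find the data page's frame. */
    uint32_t pte = phys_mem[(pde & FRAME_MASK) / 4 + pti];
    if (!(pte & PTE_VALID))
        return 0;                          /* invalid PTE: page fault */

    /* Step 3: combine the page frame with the byte offset. */
    return (pte & FRAME_MASK) | offset;
}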

You might wonder why the MMU goes through this three-step gyration. It does so to save memory. Consider a one-step translation in which a virtual address is divided into two components, one locating a PTE in a page table and the other serving as a page offset. To map a 4GB address space, the page table would have to consist of 1,048,576 entries. Four bytes (32 bits equals four 8-bit bytes) per entry would mean that the page table would require 4MB of physical memory to map each process' address space. With a two-level scheme such as the one the x86 and Alpha use, only a process' page directory must be fully defined--page tables are defined only as necessary. Thus, if the majority of a process' 4GB address space is unallocated, a significant saving in memory results because page tables are not then allocated to define the unused spaces.

Nevertheless, the three-step translation process would cause a system's performance to be unbearably poor if the process occurred on every memory access. Therefore, x86 and Alpha processors come with an address translation cache such as the one Figure 2 shows, which stores the most recent virtual page to physical page translations. When a process makes a memory reference, the MMU takes the virtual page number and simultaneously compares it with the virtual page number of every translation pair stored in the cache (this type of simultaneous-compare memory is known as associative memory). If there's a match, the MMU can bypass the page directory and page table lookups because it has already obtained the page frame number from the simultaneous compare. Address translation caches are known as Translation Look-aside Buffers (TLBs) or Translation Buffers (TBs). One of the costs associated with the scheduler switching from one thread to another is that, if a newly scheduled thread is from a different process, the TLB must clear the mappings that belong to the old process. Then, the three-step translation is required to fill the TLB with mapping pairs for the new process.

Paging
What I've described so far is the address translation that occurs when a process references a valid virtual memory address. A process can also make several types of invalid memory references. I'll review error cases first, then I'll discuss situations in which NT considers an invalid memory reference legal and correct--this type of reference is necessary to implement true virtual memory.

Not many processes today require 4GB of address space. Therefore, the address map of most processes is almost entirely empty or undefined. When a process references undefined parts of its virtual memory map, the MMU detects the undefined space by finding, in the first step of translation, a PDE marked invalid in the page directory, or by finding, in the second step of translation, a PTE marked invalid in a page table. PDEs and PTEs contain enough space, in addition to the page frame numbers they store, to keep several bits that serve as bookkeeping information. One bit of a PDE and PTE is the valid bit, which is set only when the translation through the PDE or PTE is configured as legal. If this bit is not set (i.e., it's off), the MMU will stop translation and raise a processor exception called a page fault.
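
For reference, here is a simplified C view of the low-order bookkeeping bits the x86 hardware defines in a PDE or PTE, expanding on the constants used in the earlier translation sketch. NT layers additional, software-defined meanings on top of these bits when the valid bit is off; the names below are illustrative, not NT's.

#define PTE_VALID       0x001       /* bit 0: translation through this entry is legal */
#define PTE_WRITABLE    0x002       /* bit 1: page may be written */
#define PTE_USER        0x004       /* bit 2: page is accessible from user mode */
#define PTE_ACCESSED    0x020       /* bit 5: set by the MMU when the page is referenced */
#define PTE_DIRTY       0x040       /* bit 6: set by the MMU when the page is written */
#define PTE_FRAME_MASK  0xFFFFF000  /* bits 12-31: page frame number */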

Besides actively playing a role in defining the contents of PDEs and PTEs to define address spaces, NT must respond to page faults and react to them appropriately. When the MMU invokes NT's page-fault handling code, the Memory Manager must check to see whether the reference that raised the exception is to an undefined address. If it is, the result is an access violation (which usually results in the termination of the process) if the processor was executing in user mode, or a blue screen if it was executing in kernel mode. (See my column "Inside the Blue Screen," December 1997, for more information about the difference between user mode and kernel mode.)

Because the MMU raises a page fault when a process references an invalid PDE or PTE, paged virtual memory can rely on a nifty trick. To support the illusion that an application process has access to more data and code than physical memory can hold, NT's Memory Manager will move parts of the application to a file on disk called a paging file. NT marks as invalid the PTEs that would otherwise map the pages in the process' address space that correspond to the paged-out data. Thus, when a process tries to reference a paged-out part of its virtual memory, the MMU generates a page fault. The Memory Manager's page-fault handler then looks into its internal data structures and discovers that the reference that triggered the page fault was not to an undefined address but rather to an address whose data is stored temporarily in the paging file. The Memory Manager then makes room in physical memory for the page from the paging file that the process is requesting. This operation often means that another page of data from the current process or from another process is sent out to the paging file (a page-out operation). Once the Memory Manager creates space in physical memory, it reads the requested page back in from the paging file (a page-in operation).

After the page-in operation completes, NT updates the page table of the process that raised the page fault so that it points at the new page frame. The processor instruction that caused the memory reference restarts, and on the second time through, the translation succeeds without a page fault and accesses the requested physical data. For both page-in and page-out operations, the Memory Manager works with a disk driver to perform the I/O.

Thus, using invalid bits to its advantage, the Memory Manager makes a computer appear to have a total amount of memory that is equal to the size of physical memory plus the sizes of all the paging files. You can create up to 16 paging files in NT, placing each on a separate logical drive.

Process Address Spaces
Now that you understand the mechanics of paging, let's talk about process address spaces and how the Memory Manager defines and keeps track of them. As I've stated, each process has a 4GB virtual address space. This space is divided into two areas, in which the lower addresses map to data and code that are private to the process, and the upper addresses map to system data and code that are common to all processes, as Figure 3 shows. When the scheduler changes execution from one process to another, the processor's page directory register changes so that the process-private portion of the processor's mappings is updated. The system mappings are kept global through common page tables that every process' page directory points to.

The permissions that apply to pages in each region also reflect the split in mappings. The PTEs of x86 and Alpha processors contain a permissions bit. This bit specifies whether a page is accessible from a program executing in user mode. If a process refers to a page that is not accessible from user mode, the MMU generates a page fault, and the Memory Manager generates an access violation for the process (which is caught by Dr. Watson). As the Memory Manager sets up address spaces, it marks all system PTEs as accessible only from kernel mode. Thus, the Executive subsystems, device drivers, hardware abstraction layer (HAL), and Kernel can reference system memory, but user applications cannot touch system memory. This restriction protects key OS data structures and makes the system secure.

Figure 3 shows that on pre-Service Pack 3 (SP3) systems, the boundary between the lower address space (user mode) and the upper address space (kernel mode) is at 2GB. On SP3 and NT 4.0, Enterprise Edition, the boundary can be moved with a switch on a boot.ini line (/3GB) so that user programs have access to 3GB and system memory is limited to 1GB. This change enables memory-intensive applications such as database servers to directly address more memory, thus relying less on shuffling data to and from disk in a manner similar to paging.
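
As a hypothetical example (the ARC path and description string will differ on your system), the switch is simply appended to the OS entry in boot.ini:

[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINNT="Windows NT Server 4.0, Enterprise Edition" /3GB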

Memory allocation in NT is a two-step process--virtual memory addresses are reserved first and committed second. A reservation is simply a way for a process to tell the Memory Manager to set aside a block of virtual memory pages to satisfy later memory requests by the process. However, the Memory Manager makes no changes to a process at the time of a reservation because the reservation does not use actual memory. When a process wants to use addresses it has reserved, it must commit them. Access to uncommitted reserved memory will typically result in an access violation. When a process wants to commit addresses, the Memory Manager ensures that the process has enough memory quota to do so. The Memory Manager also checks to see that there is enough commit memory (physical memory plus the size of all the paging files) for the commit request. There are many cases in which an application will want to reserve a large block of its address space for a particular purpose (keeping data in a contiguous block makes the data easy to manage) but might not want to use all of the space. The application can specify both reservation and commit in a single request to the Memory Manager with a specific API, and this is the way most applications allocate memory.
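
In Win32, this is done with VirtualAlloc. The fragment below (error handling omitted) sketches both the two-step form and the combined one-step form:

#include <windows.h>

int main(void)
{
    /* Two-step form: reserve a 1MB range of addresses without consuming
       physical memory or paging-file space... */
    void *base = VirtualAlloc(NULL, 1024 * 1024, MEM_RESERVE, PAGE_NOACCESS);

    /* ...then commit just the first page before touching it. */
    VirtualAlloc(base, 4096, MEM_COMMIT, PAGE_READWRITE);

    /* One-step form: reserve and commit 64KB in a single call, the way
       most applications allocate memory. */
    void *buf = VirtualAlloc(NULL, 64 * 1024, MEM_RESERVE | MEM_COMMIT,
                             PAGE_READWRITE);

    VirtualFree(buf, 0, MEM_RELEASE);
    VirtualFree(base, 0, MEM_RELEASE);
    return 0;
}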

One example of the Memory Manager's use of the reserve-then-commit functionality is in the management of a thread's call stack. The Memory Manager reserves a range of memory (usually 1MB) in a process' address space for each thread created in a process. However, the Memory Manager actually commits only one page of each stack. As a thread uses the stack and touches reserved pages, the Memory Manager commits the reserved pages, increasing the size of the stack.

When a process reserves memory, it must specify the amount of memory but can also request a starting virtual address and specific protections to be placed on the memory. NT uses bits in a PTE that the MMU defines to indicate whether a page can be written to or read from, and whether code can be executed in the page. NT sets these bits according to the protections that the API parameters specify. If the PTE bits specify a page as read-only and a process attempts to write to the page, the MMU generates a page fault, and NT's page-fault handler will flag an access violation for the process. The system uses this protection to easily detect errant programs that try to do things such as modify their own code image.
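
The following is a minimal sketch of that behavior, using Win32's VirtualProtect and structured exception handling to observe the resulting access violation; it is an illustration rather than production code:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Commit a read/write page, then revoke write access. */
    char *p = VirtualAlloc(NULL, 4096, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    DWORD oldProtect;
    VirtualProtect(p, 4096, PAGE_READONLY, &oldProtect);

    __try {
        p[0] = 1;   /* write to a read-only page: the MMU faults and NT
                       flags an access violation */
    } __except (GetExceptionCode() == EXCEPTION_ACCESS_VIOLATION
                    ? EXCEPTION_EXECUTE_HANDLER : EXCEPTION_CONTINUE_SEARCH) {
        printf("Access violation caught\n");
    }

    VirtualFree(p, 0, MEM_RELEASE);
    return 0;
}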

NT keeps track of the reserved or committed address ranges in a process' address space by using a tree data structure, such as the example Figure 4 shows. Each node in the tree represents a range of pages that have identical characteristics with respect to protection and commit state information. This tree structure is a binary splay tree, which means that the depth of the tree (the maximum number of links from the root to any leaf, or bottom, node) is kept to a minimum. The tree's nodes are called Virtual Address Descriptors (VADs), and the process' process control block stores the root of the tree. When a process makes a memory reservation or commit request, the Memory Manager can quickly determine where there are free spaces in the process address map, and whether the request overlaps memory that is already reserved or committed.
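
Conceptually, each node covers a range of pages plus that range's attributes. A heavily simplified, hypothetical sketch of such a node might look like the following; the real VAD layout is internal to NT and differs in detail:

/* Hypothetical illustration only -- not NT's actual VAD definition. */
typedef struct _VAD {
    unsigned long StartingPage;   /* first virtual page number in the range */
    unsigned long EndingPage;     /* last virtual page number in the range */
    unsigned long Protection;     /* e.g., read-only vs. read/write */
    int           Committed;      /* reserved only, or reserved and committed */
    struct _VAD  *LeftChild;      /* ranges at lower virtual addresses */
    struct _VAD  *RightChild;     /* ranges at higher virtual addresses */
} VAD, *PVAD;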

File Mapping and Memory Sharing
A powerful feature of the NT Memory Manager is that it lets programs map files into their virtual address space. This mapping is accomplished in two steps. In the first step, the program creates an object, called a Section Object, to describe a file that the Memory Manager can map. The Section Object holds information about the name of a file, its size, and what portions of it are mapped. The Win32 function CreateFileMapping results in a Section Object's creation.

In the second step, the program maps all or part of the file into the process address space. The process invokes the Win32 function MapViewOfFile to do this mapping, specifying the start offset in the file to begin mapping and the number of bytes to map. The process specifies the protection (e.g., read-only, read/write) on the view at this time. After the view is mapped, the process can read from and write to the file simply by reading and writing data at the view's addresses in the process address map. The Memory Manager transparently works with the file systems to ensure that all processes accessing the file see any updates the original process makes and that the updates are eventually flushed to disk.
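
The two steps look roughly like this in Win32 (error handling omitted; the file name is just an example):

#include <windows.h>

int main(void)
{
    /* Step 1: open the file and create the Section Object that describes it. */
    HANDLE hFile = CreateFile("C:\\temp\\example.dat",
                              GENERIC_READ | GENERIC_WRITE, 0, NULL,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    HANDLE hSection = CreateFileMapping(hFile, NULL, PAGE_READWRITE, 0, 0, NULL);

    /* Step 2: map a read/write view of the entire file into the address space. */
    char *view = MapViewOfFile(hSection, FILE_MAP_WRITE, 0, 0, 0);

    view[0] = 'X';   /* file I/O through an ordinary memory reference */

    UnmapViewOfFile(view);
    CloseHandle(hSection);
    CloseHandle(hFile);
    return 0;
}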

The file-mapping capability makes file I/O extremely straightforward, and NT uses it extensively. When an application launches, the Process Manager maps a view of the application's image into the address space of the application's process. The Process Manager transfers control to the entry point of the image in the address space, and as the application executes, any pages referenced for the first time generate page faults. The page faults result in the Memory Manager reading the application's pages from the file on disk.

NT uses a variation of file mapping to share memory. In this variation, CreateFileMapping is passed a special value in place of a file handle, which specifies memory mapping rather than file mapping. The Memory Manager backs views of such a section with the paging file. The data in the view is no different from regular virtual memory that a process reserves and commits. However, NT can give these sections names in the Object Manager namespace so that two or more processes can open the same section, map views into their own address spaces, and then communicate with one another through this shared memory.
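
A minimal sketch of two cooperating processes, assuming a made-up section name of "MySharedSection" (error handling omitted):

#include <windows.h>
#include <string.h>

int main(void)
{
    /* Process A: create a 64KB section backed by the paging file (no file
       handle is supplied) and give it a name in the Object Manager namespace. */
    HANDLE hSection = CreateFileMapping(INVALID_HANDLE_VALUE, NULL,
                                        PAGE_READWRITE, 0, 65536,
                                        "MySharedSection");
    char *shared = MapViewOfFile(hSection, FILE_MAP_WRITE, 0, 0, 0);
    strcpy(shared, "hello from process A");

    /* Process B would open the same section by name and map its own view:
         HANDLE h = OpenFileMapping(FILE_MAP_READ, FALSE, "MySharedSection");
         char *view = MapViewOfFile(h, FILE_MAP_READ, 0, 0, 0);                */

    UnmapViewOfFile(shared);
    CloseHandle(hSection);
    return 0;
}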

Copy-on-Write
The Memory Manager implements an important optimization of memory sharing called copy-on-write. There are several common scenarios in which a process might want to use the same data as another process but keep any modifications it makes private to itself. For example, if two or more processes start the same application and one process modifies the data in the image, other processes should not see that modification. The obvious way to accomplish this kind of private modification is to load multiple copies of the application into memory; however, this strategy wastes space. Instead, the Memory Manager marks the physical pages containing the image read-only and notes in an internal data structure (which I'll describe next month) that the page is a copy-on-write page. When a process tries to modify the copy-on-write page, that action will generate a page fault (because the page is read-only). The Memory Manager will see that the process referenced a copy-on-write page and will copy the contents of the copy-on-write page to another page that is private to the process that made the reference. Then, when the faulting instruction restarts, the process can modify its private copy of the page. Figure 5 demonstrates this procedure. In the figure, Process 1 and Process 2 share three copy-on-write pages. If Process 1 writes to one of the pages, it will get its own private copy of the page, and Process 2 will retain the original of the page.
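
Although NT applies copy-on-write mostly on your behalf, Win32 also exposes it directly: mapping a view with FILE_MAP_COPY gives the caller private copies of any pages it writes, leaving the underlying file untouched. A minimal sketch (the file name is just an example; error handling omitted):

#include <windows.h>

int main(void)
{
    HANDLE hFile = CreateFile("C:\\temp\\example.dat", GENERIC_READ,
                              FILE_SHARE_READ, NULL, OPEN_EXISTING,
                              FILE_ATTRIBUTE_NORMAL, NULL);
    HANDLE hSection = CreateFileMapping(hFile, NULL, PAGE_WRITECOPY, 0, 0, NULL);

    /* FILE_MAP_COPY: the first write to a page faults, and the Memory Manager
       gives this process a private copy; the file on disk is never modified. */
    char *view = MapViewOfFile(hSection, FILE_MAP_COPY, 0, 0, 0);
    view[0] = 'Y';

    UnmapViewOfFile(view);
    CloseHandle(hSection);
    CloseHandle(hFile);
    return 0;
}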

NT uses the copy-on-write functionality mostly for executable images. When programs start, the Process Manager maps copy-on-write images. The POSIX subsystem also uses copy-on-write for handling the POSIX fork operation. When a POSIX process forks, the Process Manager creates a child process that will inherit the address space of the parent process. Instead of making a copy of all the parent's memory, the Memory Manager just marks the address-space pages of both the parent and child processes as copy-on-write. When either process makes a modification to its memory, it receives a private copy of the page it wants to modify, and the original of the page will then be private to the other process.

Stay Tuned
Next month, I'll dig deeper into the Memory Manager to describe the internal data structures it uses to keep track of pages. I'll present the details of how the Memory Manager implements shared memory, and I'll give you an overview of the way working sets--the amount of physical memory assigned to each process--are tuned as the system runs.
