Inside Win2K Reliability Enhancements, Part 3
Learn about two features that help eliminate device-driver errors.
September 14, 1999
In Part 1 of this series about Windows 2000 (Win2K) reliability enhancements, I described new Win2K system recovery mechanisms; in Part 2, I presented measures that Win2K takes to keep a system updated with Microsoft-approved system files. This month, I describe two features—write-protected system code and the Driver Verifier—that Microsoft has added to Win2K to help identify poorly written device drivers and help developers avoid common device-driver programming errors during the development process.
Write-Protected System Code
Applications, the Win2K kernel, and device drivers all work with pointers. A pointer is a reference to a location in memory that the system allocates to a program or driver. One type of bug in programs or drivers can result in pointer corruption, in which the program or driver assigns a pointer a meaningless value instead of a valid memory address. Two other bugs that are closely related to each other are buffer overrun and buffer underrun. These bugs result when a program or driver thinks it's addressing a buffer it allocated, but in fact the pointer references memory just past the end of the buffer (overrun) or just short of the start of the buffer (underrun).
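To make overrun and underrun concrete, here is a minimal Python sketch. It is only a toy model: the flat byte array, the buffer offsets, and the write_through_pointer helper are all hypothetical stand-ins for raw memory and pointers, not anything Win2K exposes.

```python
# Toy model: one flat region of bytes holding two adjacent 8-byte buffers.
pool = bytearray(32)
BUFFER_SIZE = 8
BUF_A = 8    # buffer A occupies pool[8:16]
BUF_B = 16   # buffer B occupies pool[16:24]


def write_through_pointer(base, index, value):
    """Write the way a raw pointer does: nothing checks that index is in range."""
    pool[base + index] = value


# Normal use: each owner fills its own buffer.
for i in range(BUFFER_SIZE):
    write_through_pointer(BUF_A, i, 0xAA)
    write_through_pointer(BUF_B, i, 0xBB)

# Overrun: the owner of A writes one element past the end of A and hits B[0].
write_through_pointer(BUF_A, BUFFER_SIZE, 0x11)

# Underrun: the owner of B writes one element before the start of B and hits A[7].
write_through_pointer(BUF_B, -1, 0x22)

print("A:", pool[BUF_A:BUF_A + BUFFER_SIZE].hex())
print("B:", pool[BUF_B:BUF_B + BUFFER_SIZE].hex())
```

Neither stray write raises an error in this model. Each one silently damages the neighboring buffer, which is why such bugs are hard to trace back to their source.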
If a program or device driver with an invalid pointer writes a value to the location in memory that the pointer references, the program or driver might overwrite part of its code, part of a different buffer than the intended buffer, or unallocated memory. If an application writes to unallocated memory, Win2K detects the error and terminates the application. If the culprit is a device driver, Win2K halts the system and displays a blue screen of death to prevent the misbehaving driver from inadvertently corrupting critical data such as the Registry or other files on disk. If the memory location that the invalid pointer writes to is in a different buffer or portion of the program and the bug is in an application, Win2K will eventually detect the problem or a side effect of the problem and terminate the application. Win2K's memory protection prevents one application from writing to the memory of another application. In kernel mode, however, a device driver can write to any other device driver's memory or to the kernel's memory.
In previous versions of Windows NT, including NT 4.0 and NT 3.51, a bug in a device driver that corrupts the kernel or another driver's code or memory is difficult to identify. The system almost always eventually detects the problem when a corrupted driver or kernel performs illegal operations, but tracing the problem to its root is virtually impossible. Win2K enforces write-protected system code to help systems administrators and device-driver developers identify pointer corruption when it occurs. If a corrupt pointer references the executable code of a device driver or the kernel, the Win2K Memory Manager catches the write at the moment the driver attempts it and therefore knows which driver is responsible. As a result, the only corrupt-pointer writes that the system doesn't detect immediately are those that land in the data buffers of a different device driver or the kernel. When the system identifies a misbehaving device driver, a systems administrator can update the driver or remove it from the system.
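The following Python sketch models the idea, under stated assumptions. The SystemMemory class, the WriteProtectionFault exception, and the image names are hypothetical, and real write protection is enforced by the processor's page tables rather than by a software check like this one.

```python
PAGE_SIZE = 4096


class WriteProtectionFault(Exception):
    """Hypothetical stand-in for the fault raised on a write to a read-only page."""


class SystemMemory:
    """Toy model of kernel address space with per-page write protection."""

    def __init__(self, page_count):
        self.data = bytearray(page_count * PAGE_SIZE)
        self.read_only = [False] * page_count
        self.owner = [None] * page_count

    def load_image(self, name, first_page, code_pages, data_pages):
        """Map an image: its code pages become read-only, its data pages stay writable."""
        for i in range(code_pages + data_pages):
            page = first_page + i
            self.owner[page] = name
            self.read_only[page] = i < code_pages

    def write(self, writer, address, value):
        """Perform a one-byte write on behalf of the named driver."""
        page = address // PAGE_SIZE
        if self.read_only[page]:
            # The fault fires at the moment of the write, so the writer is known.
            raise WriteProtectionFault(
                f"{writer} wrote to code of {self.owner[page]} at {address:#x}")
        self.data[address] = value


mem = SystemMemory(page_count=8)
mem.load_image("kernel", first_page=0, code_pages=2, data_pages=1)
mem.load_image("buggy.sys", first_page=4, code_pages=1, data_pages=1)

mem.write("buggy.sys", 5 * PAGE_SIZE + 10, 0xFF)   # its own data page: allowed

try:
    mem.write("buggy.sys", 100, 0xFF)              # stray pointer into kernel code
    caught = None
except WriteProtectionFault as fault:
    caught = str(fault)

mem.write("buggy.sys", 2 * PAGE_SIZE, 0xFF)        # stray pointer into kernel data
```

The last write succeeds silently, which mirrors the remaining gap: write protection covers code, not another component's data buffers.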
Win2K implements system-code protection in the routine that loads the kernel image and device-driver files into memory. The function MiLoadSystemImage identifies executable code in a loading file and uses the hardware's memory management features to designate those sections of memory as write-protected. In certain situations, developers might need to disable Win2K's protection of system code areas. To do so, a developer sets the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\EnforceWriteProtection Registry value to 0.
The Driver Verifier
A new feature in Win2K, the Driver Verifier (or Verifier), is probably the most powerful tool in Win2K's reliability arsenal. Much as write-protected system memory does, the Verifier consists of code in the Win2K kernel that, when applied to a device driver, detects common device-driver bugs as they occur. The two features complement each other: write-protected system memory lets Win2K catch writes that corrupt pointers make to device-driver or kernel code, whereas the Verifier detects many of the writes that corrupt pointers make to data buffers. If a device driver with a bad pointer causes a buffer overrun that corrupts another driver's memory in NT 4.0, the error might go undetected indefinitely, which makes isolating the buggy device driver impossible. But with the Verifier, Win2K usually can detect such a bug immediately. The Verifier also detects other common device-driver programming mistakes.
You use the Verifier GUI (%systemroot%\system32\verifier.exe) to configure the Verifier and view statistics about operations that the Verifier code executing in the kernel performs. The Verifier GUI property sheet includes several tabbed pages. You use the Modify Settings tab to specify which device drivers you want to verify and what types of verification the kernel performs. Registry settings that the Verifier modifies when you select options on the Modify Settings tab reside under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management and include the REG_DWORD value VerifyDriverLevel and the string value VerifyDrivers. The Win2K kernel interprets VerifyDriverLevel as a bit mask, and each position in the mask represents one of the verification types listed on the right side of the Modify Settings page. When you select specific drivers to validate, the Verifier stores their names in VerifyDrivers. The exception is when you tell the Verifier to verify all drivers; the Verifier then sets VerifyDrivers to the asterisk (*) wildcard character.
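As a rough illustration of how the two values could be interpreted, consider the Python sketch below. The article doesn't list the bit assignments; the ones used here follow the flag values Microsoft documents for the verifier tool, and the assumption that VerifyDrivers holds space-separated file names is mine. The function names are hypothetical.

```python
from enum import IntFlag


class VerifyLevel(IntFlag):
    """Bit positions assumed from Microsoft's documented verifier flag values."""
    SPECIAL_POOL = 0x01
    FORCE_IRQL_CHECKING = 0x02
    ALLOCATION_FAULT_INJECTION = 0x04
    POOL_TRACKING = 0x08
    IO_VERIFICATION = 0x10


def decode_level(value):
    """Return the names of the verification types enabled in a VerifyDriverLevel mask."""
    return [flag.name for flag in VerifyLevel if value & flag]


def should_verify(driver_name, verify_drivers):
    """Decide whether a loading driver matches the VerifyDrivers string."""
    names = verify_drivers.split()
    if "*" in names:
        return True          # wildcard: verify every driver
    return driver_name.lower() in (name.lower() for name in names)


print(decode_level(0x0B))
print(should_verify("Scsiport.sys", "ndis.sys scsiport.sys"))
print(should_verify("tcpip.sys", "*"))
```

A mask of 0x0B, for example, decodes to Special Pool, Force IRQL Checking, and Pool Tracking under these assumed bit positions.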
After you've entered or changed Verifier settings, you must reboot the system. Early in the boot process, the Win2K Memory Manager reads the Verifier Registry values to determine which drivers to verify and which Verifier options you enabled. Subsequently, if you've selected at least one driver for verification, the kernel checks the name of every device driver it loads into memory against the list of drivers that you've selected for verification. For every device driver that appears in both places, the kernel invokes the MiApplyDriverVerifier function, which replaces the driver's references to any of approximately 40 kernel and Win32K.sys (the kernel-mode Win32 subsystem) functions with references to Verifier-equivalent versions of those functions.
The kernel functions to which MiApplyDriverVerifier redirects device drivers are associated with the kinds of checks that the Verifier performs. For example, the Verifier intercepts all of a verified device driver's buffer allocation and deallocation functions. If a device driver under verification usually uses the kernel function ExAllocatePool to allocate memory, the driver uses VerifierAllocatePool instead.
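The redirection can be pictured as patching a table of imported functions. The Python sketch below is a simplified model, not the kernel's implementation: the Driver class, its imports dictionary, and the logging wrapper are hypothetical, and only the names ExAllocatePool and VerifierAllocatePool come from the text above.

```python
def ex_allocate_pool(size):
    """Stand-in for the ordinary kernel allocator."""
    return bytearray(size)


def make_verifier_allocate_pool(log):
    """Build a Verifier-style replacement that performs extra checks, then allocates."""
    def verifier_allocate_pool(size):
        log.append(("alloc", size))      # placeholder for the Verifier's checks
        return ex_allocate_pool(size)
    return verifier_allocate_pool


class Driver:
    """Toy driver whose calls into the kernel go through an import table."""

    def __init__(self, name):
        self.name = name
        self.imports = {"ExAllocatePool": ex_allocate_pool}

    def start(self):
        return self.imports["ExAllocatePool"](64)


def apply_driver_verifier(driver, replacements):
    """Swap each import that has a Verifier equivalent; leave the rest alone."""
    for function_name, verifier_version in replacements.items():
        if function_name in driver.imports:
            driver.imports[function_name] = verifier_version


verifier_log = []
replacements = {"ExAllocatePool": make_verifier_allocate_pool(verifier_log)}

verified = Driver("suspect.sys")
unverified = Driver("trusted.sys")
apply_driver_verifier(verified, replacements)

verified.start()      # goes through the Verifier version
unverified.start()    # still calls the ordinary allocator directly
print(verifier_log)
```

The driver's own code is unchanged in this model. Only the table it calls through is edited, which is why a driver needs no rebuild to run under verification.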
One of the most common device-driver bugs occurs when a driver accesses pageable data or code while the processor on which the device driver is executing is at an elevated interrupt request level (IRQL). When code or data is pageable, Win2K can move it out of the computer's memory to the paging file. Win2K uses IRQLs to mask software or hardware interrupts. Because the Win2K Memory Manager relies on hardware interrupts to service page faults (i.e., the Memory Manager must read the data from the paging file on disk back into memory), the Memory Manager can't service a page fault while the IRQL is elevated to DISPATCH_LEVEL or higher. However, device drivers can access data that is locked into memory at any IRQL; Win2K calls this type of data non-paged.
The system often doesn't detect instances of a device driver accessing pageable data when the processor is executing at a high IRQL because the pageable data the driver accesses happens to be in the computer's memory instead of in the paging file. Testing device drivers for this kind of bug is difficult, but such bugs usually show up on user installations as a blue screen of death with the stop code IRQL_NOT_LESS_OR_EQUAL (i.e., the IRQL was not less than or equal to the level required for the operation attempted, which in this case is accessing pageable memory).
If you select Force IRQL Checking on the Modify Settings tab's Verification Type list, the Verifier tries to coax device drivers into violating the IRQL rule in a way that Win2K will immediately detect. The Verifier forces all kernel-mode pageable data out to the paging file whenever a device driver under verification raises the IRQL. The function that does the forcing is MmTrimAllSystemPageableMemory. Thus, whenever a device driver under verification accesses pageable memory when the IRQL is elevated, the system instantly detects the violation, and the resulting blue screen of death identifies the buggy driver.
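A small Python model shows why forcing the pages out turns a latent bug into an immediate one. Everything here is a sketch under assumptions: the Kernel class, the BugCheck exception, and the page number are hypothetical, and the trim step only imitates the effect the article attributes to MmTrimAllSystemPageableMemory.

```python
PASSIVE_LEVEL = 0
DISPATCH_LEVEL = 2


class BugCheck(Exception):
    """Stand-in for a blue screen of death."""


class Kernel:
    """Toy model of IRQL state plus the set of pageable pages currently in memory."""

    def __init__(self, resident_pages, force_irql_checking):
        self.irql = PASSIVE_LEVEL
        self.resident = set(resident_pages)
        self.force_irql_checking = force_irql_checking

    def raise_irql(self, new_irql):
        self.irql = new_irql
        if self.force_irql_checking and new_irql >= DISPATCH_LEVEL:
            # Model of the trim: push every pageable page out of memory.
            self.resident.clear()

    def lower_irql(self, new_irql):
        self.irql = new_irql

    def touch_pageable(self, driver, page):
        if page in self.resident:
            return                       # no page fault, so nothing to detect
        if self.irql >= DISPATCH_LEVEL:
            raise BugCheck(f"IRQL_NOT_LESS_OR_EQUAL in {driver}")
        self.resident.add(page)          # at low IRQL the fault is simply serviced


def buggy_routine(kernel):
    """Driver code that wrongly touches pageable data at DISPATCH_LEVEL."""
    kernel.raise_irql(DISPATCH_LEVEL)
    try:
        kernel.touch_pageable("buggy.sys", page=7)
    finally:
        kernel.lower_irql(PASSIVE_LEVEL)


# Without forcing, page 7 happens to be resident and the bug stays hidden.
buggy_routine(Kernel({7}, force_irql_checking=False))

# With forcing, the same code faults on its first illegal access.
try:
    buggy_routine(Kernel({7}, force_irql_checking=True))
    stop_message = None
except BugCheck as stop:
    stop_message = str(stop)
print(stop_message)
```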
When you check I/O Verification on the Verification Type list, the Verifier redirects a handful of I/O Manager-related functions that a verified driver invokes to the functions' Verifier versions. The I/O Manager Verifier functions ensure that I/O Request Packets (IRPs) that a device driver under verification sends to other drivers are actual IRPs. The functions also make sure that any device drivers that receive IRPs from a device driver under verification don't corrupt the processor's IRQL, another relatively common device-driver programming error.
Enabling Allocation Fault Injection causes the Verifier to randomly fail memory allocations that verified device drivers perform. In the past, developers wrote many device drivers under the assumption that kernel memory would always be available and that if memory ran out, the device driver didn't have to worry about it because the system would crash anyway. With Win2K, Microsoft wants to ensure that the system survives temporary low-memory conditions and has therefore required that device drivers properly handle allocation failures that signal when kernel memory is exhausted. Thus, beginning 7 minutes after the system boots—which is enough time to get past the critical initialization period in which a low-memory condition might prevent a device driver from loading—the Verifier starts randomly failing allocation calls for device drivers it is verifying. If a hardware vendor doesn't program a driver to gracefully handle allocation failures, the driver eventually performs an invalid memory reference and Win2K can identify that faulty driver.
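The sketch below models fault injection and contrasts a driver routine that checks for allocation failure with one that doesn't. It is an illustration only: the allocator class, the failure rate, and the injectable uptime clock are hypothetical, and a Python TypeError stands in for the invalid memory reference a real driver would make.

```python
import random

GRACE_PERIOD_SECONDS = 7 * 60   # the 7-minute window described above


class FaultInjectingAllocator:
    """Toy allocator that randomly fails once the grace period has passed."""

    def __init__(self, failure_rate, uptime, rng=None):
        self.failure_rate = failure_rate
        self.uptime = uptime                 # callable returning seconds since boot
        self.rng = rng or random.Random()

    def allocate(self, size):
        past_grace = self.uptime() >= GRACE_PERIOD_SECONDS
        if past_grace and self.rng.random() < self.failure_rate:
            return None                      # simulated out-of-memory result
        return bytearray(size)


def careful_io(allocator):
    """Checks the allocation result and fails the request gracefully."""
    buffer = allocator.allocate(128)
    if buffer is None:
        return "STATUS_INSUFFICIENT_RESOURCES"
    buffer[0] = 1
    return "STATUS_SUCCESS"


def careless_io(allocator):
    """Assumes allocation always succeeds, as many older drivers did."""
    buffer = allocator.allocate(128)
    buffer[0] = 1                            # blows up when buffer is None
    return "STATUS_SUCCESS"


early = FaultInjectingAllocator(failure_rate=1.0, uptime=lambda: 60)
late = FaultInjectingAllocator(failure_rate=1.0, uptime=lambda: 600)

print(careless_io(early))    # still inside the grace period
print(careful_io(late))
try:
    careless_io(late)
    crashed = False
except TypeError:
    crashed = True
print("careless driver crashed:", crashed)
```

A failure rate of 1.0 is used here only to make the outcome deterministic. A lower rate with a seeded random.Random would model the random failures more closely.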
The Pool Tracking option in the Verification Type box is not new to Win2K. NT 4.0 and NT 3.51 also have pool tracking, but the only way to enable it in either OS is to use the gflags.exe program from the Windows NT Server 4.0 Resource Kit and the Windows NT Server 3.51 Resource Kit. In Win2K, when device drivers allocate memory, they can specify an optional four-letter tag in their allocation requests. When you disable pool tracking, Win2K ignores the tag; however, when you enable tracking, Win2K associates the tag with memory the driver allocates. Using PoolMon, a tool that ships with the Device Driver Kit (DDK), a developer can view how much memory Win2K assigned to each tag. Monitoring driver memory usage lets a developer detect memory leaks, an error that occurs when a driver fails to deallocate memory it no longer requires. The Verifier also shows general pool statistics on the Driver Verifier Manager's Pool Tracking tab.
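Tag-based tracking amounts to keeping a running byte count per tag. The Python sketch below is a minimal model with hypothetical names (TrackedPool, the handles, and the example tags); it is not how PoolMon or the kernel stores the data.

```python
from collections import Counter


class TrackedPool:
    """Toy pool that charges each allocation to the tag supplied by the driver."""

    def __init__(self):
        self.bytes_by_tag = Counter()
        self.live = {}            # handle -> (tag, size) for outstanding allocations
        self.next_handle = 1

    def allocate(self, size, tag):
        handle = self.next_handle
        self.next_handle += 1
        self.live[handle] = (tag, size)
        self.bytes_by_tag[tag] += size
        return handle

    def free(self, handle):
        tag, size = self.live.pop(handle)
        self.bytes_by_tag[tag] -= size

    def report(self):
        """Return outstanding bytes per tag, omitting tags with nothing allocated."""
        return {tag: count for tag, count in self.bytes_by_tag.items() if count}


pool = TrackedPool()

# A well-behaved driver frees what it allocates.
good = pool.allocate(256, tag="Good")
pool.free(good)

# A leaky driver allocates on every request and never frees.
for _ in range(10):
    pool.allocate(512, tag="Leak")

print(pool.report())
```

A tag whose total keeps growing while the driver is otherwise idle is the signature of a leak.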
Special Pool is the final verification option that the Verification Type box provides. When you enable Special Pool, the kernel allocates a region of kernel memory for the Verifier to use. The Verifier redirects memory allocation requests that drivers under verification make to the special pool area, rather than to the standard kernel-mode memory pools. Special Pool has several characteristics that let Win2K detect overrun and underrun errors immediately, and Special Pool throws in some additional checks that the Memory Manager doesn't usually make when a driver allocates or frees memory.
When a device driver allocates memory from Special Pool, the Verifier gives the driver an entire page of memory. The Verifier places the buffer that the device driver uses either at the end or the beginning of the page and fills the rest of the page with a random signature. Furthermore, the Verifier marks as invalid memory the one page preceding and the one page following the page it allocates for the driver. Figure 1, page 58, shows an example of the Special Pool buffer that the Verifier allocates to a device driver when the Verifier checks for overrun errors. In this configuration the buffer sits at the end of the page, so if a device driver attempts to read or write past the end of the buffer, the driver will access an invalid page and the Win2K Memory Manager will cause a blue screen of death. The overrun-detection configuration includes some measure of underrun detection as well: When the driver frees its buffer to return the memory to the Verifier, the Verifier checks that the signature preceding the buffer hasn't changed. If the signature has been modified, the device driver has probably underrun the buffer and written to memory outside it. When Win2K enforces underrun detection, the Verifier allocates the driver's buffer at the beginning of the page rather than at the end, so that an access just before the buffer hits the preceding invalid page. The Verifier GUI doesn't let you toggle underrun detection, so you must set the Registry value HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\PoolTagOverruns to 0 if you want underrun checking. Special Pool allocations also check that the processor IRQL at the time of an allocation or deallocation is legal. This check catches drivers that make the mistake of allocating pageable memory at an IRQL that is too high.
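The page layout lends itself to a short model. The Python sketch below is an approximation under assumptions: the class and exception names are hypothetical, the invalid neighboring pages are modeled as a simple range check, and a fixed fill byte replaces the random signature so that the example is repeatable.

```python
PAGE_SIZE = 4096


class BugCheck(Exception):
    """Stand-in for a blue screen of death."""


class SpecialPoolAllocation:
    """One valid page, flanked conceptually by an invalid page on each side."""

    def __init__(self, size, detect_overruns=True, pattern=0xA5):
        if not 0 < size <= PAGE_SIZE:
            raise ValueError("model handles allocations of at most one page")
        self.size = size
        self.pattern = pattern
        self.page = bytearray([pattern]) * PAGE_SIZE
        # Overrun detection puts the buffer at the end of the page,
        # underrun detection puts it at the beginning.
        self.start = PAGE_SIZE - size if detect_overruns else 0

    def write(self, offset, value):
        """Write relative to the buffer start, with no bounds check on offset."""
        address = self.start + offset
        if not 0 <= address < PAGE_SIZE:
            raise BugCheck("access to invalid page next to the buffer")
        self.page[address] = value

    def free(self):
        """On free, verify that the fill pattern around the buffer is intact."""
        filler = self.page[:self.start] + self.page[self.start + self.size:]
        if any(byte != self.pattern for byte in filler):
            raise BugCheck("signature modified: driver wrote outside its buffer")


def outcome(action):
    try:
        action()
        return "no error"
    except BugCheck as stop:
        return str(stop)


overrun_mode = SpecialPoolAllocation(100, detect_overruns=True)
print(outcome(lambda: overrun_mode.write(100, 0)))   # one past the end: immediate

overrun_mode.write(-1, 0)                            # underrun lands in the filler
print(outcome(overrun_mode.free))                    # caught only at free time

underrun_mode = SpecialPoolAllocation(100, detect_overruns=False)
print(outcome(lambda: underrun_mode.write(-1, 0)))   # now caught immediately
```

The model shows the trade-off: whichever side the buffer is pushed against is checked at the instant of the access, while the other side is checked only when the buffer is freed.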
You can configure Special Pool independently of the Verification Type box. Win2K interprets the Registry value HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\PoolTag as a REG_DWORD value that represents the allocation tag that the system uses for Special Pool. Thus, even if the Verifier isn't verifying a device driver, if the tag the driver associates with the memory it allocates matches PoolTag's value, the kernel allocates the driver's memory from Special Pool. If you set the value of PoolTag to 0x0000002A, which is the ASCII code for the wildcard (*), then all memory that drivers allocate is from Special Pool.
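Because PoolTag is a number rather than a string, it helps to see how a four-character tag maps to a REG_DWORD. The Python sketch below assumes the tag's ASCII bytes are packed in little-endian order, as they would sit in memory on x86, and that shorter tags are padded with zero bytes; both are assumptions for illustration, and the function names and the example tag are hypothetical.

```python
WILDCARD = 0x0000002A   # ASCII '*', the value that selects every allocation


def tag_to_dword(tag):
    """Pack a 1-4 character ASCII tag into a 32-bit value (assumed little-endian)."""
    raw = tag.encode("ascii")
    if not 1 <= len(raw) <= 4:
        raise ValueError("pool tags are one to four ASCII characters")
    return int.from_bytes(raw.ljust(4, b"\x00"), "little")


def uses_special_pool(allocation_tag, pool_tag_value):
    """Model of the check: wildcard matches everything, otherwise compare tags."""
    if pool_tag_value == WILDCARD:
        return True
    return tag_to_dword(allocation_tag) == pool_tag_value


print(hex(tag_to_dword("*")))
print(hex(tag_to_dword("Ddk ")))
print(uses_special_pool("Ddk ", tag_to_dword("Ddk ")))
print(uses_special_pool("Ntfs", tag_to_dword("Ddk ")))
```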
Microsoft uses the Verifier to check all device drivers that vendors submit for Hardware Compatibility List (HCL) testing. Doing so ensures that the drivers on the HCL are compatible with Win2K and free from common driver errors. Microsoft is also considering having Win2K automatically enable the Verifier to verify any device driver that the OS suspects of causing a blue screen of death in a production environment. When a blue screen of death occurs, the kernel would mark the driver that appears to have caused the blue screen for verification at the next reboot. Then, if the driver performs an illegal operation that the Verifier catches, an administrator can send a crash dump that pinpoints a bug in the driver to the vendor or to Microsoft.
Page Heap
Microsoft has also added a user-mode feature, Page Heap, to Win2K that is the equivalent of the kernel's Special Pool feature. Microsoft introduced Page Heap in NT 4.0 Service Pack 4 (SP4), but the company is publicizing Page Heap as a new Win2K feature. You enable Page Heap with the Gflags tool in the Windows 2000 Resource Kit, which lets developers and administrators configure an application to run with Page Heap. (The memory that user-mode applications allocate from is a heap, whereas the memory that device drivers and the kernel allocate from is a pool.) Each application has a heap. As Special Pool does, Page Heap allocates memory for applications so that the system immediately detects memory overrun and underrun errors. The Page Heap allocation scheme works exactly as Special Pool works in kernel mode: The pages preceding and following the page that contains an application's buffer are marked invalid, and the application's buffer occurs at the end (for overrun detection) or at the beginning (for underrun detection) of the valid page.
A More Reliable Windows NT
The combination of tools that Microsoft provides administrators and developers for preventing, isolating, and recovering from driver and application errors is a compelling reason to move from NT 4.0 to Win2K. The Driver Verifier in particular will have a profound effect on both the perceived and actual reliability of Win2K and will ensure a high level of software quality for code executing in kernel mode, an area in which software quality is of utmost importance.