The Blue Screen of Death

Don't let a kernel STOP error intimidate you. Read the blue screen's message and get back on track.

Mark T. Edmead

May 31, 1997

Find out what it has to tell you

You're working along just fine, and suddenly your screen display changes from your nice user interface to something that looks like Screen 1. You know what it is: a Windows NT kernel STOP error, or the blue screen of death. So, what can you do? Often, the problem goes away when you reboot the system. But what if it doesn't? What does that screen mean? Is it safe to continue using the system? Let's look at what a kernel STOP error means, what can cause it, and, most important, what information you can get from the blue screen.

What Happens at the Kernel Level?
First, let's review the basics of the NT architecture. The NT operating system has two layers: user mode and kernel mode. User mode is where the various subsystems, such as the Win32, POSIX, and OS/2 subsystems, reside. Components in this mode provide the environments in which all user applications run. For instance, Win32 programs run on the Win32 subsystem.

As you see in Figure 1, kernel mode sits between user mode and the physical layer (the hardware) and prevents user mode from directly accessing the hardware. Kernel mode is also the home of the various NT executive services, such as the Object Manager, Security Reference Monitor, and Process Manager. Just above the physical device hardware lies the hardware abstraction layer (HAL), and above that is the NT microkernel. The HAL is the portion of kernel-mode code that is specific to a particular hardware platform. The microkernel, the heart of the OS, handles NT's lowest-level internal operations, such as thread scheduling and interrupt and exception dispatching.

An important component of the executive services is the I/O Manager. Besides taking care of all input and output for the operating system, the I/O Manager manages communications between drivers and supports all file system drivers and hardware device drivers.

NT is a modular operating system, which means you can add DLLs or device drivers to extend the system's capabilities. You can, for instance, add fault tolerance to NT by adding device drivers. When a peripheral manufacturer develops a driver for NT, the driver is most likely a kernel-mode driver: It resides in the kernel-mode area and probably interfaces with Microsoft kernel drivers. You can think of kernel drivers as the NT counterpart to the virtual device drivers (VxDs) of Windows 3.1 and Windows 95. Kernel drivers are the low-level mechanisms for talking to the hardware. So when a driver does something it's not supposed to, the error occurs at the lowest level, directly affects the overall system, and causes a kernel STOP error.

If an application operating in user mode does something to cause an error, NT halts the process and generates an Illegal Operation error. Because every Win32 application has its own protected virtual address space, this error condition doesn't affect any other Win32 programs that are running. If the application tries to access the hardware directly instead of going through the correct methods, NT notices and generates an exception error. One nice thing about NT is that it protects the rest of the system well from erratic applications.

When an application faults, you can close the offending program and resume work. Kernel error conditions, however, typically are not recoverable; you have to reboot the system. You can think of the kernel STOP as a built-in error-trapping mechanism. A kernel STOP error is NT's way of halting further activity before that activity severely damages your system or corrupts data.

What Does This Weird Screen Mean?
OK, so what does this screen tell me? The kernel STOP may mean that a kernel driver (either a system device driver or a third-party driver) has illegally accessed the privileged kernel area. Or the kernel STOP may mean that you have mixed SIMMs or added a bad network controller or SCSI controller. In these cases, you can fix the problem by removing the offending hardware device. If you have not added any new hardware, you need to get more information from the blue screen. Let's look at each portion of Screen 1. Fortunately, you don't need to understand everything on the screen.

At the top of the display is a hexadecimal value followed by four hex numbers in parentheses. The first hex code is the kernel error code (the STOP code). With this code, you can determine what kind of error occurred, but not which driver caused it. Table 1 lists the various error conditions. In our example, the error condition is 0x0000000A, IRQL_NOT_LESS_OR_EQUAL. This code means that a process attempted to access pageable memory at an interrupt request level (IRQL) that was too high. The Microsoft Windows NT Server Resource Kit and the Microsoft Windows NT Workstation Resource Kit have complete listings of STOP codes.

The values in the parentheses give more specific information about what the driver was doing when the error happened; their meaning depends on the STOP code. For this STOP code, the first value (00000000) is the memory address that the driver referenced improperly. The second value is the IRQL at the time of the access. The third value specifies whether the driver was doing a read or a write. The fourth value is the address of the instruction that attempted the access. By looking at the STOP code and the third and fourth parameters, you may be able to determine what caused the error condition.
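If you copy the top line of the blue screen into a text file, a few lines of script can pull out the STOP code and its four parameters and spell out what they mean for IRQL_NOT_LESS_OR_EQUAL. This is a minimal sketch: the exact line format it matches ("*** STOP: 0x... (0x...,0x...,0x...,0x...)") and the parameter values in the example are assumptions for illustration, and the STOP_NAMES table holds only the one code discussed here.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# Assumed transcription of the top line of the blue screen, for example:
#   *** STOP: 0x0000000A (0x00000000,0x00000002,0x00000000,0x80100000)
STOP_RE = re.compile(
    r"STOP:\s*0x([0-9A-Fa-f]{8})\s*\(\s*"
    r"0x([0-9A-Fa-f]{8})\s*,\s*0x([0-9A-Fa-f]{8})\s*,\s*"
    r"0x([0-9A-Fa-f]{8})\s*,\s*0x([0-9A-Fa-f]{8})\s*\)"
)

# Only the code discussed in the text; the resource kits list the rest.
STOP_NAMES = {0x0000000A: "IRQL_NOT_LESS_OR_EQUAL"}


@dataclass
class StopError:
    code: int
    params: Tuple[int, int, int, int]

    @property
    def name(self) -> str:
        return STOP_NAMES.get(self.code, "UNKNOWN")


def parse_stop_line(line: str) -> StopError:
    """Extract the STOP code and its four parameters from one line of text."""
    m = STOP_RE.search(line)
    if m is None:
        raise ValueError(f"no STOP code found in {line!r}")
    code, p1, p2, p3, p4 = (int(group, 16) for group in m.groups())
    return StopError(code, (p1, p2, p3, p4))


def describe_irql_not_less_or_equal(err: StopError) -> str:
    """Interpret the four parameters of a 0x0000000A STOP as described above."""
    if err.code != 0x0000000A:
        raise ValueError("not an IRQL_NOT_LESS_OR_EQUAL STOP")
    address, irql, access, instruction = err.params
    kind = "write" if access & 1 else "read"  # assumed: 0 = read, 1 = write
    return (f"{err.name}: {kind} of {address:#010x} at IRQL {irql}, "
            f"by the instruction at {instruction:#010x}")
```

For the illustrative line above, describe_irql_not_less_or_equal(parse_stop_line(line)) reports a read of 0x00000000 at IRQL 2 by the instruction at 0x80100000; that fourth value is the one to compare against driver base addresses.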

The information in the middle of the screen, called the DLL base (starting with 80100000), lists the drivers that the system loaded and initialized successfully. The bottom of the screen, called the DLL load base, shows the drivers in the stack. The first driver in that list is the next one to be pushed from the stack, or executed, and in many cases it is the offending driver. When the base address of the first driver is close to the fourth value at the top of the screen (the address of the instruction that attempted the access), you can hypothesize that the driver caused the problem while it was initializing and being pushed off the stack. In Screen 1, the number in the DLL load base (000002fe) is very near the fourth value (00000000) at the top of the screen.
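One way to make "close to" precise is to pick the driver whose base address is the highest one at or below the faulting instruction address, on the assumption that a driver's code starts at its base address and the instruction lies inside that driver's image. The sketch below does that with a sorted list and a binary search. The driver names and base addresses are hypothetical values you would transcribe from the screen yourself.

```python
from bisect import bisect_right
from typing import Dict, Optional


def driver_containing(address: int, driver_bases: Dict[str, int]) -> Optional[str]:
    """Return the driver whose base address is the highest one at or below `address`.

    Assumes each driver image starts at its listed base and that the address
    falls inside the image that starts closest below it.
    """
    ordered = sorted((base, name) for name, base in driver_bases.items())
    bases = [base for base, _ in ordered]
    i = bisect_right(bases, address)
    if i == 0:
        return None  # the address is below every listed base
    return ordered[i - 1][1]


# Hypothetical bases transcribed from a blue screen.
bases = {"ntoskrnl.exe": 0x80100000, "hal.dll": 0x80010000, "mydrv.sys": 0xFC8A0000}
print(driver_containing(0xFC8A1234, bases))
```

Because the sketch has no image sizes, it cannot tell when an address lies past the end of the last driver; treat its answer as a hypothesis, just as the text does.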

Not all blue screens are easy to read. In this example, the problem driver might still be a driver listed in the middle of the screen, even though the screen shows that it initialized correctly, or the driver might not be on the screen at all (in the case of a bad controller card). Or something other than a driver might have caused the problem. When you can't easily find the problem, you need to go to the next step: debugging.

How to Debug
Let's assume that you have determined that the cause of the kernel STOP is an installed device driver and not a hardware problem. Now what? Well, it depends. If you are not the developer of the driver, you probably want to save the NT image information and let someone else figure out what happened. This approach is called noninteractive debugging. If you are the developer and you have the source code, you can use the kernel debugger that NT provides to step through the driver code. This technique is interactive debugging.

Noninteractive debugging. NT gives you the option of saving the image of the operating system (at the time of the kernel STOP error) to your hard disk. You can use this information to determine the cause of the problem. To save the NT image to disk, go to the Control Panel, System applet (Screen 2 shows the NT 3.51 setup; Screen 3 shows the NT 4.0 setup). You need to be an Administrator to access the options. You can write the event to the system log so you can view the error in the Event Viewer. This option is handy, because if you set your system to reboot automatically after a kernel STOP error, the condition may otherwise go unnoticed. You also have the option to have the system send an administrative alert. This alert is useful, for example, when the server has the kernel STOP error and you are working someplace where you can't see the server screen.

The next option lets you write the memory dump file to %SystemRoot%\MEMORY.DMP. Note that the size of the image file is roughly the size of your physical RAM. Therefore, if you have 128MB of RAM, your dump file will be 128MB! You can select the option to overwrite the existing file, if one already exists. The last option is to set the system to reboot automatically. If you elect to save the image file and to reboot, the process may take a while, depending on RAM size. I have seen this process take more than 20 minutes, so be patient.
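The check boxes on these recovery screens are, as far as I know, stored as values under the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl. The sketch below maps each option to the value name I believe backs it, which can help when you compare several machines' settings in a registry editor. Treat the key and value names as assumptions to verify on your own system; the code only builds a dictionary and does not read or write the registry.

```python
from dataclasses import dataclass
from typing import Dict, Union

# Assumed registry location of the recovery options; verify before relying on it.
CRASH_CONTROL_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl"


@dataclass
class RecoveryOptions:
    log_event: bool = True       # write an event to the system log
    send_alert: bool = True      # send an administrative alert
    write_dump: bool = True      # write debugging information to a dump file
    dump_file: str = r"%SystemRoot%\MEMORY.DMP"
    overwrite: bool = True       # overwrite any existing dump file
    auto_reboot: bool = False    # reboot automatically after the STOP screen

    def to_registry_values(self) -> Dict[str, Union[int, str]]:
        """Map each option to the value name assumed to hold it under CRASH_CONTROL_KEY."""
        return {
            "LogEvent": int(self.log_event),
            "SendAlert": int(self.send_alert),
            "CrashDumpEnabled": int(self.write_dump),
            "DumpFile": self.dump_file,
            "Overwrite": int(self.overwrite),
            "AutoReboot": int(self.auto_reboot),
        }


for name, value in RecoveryOptions(auto_reboot=True).to_registry_values().items():
    print(f"{CRASH_CONTROL_KEY}\\{name} = {value}")
```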

If you rely on administrative alerts because the system that has the problem is not near you, configure who receives them in the Control Panel, Server applet.

The Windows NT Server and NT Workstation CD-ROMs contain some tools to help you with this memory file. Dumpflop.exe writes the memory file to floppies (a 32MB memory file fits on about 10 disks). Unfortunately, Microsoft does not accept the memory file on any other removable medium. Once you have created the dump file, you can make it available to a Microsoft Product Support Specialist either by sending the floppies to Microsoft or by setting up a Remote Access Service (RAS) connection so that Microsoft Product Support can dial in and view the file contents remotely. Or you can submit the file to Microsoft over the Internet by connecting to ftp.microsoft.com and copying the file to /transfer/incoming/bussys/winnt.
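Before you turn on dump saving, it helps to check that the target volume has room for a file roughly the size of physical RAM, and to estimate how many floppies dumpflop will need. The sketch below uses only the two rules of thumb from this article (dump size is about RAM size; 32MB fits on about 10 disks) scaled linearly, so the floppy count is a rough estimate, not a dumpflop guarantee.

```python
import shutil


def estimated_dump_bytes(physical_ram_bytes: int) -> int:
    """The article's rule: the dump file is roughly the size of physical RAM."""
    return physical_ram_bytes


def floppies_needed(dump_mb: int) -> int:
    """Scale the article's rule of thumb (a 32MB dump fits on about 10 disks)."""
    return -(-dump_mb * 10 // 32)  # integer ceiling of dump_mb * 10 / 32


def dump_fits(directory: str, physical_ram_bytes: int) -> bool:
    """True if the volume holding `directory` has room for a RAM-sized dump."""
    return shutil.disk_usage(directory).free >= estimated_dump_bytes(physical_ram_bytes)


print(floppies_needed(128))
```

By this estimate, the 128MB machine mentioned earlier would need about 40 disks, which is a good argument for the RAS or FTP route.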

You can use another utility, dumpchk.exe, to examine the integrity of the dump file and verify that the system created the file correctly. With dumpchk, you can view basic information about the dump file, such as which NT version was running and the STOP error codes.

Another useful utility is dumpexam.exe, which converts the memory file into a readable text file. You need three files to run dumpexam: dumpexam.exe, imagehlp.dll, and, for the Intel platform, kdextx86.dll (the third file depends on the platform). The three files must be in the same directory. You can find them on the NT Server or NT Workstation CD-ROM in the \support\debug\platform directory, where platform is i386, alpha, mips, or ppc.
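Because dumpexam needs all three files side by side, a quick check before you run it can save a confusing failure. The sketch below lists whichever of the required files are missing from a directory, comparing names case-insensitively as NT does. It defaults to the Intel extension DLL named above; for other platforms, pass the corresponding file name, which this sketch does not assume.

```python
from pathlib import Path
from typing import List


def missing_dumpexam_files(directory: str, extension_dll: str = "kdextx86.dll") -> List[str]:
    """List the dumpexam files that are not present in `directory`.

    `extension_dll` defaults to the Intel file; pass the platform's own file otherwise.
    """
    folder = Path(directory)
    # NT file names are case-insensitive, so compare lowercased names.
    present = ({p.name.lower() for p in folder.iterdir() if p.is_file()}
               if folder.is_dir() else set())
    required = ["dumpexam.exe", "imagehlp.dll", extension_dll]
    return [name for name in required if name.lower() not in present]
```

An empty result means all three files are in place.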

The noninteractive debugging method is ideal for users who don't want to debug the driver but just want to figure out which one is at fault. To run dumpexam, you need to load the symbol files, which contain NT system debugging information. Make sure that the symbol files are for the version of NT you're running, including any installed service packs. For the Intel version of NT, the symbol files are in the \support\debug\i386\symbols directory on the NT resource kits' CD-ROMs.

Figure 2 shows the syntax for dumpexam. For example, suppose you want to analyze a dump file from a computer running NT Workstation 4.0. The symbols are in the directory d:\symbols, and the dump file, server.dmp, is in the directory d:\dump. The command line reads

dumpexam -y d:\symbols d:\dump\server.dmp
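If you examine dumps from several machines, you might script the dumpexam command line shown above. This sketch builds the argument list with Python's ntpath module, so the backslash paths come out right even if you prepare the list on a non-Windows machine, and it assumes dumpexam.exe is on the PATH when you actually run it. The function names are hypothetical helpers, not part of any NT tool.

```python
import ntpath
import subprocess
from typing import List


def dumpexam_args(symbol_dir: str, dump_dir: str, dump_name: str) -> List[str]:
    """Build the argument list for: dumpexam -y <symbol path> <dump file>."""
    # ntpath joins with backslashes regardless of the OS running this script.
    dump_file = ntpath.join(dump_dir, dump_name)
    return ["dumpexam", "-y", symbol_dir, dump_file]


def run_dumpexam(symbol_dir: str, dump_dir: str, dump_name: str) -> int:
    """Run dumpexam (assumed to be on the PATH) and return its exit code."""
    return subprocess.run(dumpexam_args(symbol_dir, dump_dir, dump_name)).returncode


print(" ".join(dumpexam_args(r"d:\symbols", r"d:\dump", "server.dmp")))
```

Passing a list rather than one string avoids quoting problems if a directory name contains spaces.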

The results of the exam will be in %SystemRoot%\MEMORY.TXT.

Interactive debugging. The other method of debugging is interactive debugging. Device driver developers, rather than systems administrators, usually prefer interactive debugging, because the process requires extensive knowledge of NT internals.

Interactive debugging requires a second PC (the host machine) with NT installed, on which you run the kernel debugger. The host machine must run the same version of NT as the target machine (the problem computer), and the two must be connected via a modem or a null-modem cable.
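To my knowledge, on NT 4.0 you put the target machine into debug mode by adding the /DEBUG, /DEBUGPORT, and /BAUDRATE switches to its entry in boot.ini, and the host-side kernel debugger (i386kd.exe on Intel systems) reads the port, baud rate, and symbol path from the _NT_DEBUG_PORT, _NT_DEBUG_BAUD_RATE, and _NT_SYMBOL_PATH environment variables. The sketch below only formats those settings so that both ends agree; the ARC path, description, port, and speed in the example are hypothetical, and you should confirm the switch and variable names against the resource kit for your version.

```python
from typing import Dict


def debug_boot_entry(arc_path: str, description: str, port: str, baud: int) -> str:
    """Format a boot.ini [operating systems] line that starts the target in debug mode."""
    return f'{arc_path}="{description}" /DEBUG /DEBUGPORT={port} /BAUDRATE={baud}'


def host_debugger_env(port: str, baud: int, symbol_path: str) -> Dict[str, str]:
    """Environment variables assumed to be read by the host-side kernel debugger."""
    return {
        "_NT_DEBUG_PORT": port,
        "_NT_DEBUG_BAUD_RATE": str(baud),
        "_NT_SYMBOL_PATH": symbol_path,
    }


# Hypothetical values: both ends must use the same speed over the null-modem cable.
print(debug_boot_entry(r"multi(0)disk(0)rdisk(0)partition(1)\WINNT",
                       "Windows NT Workstation 4.00 [debug]", "COM2", 19200))
print(host_debugger_env("COM2", 19200, r"d:\symbols"))
```

Deriving both settings from the same port and speed values is the point of the sketch: a mismatched baud rate is an easy way to end up with a debugger that never connects.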

Is the System Safe?
Can you safely use the system after a kernel STOP error? The answer depends on whether you can isolate what caused the problem. I've seen cases where the error condition happens once, never to recur. In other cases, however, the error occurs after the user has installed or updated a driver. In this case, you need to remove the driver and start over. When you get the kernel STOP error, reboot the system, and press the space bar when you see the Last Known Good text. This action starts NT with the last known working configuration, without the offending driver. The option of reverting to the last known working configuration reinforces the wisdom of installing one driver at a time and making sure that the driver works before you install another. If the driver doesn't work correctly, you can revert to the previous working configuration. If you install two or more drivers at the same time and one of them causes a problem, you will have trouble determining which driver caused it.
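The one-driver-at-a-time discipline can be modeled as a simple loop: add a single driver to the last known good configuration, test, and keep the new configuration only if the test passes. The sketch below is a toy model, not an NT tool. The `system_ok` callback stands in for "reboot and see whether the STOP error returns," and the driver names are hypothetical. It shows why the culprit is unambiguous: each failure involves exactly one new driver.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def install_one_at_a_time(
    drivers: Sequence[str],
    system_ok: Callable[[List[str]], bool],
) -> Tuple[List[str], Optional[str]]:
    """Install drivers one by one; stop at the first failure.

    Returns the last known good configuration and the driver that broke it
    (or None if every driver installed cleanly).
    """
    last_known_good: List[str] = []
    for driver in drivers:
        candidate = last_known_good + [driver]
        if not system_ok(candidate):
            # Revert: only one driver changed, so it is the culprit.
            return last_known_good, driver
        last_known_good = candidate
    return last_known_good, None


# Hypothetical test: the system fails whenever flaky.sys is loaded.
print(install_one_at_a_time(["net.sys", "flaky.sys", "scsi.sys"],
                            lambda config: "flaky.sys" not in config))
```

Had all three drivers gone in together, the same failing test would only tell you that one of three drivers is bad.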

If the problem is not related to a driver, look at new system hardware, such as a new controller card. To determine whether a controller card is the problem, remove the card and test the system again. If the problem goes away, check whether the card (or any new hardware you add to your system) is on the Microsoft Hardware Compatibility List (HCL). TechNet and Microsoft's Web site (http://www.microsoft.com) have an up-to-date list.

The Bottom Line
I hope that, after reading this article, you won't find the blue screen intimidating. Using the techniques I've explained, you can find out, in general, why you got the screen and perhaps determine specifically what caused the problem. If you aren't a device driver developer and don't want to deal with interactive system debugging, noninteractive debugging is the way to go. Your goal is to get your system up and running as fast as possible, and isolating the problem is the first step. For additional resources, see the sidebar, "Other Sources of Help."

Other Sources of Help
Microsoft TechNet CD-ROMs
contain the Microsoft Knowledge Base (the same library that Microsoft Product Support Specialists use), resource kits, and educational materials.

Microsoft Network (MSN) and CompuServe
have several Microsoft forums where you can post questions and obtain answers.

Microsoft Download Library (MSDL)
is an electronic bulletin board system (BBS) from which you can download drivers and other software. The phone number for MSDL is 206-936-6735.

Microsoft Web Site (http://www.microsoft.com)
contains product information, drivers, service packs, and more.

As the article notes, if the problem driver is a Microsoft driver, you can send the dump file to Microsoft for analysis. If the driver is from a third-party manufacturer, you can send the memory file to that manufacturer.
