More on Troubleshooting NT Boot Failures
With an understanding of the boot process, you can learn to avoid the blue screen of death.
April 30, 1997
Last month, I started explaining the Windows NT boot process, with an eye to diagnosing and repairing failures. I explained that the system starts with a Power On Self Test, loads the first hard-disk sector (the Master Boot Record--MBR), and (using information in the MBR) finds and loads the first sector of the C drive. That first sector, the boot record, loads the NTLDR program, which must be in the root of the C drive. NTLDR reads and interprets the boot.ini file and displays the operating system picker (Windows NT Server Version 4.00, Windows NT Server Version 4.00 [VGA mode], and whatever other operating systems you have on your computer). After you choose NT, NTLDR loads and executes NTDETECT, a program that sniffs out what hardware you have in your system, and passes that information to NTLDR, which uses that information to create part of the Registry, HKEY_LOCAL_MACHINE\Hardware.
Loading the Operating System
The entries in boot.ini tell NTLDR where to find the NT files. Then NTLDR loads three files to start NT. The first is ntoskrnl.exe, the basic NT kernel and most of the NT Executive. Before NTOSKRNL can do anything, it must be able to communicate with the basic computer system, so NTLDR loads hal.dll, the Hardware Abstraction Layer; think of HAL as the motherboard driver. Both files are in the winnt\system32 directory.
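As a rough illustration of how boot.ini points at the NT files, the Python sketch below parses a boot.ini and reports the directory each menu entry refers to. The sample file text is an assumption (a typical default layout, not reproduced from this article), and `parse_boot_ini` and `menu_entries` are hypothetical helper names; this is not a description of how NTLDR itself parses the file.

```python
# Sample boot.ini text: an assumed, typical default layout (not from the article).
SAMPLE_BOOT_INI = r"""[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINNT
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINNT="Windows NT Server Version 4.00"
multi(0)disk(0)rdisk(0)partition(1)\WINNT="Windows NT Server Version 4.00 [VGA mode]" /basevideo /sos
"""


def parse_boot_ini(text):
    """Return {section name: [(key, value), ...]} for a boot.ini-style file."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].lower()
            sections[current] = []
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            sections[current].append((key.strip(), value.strip()))
    return sections


def menu_entries(sections):
    """Yield (label, system_directory, switches) for each operating system line."""
    for path, value in sections.get("operating systems", []):
        # Everything after the first backslash names the directory with the NT files.
        _, _, directory = path.partition("\\")
        parts = value.split('"')
        label = parts[1] if len(parts) >= 3 else value
        switches = parts[2].split() if len(parts) >= 3 else []
        yield label, directory, switches


if __name__ == "__main__":
    for label, directory, switches in menu_entries(parse_boot_ini(SAMPLE_BOOT_INI)):
        print(label, "->", directory, switches)
```

The part of each path before the backslash identifies the disk and partition; the sketch leaves it uninterpreted and only extracts the directory name.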
The third file that NTLDR loads is sort of a config.sys for NT, a file that describes both the computer's hardware and the services running on that computer. This hive file, called SYSTEM, is part of the Registry and is in winnt\system32\config. (Microsoft calls the files that the Registry lives in hives. Why? Good question. A Microsoft person once told me it was a leftover from UNIX; UNIX folks I know just give me a blank look when I ask them about it.) SYSTEM is the entire key HKEY_LOCAL_MACHINE\SYSTEM, which contains control sets. A control set is a list of drivers and system services NT needs to load, and configuration information for those drivers.
You'll see three control sets in your HKEY_LOCAL_MACHINE\SYSTEM. One is the CurrentControlSet; it contains a description of your current configuration, and any changes that you've made to your configuration today. Another control set is a copy of your current control set as of the last time you successfully started your system; this copy is the control set for Last Known Good Configuration. The third control set assumes that you're running the VGA video driver--that's what you get when you choose the [VGA mode] option on the operating system picker.
Loading a Configuration
At this point, the base operating system (kernel), the driver (HAL) that handles motherboard and multiprocessor peculiarities, and the descriptions (the SYSTEM hive) of possible configurations are loaded. But which configuration to load? By default, CurrentControlSet will load. But NT lets you choose between CurrentControlSet and the Last Known Good configuration. NT begins analyzing the CurrentControlSet about the same time that the Last Known Good configuration prompt comes up on your screen, but you can interrupt the process by pressing the spacebar. NT then loads the LastKnownGood control set instead.
The main branch inside a control set is the Services branch. In Screen 1, I've opened my HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services key. On the left panel, you see names such as Aha154x, atapi, and Atdisk. Each name refers to an NT driver--Aha154x refers to some 16-bit Adaptec SCSI host adapters, atapi is the generic driver for EIDE drives, and Atdisk is the generic driver for IDE drives. I compared this listing to config.sys under DOS. config.sys includes commands such as DEVICE=ASPI4DOS.SYS, which loads drivers, but a very important and fundamental difference is that config.sys names the drivers that you want to load. NT's Services key, in contrast, names every driver it's ever heard of, regardless of what driver you intend to load. The computer that I took that screen shot from does not have an IDE drive, an EIDE drive, or an Adaptec 154x SCSI host adapter. But how does NT know not to load the drivers for those devices? Look in the right pane. See the Start entry? The value is equal to 4, which in NT-ese means, "You never need to load this driver." Oddly enough, the NT Services key is organized so that it names every driver known to Microsoft, but the vast majority of them have Start values of 4.
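To make the Start value concrete, here is a minimal Python sketch that decodes it for a few entries. The article discusses Start values 0, 1, and 4; the labels for 2 and 3 are filled in from general NT knowledge. The `SERVICES` table is a hypothetical snapshot: the three disabled drivers come from the screen described above, while `ExampleScsi` and `ExampleMouse` are made-up names for illustration.

```python
# Meaning of the Start value in a Services entry.
# Values 0, 1, and 4 are covered in the article; 2 and 3 are added for completeness.
START_MEANINGS = {
    0: "boot: loaded by NTLDR",
    1: "system: loaded by NTOSKRNL",
    2: "automatic: started after the kernel is up",
    3: "manual: started on demand",
    4: "disabled: never loaded",
}

# Hypothetical snapshot of a Services key (driver name -> Start value).
SERVICES = {
    "Aha154x": 4,
    "atapi": 4,
    "Atdisk": 4,
    "ExampleScsi": 0,   # made-up boot driver
    "ExampleMouse": 1,  # made-up system driver
}


def describe(name, services=SERVICES):
    """Return a one-line description of a driver's Start value."""
    start = services[name]
    return f"{name}: Start={start} ({START_MEANINGS.get(start, 'unknown')})"


def enabled_drivers(services=SERVICES):
    """Return the names of drivers that are not disabled (Start != 4)."""
    return sorted(name for name, start in services.items() if start != 4)


if __name__ == "__main__":
    for name in SERVICES:
        print(describe(name))
    print("Not disabled:", enabled_drivers())
```

On a machine like the one in the screen shot, most entries would fall into the disabled group, which is why the key can list so many drivers without loading them.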
Loading Drivers
NT loads drivers in three passes. The first set is a small group of low-level drivers essential to getting the basic text-mode portion of the kernel going. NTLDR looks through the Services key to find those drivers (e.g., disk drivers). Even the video drivers don't load yet. When NTLDR finds a driver with a 0 Start value, it loads that driver. You see a black screen with "OS Loader Vx.xx," and as NTLDR loads a driver, it places a period on the screen, giving you a small measure of boot progress. NTLDR loads the drivers; it doesn't start them.
Once the drivers are loaded, NTLDR's job is done. NTLDR passes control over to NTOSKRNL, which is already loaded into memory but has been inactive. NTOSKRNL wakes up when the screen turns blue--the normal bootup blue, not the something-bad-happened-and-you-no-longer-have-a-server blue--and the message "Microsoft (R) Windows NT (TM) Version x.x (Build xxxx)" appears. As NTOSKRNL initializes, it reports the memory and number of processors in the system, and the version number. After initializing itself, NTOSKRNL initializes the drivers that NTLDR loaded into memory.
This point is where the first bunch of blue screens--the bad type--can appear. Drivers that have bugs often become apparent when they load. The error blue screen I see most often is INACCESSIBLE_BOOT_DEVICE, which is caused by either a buggy SCSI host adapter driver or a boot sector virus. The second common error is IRQL_NOT_LESS_OR_EQUAL, which is generated when a driver attempts to access memory that it's not allowed to access. Although this error is not welcome, it does generate some useful information. On an error blue screen, you see a message similar to *** STOP: 0x0000000a (0x0000004c, 0x00000002, 0x00000001, 0x803214d2). The first and last numbers in the parentheses are good guides to what went wrong. The first number is the address in memory that the buggy driver tried to access, the access attempt that caused the blue screen. The last value is the address of the instruction that tried to make the illegal access, the address of the culprit. In the lines following the STOP error message, you see a listing of driver names, file dates, and start addresses. Look at the start addresses to pinpoint which driver (or program of any kind) contains the address that attempted the illegal access.
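The lookup just described can be sketched in a few lines of Python: parse the STOP line, take the last parameter as the address of the offending instruction, and find the driver whose start address is the closest one at or below it. The STOP line is the one quoted above; the driver table is entirely invented for illustration, and `parse_stop` and `find_culprit` are hypothetical helper names. Because the blue screen lists start addresses but this sketch has no driver sizes, the match is only a best guess.

```python
import re
from bisect import bisect_right

STOP_RE = re.compile(r"STOP:\s*(0x[0-9a-f]+)\s*\(([^)]*)\)", re.IGNORECASE)


def parse_stop(line):
    """Return (stop_code, [parameters]) parsed from a STOP message line."""
    match = STOP_RE.search(line)
    if match is None:
        raise ValueError("no STOP code found in line")
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return code, params


def find_culprit(address, drivers):
    """Return the driver whose start address is the highest one <= address.

    drivers is a list of (start_address, name) pairs. Returns None if the
    address lies below every listed start address.
    """
    ordered = sorted(drivers)
    starts = [start for start, _ in ordered]
    index = bisect_right(starts, address)
    return ordered[index - 1][1] if index else None


# Invented driver listing: these start addresses are NOT from a real system.
DRIVERS = [
    (0x80100000, "ntoskrnl.exe"),
    (0x80320000, "aic78xx.sys"),
    (0x80400000, "hal.dll"),
]

if __name__ == "__main__":
    line = "*** STOP: 0x0000000a (0x0000004c, 0x00000002, 0x00000001, 0x803214d2)"
    code, params = parse_stop(line)
    print(f"Stop code 0x{code:X}")
    print(f"Address accessed: 0x{params[0]:08X}")
    print(f"Instruction address: 0x{params[-1]:08X}")
    print("Likely culprit:", find_culprit(params[-1], DRIVERS))
```

With the invented table, the instruction address falls between the second and third start addresses, so the sketch points at the second driver.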
Often this blue screen comes from a new driver--knowing that can come in handy for quick fixes. Suppose you find a new and improved version of your AIC78XX.SYS driver, the driver that controls the Adaptec 2940 SCSI host adapter in your computer, and load it, only to get a blue screen. What do you do?
Rather than pull out some kind of system repair disk, all you need to know is that most NT drivers are stored in winnt\system32\drivers. You boot your system in DOS and grab the old AIC78XX.SYS driver (the simplest method is to back up the old driver before playing with the new driver, or you can uncompress the original driver off the NT Server CD-ROM). Copy the driver over the newer, buggy driver. (This process assumes, of course, that the drive that you've put NT on is a FAT partition, which I recommend.)
Presuming you don't get an error from initializing the Start=0 drivers, NTOSKRNL scans the Services key again, looking this time for the Start=1 drivers. In general, these are the drivers that the GUI will need, and the foundations of system services such as networking services. For example, the video, mouse, and sound card drivers load here. NTOSKRNL loads them, and as with NTLDR, puts a period on the screen for each driver loaded.
Once the Start=1 drivers are loaded, they initialize one by one. Again, you have the possibility of a blue screen from any of them, but I've never seen a blue screen at this stage. The Start=0 drivers seem to have the greatest system-killing power.
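The order of events across the first two passes can be summarized in a small Python sketch. It is a simplification under stated assumptions: the driver names are made up, and drivers within a pass are simply sorted by name, whereas the real ordering inside a pass is not described in this article.

```python
def boot_sequence(services):
    """Return ordered (actor, action, driver) steps for the first two passes.

    services maps a driver name to its Start value. Within a pass, drivers
    are sorted by name, which is an assumption made only for this sketch.
    """
    boot = sorted(name for name, start in services.items() if start == 0)
    system = sorted(name for name, start in services.items() if start == 1)

    steps = []
    steps += [("NTLDR", "load", name) for name in boot]           # loaded, not started
    steps += [("NTOSKRNL", "initialize", name) for name in boot]  # first blue-screen risk
    steps += [("NTOSKRNL", "load", name) for name in system]
    steps += [("NTOSKRNL", "initialize", name) for name in system]
    return steps


# Hypothetical Services snapshot; every name except Aha154x is made up.
EXAMPLE_SERVICES = {
    "ExampleScsi": 0,
    "ExampleDisk": 0,
    "ExampleVideo": 1,
    "ExampleMouse": 1,
    "Aha154x": 4,  # disabled, so it never appears in the sequence
}

if __name__ == "__main__":
    for actor, action, name in boot_sequence(EXAMPLE_SERVICES):
        print(f"{actor:9} {action:10} {name}")
```

The sketch keeps loading and initializing as separate steps because that split is where the Start=0 blue screens described earlier show up: NTLDR only loads, and the failures surface when NTOSKRNL initializes.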
In the Clear
From there, the GUI loads, you log on, and you start a new day. From this point on, you have to start loading applications to crash your system!