Failed Processor in Multiprocessor System
A failed processor takes down a multiprocessor Win2K server.
June 23, 2003
One night at 7:00, I received a call at home informing me that a Windows 2000 server was down at my company's corporate headquarters. The server in question was a Dell PowerEdge 2300 Win2K domain controller (DC) that held the PDC emulator, relative identifier (RID) master, and infrastructure master roles; ran Microsoft Internet Security and Acceleration (ISA) Server 2000; and housed some shared folders. The corporate office had no Web access while the server was down. As a result, the Microsoft Exchange 2000 Server system was processing only internal email, the remote offices weren't receiving email, and the company Web site was inaccessible.
I asked our application developer, who was still in the office, to take a look at the system. He reported that the server was running but had no video output. I asked him to power off the server, wait 15 seconds, then power it back on. He tried this twice with no luck: there was still no video output, and the system wouldn't begin the power-on self test (POST). The motherboard seemed to have failed. Because we couldn't order a replacement that night anyway, I decided to wait until the next day to tackle the problem myself.
When I arrived the next morning, I noticed that someone in the shipping department was logged on and running a Microsoft SQL Server application. I was encouraged, but I found the server in the same state as it was the night before. I unplugged the server and reseated every module, card, drive, and cable. The server still wouldn't boot.
I called Dell's system support, and a technician instructed me to remove almost everything from the system (e.g., memory, drives, cables). He wanted me to start the system with no peripherals attached, then shut down the server and reattach one peripheral at a time, power cycling after each one, to isolate the failed part. When I tried to power up the system with no peripherals attached, I still didn't get any video output. However, the system did beep to signal that the memory was missing.
Only the two processors remained on the motherboard. I was still convinced that the motherboard was at fault. The technician asked me to remove the second processor; the server still wouldn't boot. Then, the technician asked me to remove the first processor and put the second processor in its place. Voilà! The system booted up. The problem had been a failed microprocessor.
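The technician's procedure can be sketched in code: strip the system to a baseline, confirm that the baseline boots, then add parts back one at a time with a power cycle after each. The Python below is only an illustrative model, not anything Dell support provided. The function name, the BASELINE label, the part names, and the simulated_boot stand-in for a manual power cycle are all hypothetical.

```python
from typing import Callable, Iterable, Optional, Set

# Hypothetical label meaning "the stripped-down system itself fails to boot".
BASELINE = "baseline"


def isolate_failed_part(
    baseline: Set[str],
    peripherals: Iterable[str],
    boots: Callable[[Set[str]], bool],
) -> Optional[str]:
    """Return the part that stops the system from booting, if any.

    `boots` stands in for a manual power cycle: it receives the set of
    installed parts and reports whether the system completed POST.
    Returns BASELINE if the stripped-down system fails, the name of the
    first peripheral that breaks the boot, or None if everything boots.
    """
    installed = set(baseline)
    if not boots(installed):
        # The fault is in what remains (motherboard, processors),
        # so reattaching peripherals would tell us nothing.
        return BASELINE
    for part in peripherals:
        installed.add(part)  # reattach one part, then power cycle
        if not boots(installed):
            return part
    return None


def simulated_boot(installed: Set[str]) -> bool:
    """Simulated machine: it fails whenever the bad processor is installed."""
    return (
        "motherboard" in installed
        and "cpu_b" in installed
        and "cpu_a" not in installed
    )


peripherals = ["memory", "drives", "cables"]

# With both processors installed, the baseline itself fails to boot.
with_both = isolate_failed_part(
    {"motherboard", "cpu_a", "cpu_b"}, peripherals, simulated_boot
)

# With only the good processor installed, every part can be reattached.
with_good_only = isolate_failed_part(
    {"motherboard", "cpu_b"}, peripherals, simulated_boot
)
```

In this model, a BASELINE result corresponds to the point in the story where the bare system still wouldn't boot, which is what prompted the processor swap.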
But I wasn't out of the woods yet. I received an error message stating that a terminator card was necessary in the second processor slot. We had originally purchased the system with only one processor. Luckily, I had kept the terminator card when I later added the second processor. Within a few minutes, I located and reinserted the terminator card.
However, I was worried about trying to boot my multiprocessor version of Win2K on a system with only one processor. I searched the Microsoft Web site and found the article "How to Manually Add Support for a Second Processor" (http://support.microsoft.com/?kbid=156358). The article doesn't mention Win2K but explains that Windows NT can't switch seamlessly between one processor and multiple processors. However, my LAN support company assured me that my system would run fine and that I wouldn't need to change the software.
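After a change like this, one quick sanity check is to ask the running OS how many processors it sees. The sketch below is not from the article or the Microsoft article it cites. It assumes that a Python 3 interpreter is available on the machine, and describe_cpu_config is a hypothetical helper. Note that os.cpu_count() reports logical processors, not physical sockets, and that NUMBER_OF_PROCESSORS is an environment variable that Windows sets.

```python
import os
from typing import Optional


def describe_cpu_config(count: Optional[int]) -> str:
    """Classify a processor count as reported by the OS."""
    if count is None:
        # os.cpu_count() returns None when the count can't be determined.
        return "unknown"
    return "uniprocessor" if count == 1 else "multiprocessor"


if __name__ == "__main__":
    count = os.cpu_count()  # logical processors visible to the OS
    print(f"OS reports {count} logical processor(s): {describe_cpu_config(count)}")
    # On Windows, the same figure is usually exposed as an environment variable.
    print("NUMBER_OF_PROCESSORS =", os.environ.get("NUMBER_OF_PROCESSORS", "<not set>"))
```

A count of 1 after pulling the failed processor would confirm that the system is running in the uniprocessor configuration.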
The only side effect I noticed from having only one processor was a slower boot and service startup. When we received the replacement processor the following day, I installed it in place of the terminator card. This experience taught me that you can sometimes work around a hardware failure by isolating the bad component and running without it. In addition, I learned the importance of having a terminator card available for your multiprocessor systems in case you ever need to revert to a uniprocessor configuration.
—David Fosbenner