The Exchange Server Troubleshooter - 01 Nov 1998
The troubleshooter explains how Exchange Server routes messages and how to investigate routing problems. He also discusses maintaining the database and informing senders that a recipient has left the company.
October 31, 1998
I want to devote most of my first column as the Exchange Server troubleshooter to answering some readers' questions about message routing. Troubleshooting nondelivery receipts (NDRs) and mail failures is easier if you understand how routing works.
How does Exchange Server route messages?
Exchange Server routes messages to a destination in three cases: when the sender and recipient are on the same server, when they're on different servers within the same site, and when they're on servers in different sites. The last case includes messages that you send via the Internet Mail Service (IMS) and X.400 connectors, because Exchange treats those connectors as if they were servers in a foreign site.
How does Exchange Server route messages when the sender and the recipient are on the same server?
Routing messages when the sender and the recipient are on the same server is the simplest case. The client sends the message to the server, where the Information Store service processes it. Exchange delivers non-Messaging API (MAPI) clients' messages directly to the Information Store service via protocol stacks that handle Post Office Protocol (POP) 3, Internet Message Access Protocol (IMAP) 4, and HTTP clients. The Information Store then puts one copy of the message into the private Information Store, and each recipient mailbox gets a pointer to the message. This single-instance storage means that Exchange has to store only one copy of the message in the private Information Store on each server, which results in significant space savings over storing a copy of the message in each mailbox. Finally, if any recipients are connected with MAPI clients, the Information Store service notifies them that new mail is available.
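To make the space savings concrete, here is a minimal Python sketch of single-instance storage. It is only an illustrative model, not how the Information Store is actually implemented; the `PrivateStore` class, its methods, and the mailbox names are all hypothetical.

```python
import itertools


class PrivateStore:
    """Toy model of single-instance storage (hypothetical, not Exchange's real design)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.messages = {}    # message id -> body, stored exactly once
        self.mailboxes = {}   # mailbox name -> list of message ids (pointers)

    def deliver(self, body, recipients):
        """Store one copy of the body and give each recipient a pointer to it."""
        msg_id = next(self._ids)
        self.messages[msg_id] = body
        for name in recipients:
            self.mailboxes.setdefault(name, []).append(msg_id)
        return msg_id

    def stored_bytes(self):
        """Space used by message bodies, counting each body once."""
        return sum(len(body) for body in self.messages.values())


store = PrivateStore()
store.deliver(b"x" * 1000, ["alice", "bob", "carol"])
print(store.stored_bytes())  # 1000, not 3000
```

Three recipients share one 1000-byte body, so the store grows by 1000 bytes rather than 3000.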
How does Exchange Server route messages to recipients on different servers in a site?
Intrasite delivery involves several steps. Let's say Alice, whose mailbox is on a server named HQ, sends mail to three users: one on HQ, one on the ENGINEERING server, and one on the FINANCE server. When the HQ Information Store service gets the message from Alice's Outlook client, it resolves each recipient address in the directory. Finding that one user is on HQ, the service delivers the message according to the steps that the previous question outlines. For each recipient whose mailbox is not on HQ, the following happens:
1. The HQ Information Store service hands the message over to its local Message Transfer Agent (MTA).
2. The MTA resolves the recipient's address in the directory; that is, it finds the recipient's home server.
3. The MTA (still on HQ) opens a remote procedure call (RPC) connection to the MTA on the recipient's home server (e.g., FINANCE) and transfers the message. When the MTA completes the transfer, HQ is finished with this recipient and starts over at Step 1 for the next recipient.
4. The MTA on FINANCE resolves the recipient's address again, turning it into a distinguished name (DN).
5. If the recipient has a mailbox on FINANCE, the delivery proceeds routinely, just as though the message originated on FINANCE.
6. If the recipient's home server is somewhere else in the site, the FINANCE MTA jumps back to Step 3 and moves the message to the correct home server for that recipient. This action can happen if you've moved the mailbox to another server, or if you've moved the server in from another site.
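The hand-offs listed above can be sketched as a small Python function. This is a simplified model under my own assumptions: the function name `deliver_in_site`, the recipient names, and the `home_server` dictionary (standing in for the directory) are hypothetical, and the sketch ignores the case in which the receiving MTA has to forward the message to yet another server.

```python
def deliver_in_site(origin, recipients, home_server):
    """Return the hand-offs for one message as (recipient, path) pairs.

    home_server maps each recipient to the server that holds the mailbox;
    it stands in for the directory lookup.
    """
    hops = []
    for recipient in recipients:
        target = home_server[recipient]
        if target == origin:
            # Local recipient: the Information Store delivers directly.
            hops.append((recipient, "store on " + origin))
        else:
            # Remote recipient: the store hands the message to the local MTA,
            # which transfers it over RPC to the MTA on the home server.
            hops.append((recipient, "MTA on " + origin + " -> MTA on " + target))
    return hops


directory = {"bob": "HQ", "carol": "ENGINEERING", "dave": "FINANCE"}
for recipient, path in deliver_in_site("HQ", ["bob", "carol", "dave"], directory):
    print(recipient, "via", path)
```

Each remote recipient gets its own pass through the loop, mirroring the way HQ finishes one recipient and then starts over for the next.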
What is the GWART?
The GWART is the Gateway Address Routing Table. To understand the GWART, though, you need a couple of fundamentals. (For more information about the GWART and how it works, see Tony Redmond, "Understanding the Exchange Message Transfer Agent," Windows NT Magazine, January 1998.) Let's start with address spaces. Each connector can have one or more address spaces attached to it; these spaces define a range or set of addresses that the connector can talk to. For example, the IMS usually has an address space of *, which means that it can handle any Simple Mail Transfer Protocol (SMTP) address. You can just as easily define the address spaces *.com and *.org to control mail delivery to individual domains.
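As a rough illustration of how an address space such as * or *.com selects addresses, the following Python sketch matches the domain part of an SMTP address against a wildcard pattern. The function name `matches_address_space` is hypothetical, and shell-style wildcard matching is my assumption for the sake of the example; Exchange's own matching rules may differ in detail.

```python
from fnmatch import fnmatchcase


def matches_address_space(address, address_space):
    """Return True if the domain of an SMTP address fits a wildcard address space."""
    domain = address.rsplit("@", 1)[-1].lower()
    return fnmatchcase(domain, address_space.lower())


print(matches_address_space("mom@example.com", "*"))      # True
print(matches_address_space("mom@example.com", "*.com"))  # True
print(matches_address_space("mom@example.com", "*.org"))  # False
```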
Each address space has a routing cost associated with it. For more information about routing costs, see Mark Ott, "Intrasite and Intersite Directory Replication," May 1998. This cost is a number between 0 and 100; higher values are more expensive. When Exchange gets ready to route a message, it always chooses the lowest-cost path that will get the message to its destination. Because each connector in a site can have cost parameters, you can set routing costs on each address space to control message flow. Assign lower costs to the connectors you want Exchange to use most frequently, and higher costs to connectors you want Exchange to use less often. Because Exchange always picks the least-cost path, messages flow where you want them to, unless the cheapest connector is unavailable. In that case, messages automatically begin flowing over the next-cheapest path.
The address space and cost information lives in the GWART. The GWART resides in the Organization's directory, so Exchange automatically replicates it to all servers and sites in the Organization. (Two text copies of the GWART, GWART0.MTA and GWART1.MTA, are stored in the exchsrvr\MTADATA directory.) The System Attendant service and the MTA update the GWART whenever you add a new address space or when Exchange Server recalculates the routing cost. You can schedule the recalculation to occur automatically, or you can perform it manually: the Routing Calculation Schedule tab of the Site Addressing object sets the schedule, and the Recalculate Routing button on the object's Routing tab forces a GWART recalculation. When an MTA wants to find the best route for a message, it consults the GWART.
What about recipients who don't have Exchange Server addresses?
If the recipient address's DN is an SMTP, Lotus Notes, Microsoft Mail, or other non-Exchange address, the MTA uses the address space and routing information in the GWART to find a connector that can process the message.
How does the MTA decide which route to use?
Routing is a two-step process. In the first step, routing, the MTA lists every connector that has an address space that matches the recipient address. For example, if the MTA is routing a message with a cc:Mail address, it builds a list of all connectors that have cc:Mail address spaces. The result of the routing phase is a list of all the places that Exchange could send a message.
The second step, selection, uses the list of all possible routes and their associated costs to choose the least-cost route. The MTA adds the costs required for each message path, and the one with the lowest total cost wins; that's the path the MTA will use to route the message.
Let's say you're sending an X.400 message and you have three X.400 connectors in your organization. One connects directly to the backbone and has a cost of 25 with an address space of *. One connects to XYZ, a company you do business with. Its address space restricts its use to messages sent to the path of country (c)=US and organization (o)=XYZ Inc., and its cost is 5. The third is a connector to MCI; it has an address space of * and a cost of 50. If you send a message to someone at XYZ, the least-cost route to get it there is directly over the XYZ connector, so the message will follow that path. If you send a message to someone else, the least-cost route is through the backbone connector. If that connector is unavailable, the MTA will use the next available path, through MCI, instead.
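The routing and selection steps for the three connectors in this example can be sketched in Python. This is a simplified model, not the MTA's actual algorithm: the function name `choose_connector` and the dictionary layout are hypothetical, an empty address space stands in for *, and each route is a single hop, so no per-hop costs need to be added up.

```python
def choose_connector(recipient, connectors, unavailable=()):
    """List connectors whose address space matches, then pick the cheapest."""
    # Routing: keep every available connector whose address space
    # matches the recipient's address attributes.
    candidates = [
        c for c in connectors
        if c["name"] not in unavailable
        and all(recipient.get(k) == v for k, v in c["address_space"].items())
    ]
    if not candidates:
        return None
    # Selection: the lowest-cost candidate wins.
    return min(candidates, key=lambda c: c["cost"])["name"]


connectors = [
    {"name": "backbone", "cost": 25, "address_space": {}},  # {} stands in for *
    {"name": "XYZ", "cost": 5, "address_space": {"c": "US", "o": "XYZ Inc."}},
    {"name": "MCI", "cost": 50, "address_space": {}},
]

at_xyz = {"c": "US", "o": "XYZ Inc."}
elsewhere = {"c": "US", "o": "Some Other Co."}

print(choose_connector(at_xyz, connectors))                              # XYZ
print(choose_connector(elsewhere, connectors))                           # backbone
print(choose_connector(elsewhere, connectors, unavailable={"backbone"}))  # MCI
```

Marking a connector as unavailable removes it from the candidate list, which is how the sketch reproduces the fallback to the next-cheapest path.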
Can you give me a more complex example?
You've already seen how messages flow within a site. The picture expands when you want to send mail outside your site, either through a connector or to another Exchange site. Let's say you're forwarding a great joke to your mother, your boss, your cubicle mate, two coworkers, and your email administrator. To make things interesting, let's say that Mom is a custom recipient with an SMTP address; you, your cube mate, and your boss are on the same Exchange Server server; your email admin is at your company's main site in Atlanta; and your two coworkers have Lotus cc:Mail. You compose your message and send it. What happens next?
The message arrives at your server. The Information Store service delivers a copy to your boss and your cube mate. (More precisely, because the store uses single-instance storage, only one copy exists in the store; both recipients receive pointers to the message in their private mailbox.) You hear your boss laughing from down the hall.
The Information Store service resolves the other recipient addresses. Because no other local recipients are on the list, the Information Store hands the message off to the MTA.
The MTA starts with your mom's address, resolves it, and finds that the resulting DN is an SMTP address. The MTA searches the GWART to find the least-expensive route for SMTP addresses and finds that the route is to the local IMS. Therefore, the MTA transfers the message to the outbound IMS queue.
Next, the MTA processes your email administrator's address. Because he's in a different site, the MTA uses the GWART to list all possible routes between your site and his. When the MTA has found the least-cost route, it delivers the message over whatever connector the route specifies, either over a messaging bridgehead server or directly to the server hosting his mailbox. When the message arrives at the remote site, the receiving MTA resolves the address and hands it off to another MTA, if necessary. This process continues until the message reaches the administrator's home server, at which point the Information Store service takes over and delivers the message.
Next, the MTA processes your coworkers' addresses. (The MTA resolves each address separately, but I'll lump them together.) When the MTA resolves the address, it finds that the address is in cc:Mail form, so it consults the GWART and finds only one possible route. The MTA passes the message to the specified connector, which delivers the message to the recipients.
How can I troubleshoot routing problems?
Two indicators signal routing problems. The most obvious indicator is an NDR. The other indicator is mail not reaching its destination. Here are some basic steps you can use when you suspect a routing problem.
If you receive an NDR, read it carefully. Fix any address or size restriction problems.
Verify that the message left the sender's outbox. If not, the most likely causes are a sender-to-server network problem or a problem with the sender's home server MTA.
Use a server monitor (software that automates finding many kinds of problems) or Server Manager to verify that the MTA, the Information Store service, and the Directory Service are running on the sender's home server.
Verify that the sender and recipient servers can communicate over the network. You can use the rpcping tool or the standard TCP/IP ping and tracert utilities to look for problems.
Repeat the same tests to make sure the Directory Service, the Information Store, and the MTA are all running on the recipient's server (if the recipient is an Exchange addressee).
If you're sending Internet mail, make sure that the IMS is up and correctly processing outbound mail.
If you have recently changed any address space or cost information, change it back, and then select Recalculate Routing on the Routing tab of the Site Addressing object to force Exchange Server to rebuild the GWART.
Check the MTA queue in the Exchange Administrator program. If the MTA queue length is growing and you've already verified that the origin and destination MTAs are still up, turn on message tracking to find out where the message is getting stuck. For more information about message tracking, see Mark Ott, "Pursuing Pesky Mail Messages," April 1998, and Tony Redmond, "How Message Tracking Works," June 1998.
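For the network connectivity step, a quick scripted check can complement rpcping, ping, and tracert. The Python sketch below only tests whether a TCP connection to a given host and port can be opened; it is not a substitute for rpcping, which exercises RPC itself. The function name `can_connect` is hypothetical.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `can_connect("FINANCE", 135)` would test TCP port 135, the RPC endpoint mapper port, on a server named FINANCE. A True result shows only that the port is reachable, not that the MTA behind it is healthy.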
What does ESEUTIL do?
ESEUTIL, the Exchange Server 5.5 database-maintenance utility, is the descendant of EDBUTIL, the maintenance tool that shipped with Exchange Server 4.0 and 5.0. ESEUTIL has six primary functions:
Checks the integrity of a database. In this nondestructive operation, ESEUTIL checks only the database pointers to make sure they correctly link items.
Defragments a database. This defragmentation moves all data in the database so that the data is contiguous on the disk.
Attempts to repair a damaged or corrupted database. The repair operation tries to nondestructively fix problems in the database without causing any data loss.
Attempts to recover a database that's too badly damaged for Exchange Server to repair. This recovery is the database equivalent of amputating a gangrenous limb; losing half of your private Information Store is better than losing all of it.
Upgrades the database from one version of Exchange Server to another. However, you probably won't ever need to upgrade manually.
Dumps the contents of the public or private Information Store databases. You might want to dump the databases when you're debugging problems with the help of Microsoft Product Support.
Why is defragmentation necessary?
As new data arrives, Exchange Server has to expand the Information Store database files to accommodate it. This expansion causes the database files to grow on disk, in the same way that a balloon inflates when you blow air into it. However, let's say that in a typical day your store receives 100MB of mail and users delete 80MB of it. You might guess that the total change in Information Store size is only 20MB, but the actual change in the size of the database on disk could range from zero to 100MB. When users delete items, Exchange Server doesn't return the newly freed space to the operating system (OS); instead, it keeps the space and recycles it as users add new data to the Information Store.
Over time, the amount of white, or unused, space in the database fluctuates as users add and delete items. For best performance, Information Store items need to be in contiguous chunks in the database file, but because Exchange Server allocates space on a first-come, first-served basis, a contiguous arrangement isn't always possible. The result is a fragmented database, which slows down database performance. In some cases, your Information Store files might also have white space that the OS or another application could use.
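The 100MB example can be modeled with one line of arithmetic. This Python sketch is a deliberately simplified model of my own: it assumes that all white space in the file is reusable for incoming data and ignores fragmentation. The function name `file_growth_mb` is hypothetical.

```python
def file_growth_mb(incoming_mb, reusable_white_space_mb):
    """Growth of the database file when new data first fills reusable white space."""
    return max(0, incoming_mb - reusable_white_space_mb)


print(file_growth_mb(100, 0))    # 100: no white space, the file grows by the full amount
print(file_growth_mb(100, 100))  # 0: enough white space already exists
print(file_growth_mb(100, 80))   # 20: white space absorbs part of the new mail
```

The two extremes correspond to the range in the example: depending on how much white space is already available, 100MB of new mail grows the file by anywhere from zero to 100MB.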
When is running the Exchange Server database-maintenance utilities appropriate?
This question provokes a lot of discussion, but Microsoft has a particular position: Don't run ESEUTIL or EDBUTIL unless it's absolutely necessary! This edict might seem draconian, but Microsoft has a reason for it. Any time you take Exchange Server offline and change the database, you're taking a chance that it won't come back up. Why put your messaging system at needless risk?
What maintenance is absolutely necessary?
The obvious do-or-die time to use ESEUTIL or EDBUTIL is when you're trying to restore a failed server. Administrators also typically use these tools after they've discovered a Directory or Information Store corruption error. (Ideally, they discovered the corruption early and not after they backed up the corrupt database onto the entire backup rotation.) Although you can fire up the tool as soon as you notice an error, you're usually better off calling Microsoft Product Support before you do anything. Product Support specialists have more experience isolating database problems than most administrators, and their advice can save you a lot of trouble.
For Exchange Server 4.0 and 5.0, you might benefit from using EDBUTIL to defragment your database from time to time because these versions don't have good online defrag capabilities. Exchange Server 5.5 is another story: Its online defragmentation is much more efficient, so you have little reason to take down an Exchange Server 5.5 machine just to do an offline defrag.
Another reason you might want to run ESEUTIL or EDBUTIL is to defragment your database if you've removed large amounts of data from your store. For example, you might have moved many mailboxes to another server or removed a set of Usenet public folder replicas. In this case, running an offline defrag can recover a significant amount of database white space. However, as new data arrives, the store's size will grow again. So unless you really need the space, recovering it won't help you for long.
If you decide to take the plunge, make sure you have enough disk space. Defragmenting a store requires 110 percent of the store file's size in free disk space; for example, if your store is 2GB you'll need 2.2GB free. Check Microsoft Support Online articles Q183888 (http://support.microsoft.com/support/kb/articles/q183/8/88.asp) and Q182903 (http://support.microsoft.com/support/kb/articles/q182/9/03.asp) for more details about online and offline defragmentation.
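The 110 percent rule is easy to check with a script before you start. The Python sketch below is one way to do it; the function names `required_free_bytes` and `has_room_for_defrag` are hypothetical, and the paths you pass in are whatever applies on your server.

```python
import os
import shutil


def required_free_bytes(store_bytes):
    """110 percent of the store size, using integer math and rounding up."""
    return (store_bytes * 110 + 99) // 100


def has_room_for_defrag(store_path, scratch_dir):
    """Compare free space on scratch_dir's drive with 110 percent of the store file's size."""
    needed = required_free_bytes(os.path.getsize(store_path))
    return shutil.disk_usage(scratch_dir).free >= needed


print(required_free_bytes(2000))  # 2200
```

Scaled up, the same arithmetic gives the 2.2GB requirement for a 2GB store. Call `has_room_for_defrag` with the path to the store file and the directory on the drive where the free space has to be available.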
We have to occasionally remove mailboxes for people who have left the company. How can we send back a message to the sender explaining that the mailbox owner has left the company instead of sending an NDR?
Most humans will figure out the situation when they get an NDR that says that an address isn't valid. However, spelling it out might be more helpful, and it's fairly simple to do. Start by creating a new account, and give it an inbox rule that answers all mail with a short note; the note can say whatever you want. When someone leaves the company, reassign the mailbox to the new account by using the "Primary Windows NT Account..." button on the General tab of the properties dialog in Exchange Administrator. When people send mail to the former employee, the rule will return them a short "Alice doesn't live here any more" message. After a month or two, once everyone has gotten the word, you can safely delete the mailbox.