Head Off Public Folder Replication Storms
A handy feature in Exchange 2003 SP2 stops 'em dead
October 22, 2006
Public folder replication is tricky even at the best of times. Few Exchange administrators understand how and when replication occurs, and the Exchange administrative tools, especially Exchange System Manager (ESM), are weak in this area, perhaps because Microsoft knew that it would eventually move away from public folders. When public folder replication gets out of hand, a phenomenon called a replication storm can occur. Public folder replication storms happen when an uncontrolled or unexpected amount of replication activity is generated within the organization, so that network links are swamped with data being exchanged between public folder servers to update their folder replicas. You can take control of replication storms, however, thanks to the Public Folder Content Replication feature, a small but significant feature that Microsoft slipped into Exchange Server 2003 Service Pack 2 (SP2).
Difficult to Diagnose
A replication storm is typically caused by a change that an administrator makes, although bugs in pre-Exchange 2000 Server SP2 versions have been known to cause replication storms, too. Sometimes a storm is the result of planned activity, such as when you add a new public folder server to the organization and need to replicate a large amount of data to that server to populate the set of folder replicas. You can plan ahead by implementing such changes when network activity is minimal (such as after hours or over a weekend). That way, the extra replication traffic won't interfere with regular business, and the storm should finish by the time users get to work.
Unplanned storms cause problems, especially when a large amount of data is channeled across an unreliable, slow, or high-latency network link. Most unplanned storms are caused by administrative error, such as attempting to replicate many large folders at an inappropriate time. For example, at 9:00 A.M.—a time of peak demand—you decide to create a new replica of a large folder on a server that's at the end of a high-latency link. Your intention is good—you want to help users of that server gain local access to the folder data—but the large amount of public folder data being transmitted causes a bottleneck.
In many cases it's hard to know exactly when public folder replication problems occur because Exchange doesn't flag problems, at least not in an easily accessible form. One symptom of a possible public folder replication storm is an increase in the number of transaction logs generated on the server and in the size of the public folder store. Some companies have put together their own early-warning systems to detect problems, using data extracted from the Application event log and Windows Management Instrumentation (WMI) counters.
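As a rough illustration of what such an early-warning check might look like, the sketch below flags hours in which the number of transaction logs generated jumps well above the recent average. The hourly counts, the function name, the window, and the threshold are all hypothetical. How you collect the counts is left to whatever monitoring you already run.

```python
from statistics import mean

def flag_log_spikes(hourly_log_counts, window=6, factor=3.0):
    """Return the indexes of hours whose log count exceeds `factor` times
    the average of the preceding `window` hours.

    hourly_log_counts is a hypothetical list of transaction logs generated
    per hour on a public folder server.
    """
    spikes = []
    for i in range(window, len(hourly_log_counts)):
        baseline = mean(hourly_log_counts[i - window:i])
        # Use a floor of 1 so a completely idle baseline doesn't flag every log.
        if hourly_log_counts[i] > factor * max(baseline, 1):
            spikes.append(i)
    return spikes

# Made-up sample data: a quiet morning followed by a sudden burst.
counts = [4, 5, 3, 4, 6, 5, 4, 40, 55, 6]
print(flag_log_spikes(counts))
```

A flagged hour is only a hint that something unusual is happening. You would still need to confirm that public folder replication, rather than ordinary mail load, is responsible.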
The feature that Microsoft added to Exchange 2003 SP2 helps administrators manage public folder replication storms when they occur. The Public Folder Content Replication feature lets you stop public folder content replication across your entire organization, fix the problem by reversing the change that caused it (in the earlier example, removing the folder replica from the remote server), then resume replication.
You might ask why you need to step in and take action to stop a replication storm. Indeed, in some cases, the storm will peter out by itself. If your servers are connected with high-speed, high-capacity links, you might never even realize that a replication storm was occurring. However, suppressing the excessive network activity caused by a storm becomes important if your servers are connected over low-bandwidth, high-latency links, because the extra replication traffic obstructs normal email traffic and can even interfere with your ability to manage servers across a saturated link.
Stopping Replication Storms
You stop public folder replication in Exchange 2003 SP2 through an organization-wide setting that you reach from the organization object in ESM. This is logical because you want to stop replication on all servers. To stop or restart public folder replication, right-click the organization object in ESM to see the Stop Public Folder Content Replication option, as Figure 1 shows.
If you click the Stop Public Folder Content Replication option, ESM displays the warning that Figure 2 shows to ensure that you understand the effect of the action that you're about to launch. It's crucial that you be aware of the organization-wide impact of your action, as it will force all public folder servers to cease replication. Exchange will eventually backfill the necessary data after you adjust the parameters for replication, such as scheduling it for a time of low demand. But clearly you don't want every administrator in the organization clicking this option just to see what it does, so an administrator needs Exchange Organization Administrator permission to execute the option. If an unprivileged administrator attempts to execute the option, ESM issues an "LDAP Provider issue" message, which is ESM-speak to say that the user doesn't have the correct permission.
Assuming that you have the right permission, ESM sets an organization property to indicate that public folder content replication is disabled. ESM then sends a "cease-and-desist" notice to every server that hosts a public folder database in the organization. The cease-and-desist message stops servers from responding to backfill requests from other servers. A backfill request asks a server to provide data to update a public folder replica on a remote server. Servers will still send out backfill requests, but the receiving server won't answer the requests. When a server stops responding to backfill requests, this nullifies any replication requests that the server would typically handle. Of course, it's only as the servers receive and act on this cease-and-desist message that replication gradually decreases and eventually stops.
Note that changes to the public folder hierarchy are still replicated around the organization. This is because hierarchy updates go to every server that hosts a public folder database. However, hierarchy updates are small (just pointers to folders) and don't usually cause any replication problems.
Servers that don't run Exchange 2003 SP2 (or later) won't understand the cease-and-desist message or the organization setting that prevents content replication, so these servers will continue to respond to backfill requests. These non-SP2 servers also will continue to broadcast details of changes that occur to the public folder replicas that they host. Your ability to suppress a public folder replication storm effectively is therefore linked to the percentage of Exchange 2003 SP2 servers in your organization—the higher the percentage, the more replication you can suppress.
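To get a feel for how much replication you could actually suppress, you might work out the proportion of public folder servers that will honor the stop request. The sketch below assumes that you already have an inventory of servers that host a public folder database, each marked with whether it runs Exchange 2003 SP2 or later. The server names, the data layout, and the function name are hypothetical.

```python
def sp2_coverage(servers):
    """Return (fraction, laggards) for a list of public folder servers.

    Each entry is a hypothetical (name, runs_sp2_or_later) pair.
    Laggards are the servers that will keep answering backfill requests.
    """
    if not servers:
        return 0.0, []
    laggards = [name for name, runs_sp2 in servers if not runs_sp2]
    fraction = (len(servers) - len(laggards)) / len(servers)
    return fraction, laggards

# Made-up inventory for illustration only.
inventory = [
    ("PF-LON-01", True),
    ("PF-NYC-01", True),
    ("PF-SYD-01", False),
    ("PF-DUB-01", True),
]
fraction, laggards = sp2_coverage(inventory)
print(f"{fraction:.0%} of public folder servers will honor the stop request")
print("Still answering backfill requests:", laggards)
```

The servers in the second list are the ones to upgrade first if you want the stop option to be effective.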
You can verify that Exchange 2003 SP2 servers that host public folders aren't responding to replication requests by checking for event 3118 in the Application event log on these servers. The Information Store (IS) logs event 3118 whenever a backfill request has arrived that the IS can't respond to—in this case, because of the stop-replication command you issued.
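If you export Application event log entries from several servers, a short script can summarize where event 3118 has appeared. The sketch below assumes the entries have already been exported into simple (server, event ID) pairs. The export step, the server names, and the function name are hypothetical.

```python
from collections import Counter

BACKFILL_REFUSED = 3118  # event the IS logs when it can't answer a backfill request

def refused_backfills_by_server(events):
    """Count event 3118 occurrences per server.

    events is a hypothetical list of (server, event_id) pairs taken from
    each server's Application event log.
    """
    return Counter(
        server for server, event_id in events if event_id == BACKFILL_REFUSED
    )

# Made-up export for illustration only.
exported = [
    ("PF-LON-01", 3118),
    ("PF-NYC-01", 1000),
    ("PF-LON-01", 3118),
    ("PF-NYC-01", 3118),
]
sp2_servers = ["PF-LON-01", "PF-NYC-01", "PF-DUB-01"]

counts = refused_backfills_by_server(exported)
for server in sp2_servers:
    print(server, counts[server])
```

A count of zero doesn't prove that a server ignored the stop command. Because the IS logs event 3118 only when a backfill request arrives, a server that has received no requests will have nothing to report.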
Restarting Replication
A replication storm is bad, but you do want public folder replication to proceed when network conditions are stable. Without replication, functions such as free/busy information won't work properly because servers can't replicate data to one another. You can restart replication by right-clicking the organization object in ESM and selecting the Resume Public Folder Content Replication option. ESM displays this option only when a stop-replication operation is in effect.
ESM then displays a dialog box asking you to confirm that you want to restart replication. If you confirm, ESM sends a message to the public folder servers to tell them to restart replication. Servers resume replication according to their usual replication schedule, so if you've configured servers to replicate data once a day at midnight, they won't resume replication until that time. Before you restart replication, you should ensure that you've addressed any problems that caused the original replication storm.
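The effect of the schedule is easy to underestimate, so a small worked example might help. The sketch below models a server's replication schedule as a single daily start time, which is a simplification, and computes when a server would next begin replicating after you resume. The function name and the sample times are hypothetical.

```python
from datetime import datetime, time, timedelta

def next_replication_start(resumed_at, daily_start=time(0, 0)):
    """Return the next scheduled start at or after the moment you resume.

    Assumes one replication start per day at `daily_start` (midnight by
    default). A start time that has already passed today rolls to tomorrow.
    """
    candidate = datetime.combine(resumed_at.date(), daily_start)
    if candidate < resumed_at:
        candidate += timedelta(days=1)
    return candidate

# Resume at 2:30 P.M. on a server that replicates once a day at midnight.
resumed = datetime(2006, 10, 23, 14, 30)
print(next_replication_start(resumed))
```

In this example, a server resumed in mid-afternoon sits idle for nine and a half hours before it replicates anything, which is worth remembering before you conclude that the resume operation failed.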
When replication begins again, the IS logs event 3119 in the Application event log to indicate that replication has been re-enabled. Replication should quickly settle into its typical pattern in small organizations, but distributed organizations that have more than a few public folder servers might take several hours to resume usual service. In very large organizations or those that depend on high-latency or low-bandwidth links to reach some servers, it could take several days for replication to settle down.
Looking Ahead
Exchange Server 2007 offers no prospect for radically better public folder management because Microsoft is de-emphasizing public folders. In fact, the Exchange 2007 management console doesn't let you manage public folders at all—you have to use Exchange Management Shell or keep an Exchange 2003 server around and use it to manage public folders, which isn't a satisfactory solution.
Because public folders don't figure highly in the future of Exchange, you might want to consider whether it makes sense to create any new folders or replicas at all, given that you'll eventually need to migrate this data to another repository. Now is a good time to think about your future strategy for storing the type of data that you store in public folders today.
Administer with Care—and SP2
If you take care with your public folder administration, it's unlikely that you'll need to suppress a replication storm. However, accidents happen to even the best of us, and it's nice to know that you can finally do something about replication storms, as long as you've deployed Exchange 2003 SP2 throughout your organization.