Exchange 2007 Deployment Postmortem

Avoid these pitfalls in your migration

Michael Dragone

January 5, 2009



Today, email is a business necessity, and in many instances it has overtaken voice for routine communication between coworkers, clients, and business partners. If you were to ask 10 IT pros to name the most crucial services they support, reliable email access would be at the top of their lists.

 

Unplanned email downtime, no matter how short, is unacceptable, even when moving from one email software platform to another. I recently transitioned my organization from Exchange Server 2003 to Exchange Server 2007 and will share some of my experiences in this article. Because the procedures for transitioning to Exchange 2007 have been thoroughly covered in Windows IT Pro and elsewhere, I won’t rehash my transition plan step by step. Instead, I'll cover the pitfalls I ran into that were off the beaten path, in the hope that you'll have as smooth an experience transitioning to Exchange 2007 as I did.

 

If No News Is Good News…

 

Unlike our transition from Windows XP to Windows Vista and Microsoft Office 2007 (see my previous “Vista Deployment Postmortem” article, InstantDoc ID 99110), our transition to Exchange 2007 had as little direct user impact as I could have hoped for. If we had not informed our users that access to their mailboxes would be unavailable one Saturday night for up to 30 minutes, their first knowledge that the email system was even touched would have been when they used Outlook Web Access (OWA) for the first time and noticed the new interface. This was my goal, and I’m happy to say we achieved it.

 

After the transition was complete, we provided our users with documentation on how to use many of the new Exchange 2007 features, including Out Of Office reply segregation, access to file shares and SharePoint libraries through OWA, and (the most welcome feature) a new maximum mailbox size limit, growing from 150MB to 2GB.

 

Although users didn't see any major changes, those of us in the IT department did. Our Exchange 2003 environment consisted of a single server supporting 250 users, with Exchange Hosted Services Filtering for antispam and “last night’s tape” for backup. Our Exchange 2007 environment consists of a Cluster Continuous Replication (CCR) clustered mailbox server (CMS); a separate server running the Hub Transport and Client Access Server (CAS) roles, with an additional one to be installed soon; and Data Protection Manager (DPM) 2007 for continuous data protection backup. We continue to use Exchange Hosted Services Filtering for antispam. Although Exchange 2003 was deployed on Windows 2003, our entire Exchange 2007 deployment runs on Server 2008, with the exception of DPM 2007, which remained on Windows 2003. We made this decision to get the most future-proof (if there is such a thing) deployment of Exchange 2007 while also gaining Server 2008 features such as improved cluster configuration and support. It also meant that our Exchange 2007 deployment would start with Exchange 2007 SP1, because the RTM version is not supported on Server 2008.

 

Third-party Playground

Rounding out our messaging infrastructure are a third-party fax server, a third-party voicemail solution, and a BlackBerry Enterprise Server (BES). These applications proved to be the largest hurdle in our Exchange 2003 to 2007 transition for several reasons. First, it’s difficult to test these applications in a lab. Although I attempted to test as much as possible, full end-to-end testing simply wasn’t possible in many cases. For example, how can you fully test a fax server without duplicating the hardware, including the PBX it’s attached to? I would have loved to have a duplicate setup, but building one would have been far too cost-prohibitive.

 

Second, some of these applications don’t support the coexistence of Exchange 2003 and Exchange 2007 or don’t support it well enough to make you want to stay in a coexistence period for long. This was particularly true of the BES; I spent an inordinate amount of time speaking with RIM support about the correct coexistence procedure but only decided on a course of action after hashing it out with IT pros and Microsoft MVPs on the microsoft.public.exchange newsgroups.

 

You might be wondering why we chose to use both a third-party fax server and a third-party voicemail solution, rather than the Unified Messaging features built into Exchange 2007. For faxing, our users require the ability to fax outbound from Outlook, which Exchange 2007 doesn't support. For voicemail, we already had a lot invested in the current solution and were unwilling to transition to the 2007 Unified Messaging concurrently. We'll likely use Exchange for voicemail when the time comes.

 

Nagging Problems after CCR Installation

 

As per Microsoft’s recommendations, I first installed the Exchange 2007 CAS role, followed by the Hub Transport role. I elected not to use the Edge Transport role at all as it would bring additional complexity to our installation with little benefit because our antispam is hosted in the cloud. This turned out to cause a minor issue, as you’ll see later.

 

The next role I installed was the CCR clustered mailbox server (CMS). Although the installation itself went smoothly, some nagging problems appeared afterward, as outlined below.

 

      CCR and the Offline Address Book. After the CMS was up and running, I moved the Offline Address Book (OAB)-generation process from the Exchange 2003 server to the CMS. Shortly thereafter, I began to see Event ID 1021 crop up in the Application log on the CAS. This event complained that the OAB file share on the CMS couldn’t be located, which made sense to me as it was attempting to access the share via the CMS network name, rather than the name of the physical machine that was hosting the OAB share. What didn’t make sense was that the event logged immediately thereafter was that the copy of the OAB from the CMS to the CAS was successful! This clearly wasn’t the case as the OAB web distribution share on the CAS was empty.

 

After digging around, I found a blog post by Dave Goldman (an Escalation Engineer on the Exchange Team at Microsoft) that explains the situation. Long story short, it’s a bug related to the Server 2008 clustering code. As a workaround, you can manually copy the OAB files from the CMS to the CAS. You can read the details at the MSDN Blog.

 

This problem was fixed with the release of Update Rollup 5 for Exchange 2007 SP1 in early December 2008. The relevant Microsoft article is “Exchange 2007 CAS cannot copy the OAB from the OAB share on Windows Server 2008-based Exchange 2007 CCR clusters.”
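On servers that don't yet have that rollup, the manual copy of the OAB files could be scripted rather than done by hand. The Python sketch below is only an illustration: the function name and both example paths are hypothetical placeholders, not the real share or folder names of an Exchange 2007 installation. Substitute the OAB share on the node that currently hosts the CMS and the OAB web distribution folder on your CAS.

```python
import shutil
from pathlib import Path


def copy_oab_files(source: Path, destination: Path) -> int:
    """Copy every file under source into destination, keeping subfolders.

    Returns the number of files copied. Existing files are overwritten.
    """
    copied = 0
    for src in source.rglob("*"):
        if src.is_file():
            dst = destination / src.relative_to(source)
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves timestamps
            copied += 1
    return copied


# Example call with hypothetical paths (replace with your own):
# copy_oab_files(Path(r"\\NODE1\OABShare"), Path(r"D:\OABDistribution"))
```

Pointing the source at the physical node rather than the CMS network name sidesteps the name-resolution problem described above, at the cost of having to know which node is active.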

 

      Passive node permissions incorrectly assigned. Firing up the Exchange Management Shell after installing the passive node of the CCR CMS, I noticed a strange error when running the Get-ExchangeAdministrator cmdlet. The cmdlet was complaining that the computer account of the passive node was not a member of the Exchange View-Only Administrators security group. A similar complaint wasn't raised after the active node was installed.

 

Some research online turned up the Microsoft article “The passive node computer account is unexpectedly assigned Full Control permissions after you install the Passive Clustered Mailbox role in an Exchange Server 2007 cluster environment.”  The information resolved the problem. Note also that no functionality was reduced and no service was impacted by this problem; it was simply annoying.

 

       Best Practices Analyzer (BPA) logs cluster errors. Typically, most Exchange Administrators will run the Exchange BPA on a regular basis to ensure that their Exchange deployment is following Microsoft’s recommended guidelines. I did so shortly after installing the CMS and received two warnings related to the cluster service. One indicated that the “dedicated heartbeat connection” wasn't set to priority 1. The other stated that the “MaxQuorumLogSize” was set too small.

 

I initially thought that these errors might be safe to ignore as I knew that there were several changes to clustering in Server 2008. I did some searching and confirmed my suspicions: These errors aren’t relevant in a CCR cluster running on Server 2008, and you can safely ignore them.

 

As LeVar Burton would say, you don’t have to take my word for it. Andy Grogan, another Exchange Administrator, covers the same ground in this post on his blog: “Red Herrings with the Exchange 2007 BPA and Windows 2008 in a CCR Cluster…”

 

       Online maintenance checksumming. Exchange 2007 performs checksumming on its database pages when you perform an online backup using software that utilizes the streaming online backup API. As mentioned previously, our Exchange 2007 deployment uses DPM 2007 for continuous data protection. DPM uses the Volume Shadow Copy Service (VSS), not the streaming online backup API. Also, streaming backups can be performed only against the active node of a CCR cluster, which would negate any performance benefit gained by taking backups from the passive node. An alternative would be to run Eseutil regularly (which in fact is what DPM can do on its own copy of the data), but performing a check with Eseutil requires the suspension of transaction log replication. All copies of the database should be checksummed to ensure data integrity. Is there any hope?

 

There is. In Exchange 2007 SP1, Microsoft added online maintenance database scanning, which checksums the database during the same maintenance period in which database defragmentation takes place. Setting this up requires configuring a registry key. The sordid details can be found in the TechNet article “Online Maintenance Database Scanning in Exchange 2007 SP1.”

 

       Two DPM servers, no joy. In addition to the DPM server we have in place protecting our CMS, a second DPM server is located offsite that protects the primary DPM server and the data that the primary DPM server protects. There's only one small problem with this: I haven’t yet been able to enable secondary protection for the CMS. Attempting to do so generates this error message:  “Setup cannot query the system configuration. Verify that the Windows Management Instrumentation (WMI) service is running. If the WMI service is not running, in Control Panel, click Administrative Tools, and then click Services. Right-click Windows Management Instrumentation, and then click Start. ID: 820”

 

WMI is, of course, running on both cluster nodes. I opened a support case with Microsoft regarding this issue and spent a good amount of time working with a great support engineer who, after many false starts and discussions with the DPM team, was able to confirm both that this is a problem with DPM 2007 and that a fix was scheduled to be included in DPM 2007 SP1. As of this writing, DPM 2007 SP1 has been available for only a few days, and I have yet to test whether the fix is indeed included, as it is not mentioned in the Release Notes.

 

IPv6 and Outlook Anywhere (RPC over HTTPS)

A nagging issue that began to appear shortly after the introduction of Server 2008 was the inability of Outlook clients to connect to Exchange via Outlook Anywhere. The problem is that the DSProxy component listens only on the IPv4 TCP/IP stack, not on the IPv6 stack, while Server 2008 prefers IPv6 over IPv4.

 

The solution at the time of this writing was to disable IPv6 or edit the HOSTS file, depending on your Exchange 2007 topology. The TechNet article “Outlook Anywhere Client Connectivity Issue Because of TCP/IPv6” describes the changes that need to be made. The article “How does Outlook Anywhere work (and not work)?” (specifically the section “DSProxy and IPv6”) at the Microsoft Exchange Team Blog noted that a fix was scheduled to be included in Exchange 2007 SP1 Update Rollup 4, and indeed it was. The relevant Microsoft article, “You are prompted for your credentials three times and you receive an error message when you use the Outlook Anywhere feature to connect to an Exchange Server 2007 Service Pack 1-based server that is running Windows Server 2008,” confirms both the problematic behavior and the resolution.
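The HOSTS-file variant of this workaround can be pictured as a small text transformation. The Python sketch below rests on an assumption rather than a quotation of the TechNet steps: that the edit amounts to commenting out the IPv6 loopback entry and mapping the server's short name and fully qualified name to its IPv4 address. The function name, the address, and the host names are placeholders. Follow the TechNet article for the actual procedure for your topology.

```python
def patch_hosts(text: str, ipv4: str, names: list[str]) -> str:
    """Return HOSTS content with the ::1 entry commented out and
    an IPv4 mapping appended for each name that lacks one."""
    lines = []
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "::1":
            lines.append("# " + line)      # disable the IPv6 loopback entry
        else:
            lines.append(line)

    # Collect (address, name) pairs from lines that are not comments.
    existing = {
        tuple(l.split()[:2])
        for l in lines
        if l.split() and not l.lstrip().startswith("#")
    }
    for name in names:
        if (ipv4, name) not in existing:
            lines.append(f"{ipv4}\t{name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "127.0.0.1 localhost\n::1 localhost\n"
    # 192.0.2.10 and the cas01 names are documentation placeholders.
    print(patch_hosts(sample, "192.0.2.10", ["cas01", "cas01.example.com"]))
```

Working on the file's text rather than the file itself keeps the sketch safe to run anywhere; writing the result back to the real HOSTS file would need administrative rights and a backup first.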

 

X.400 Here and Back Again

Knowing that Exchange 2007 doesn’t require X.400 for message routing, I (for reasons that I can only chalk up to sheer stupidity) removed the X.400 proxy address from the Default Address Policy shortly after I finished moving all our user mailboxes to Exchange 2007. I then stopped the Information Store service on the Exchange 2003 server as a test to ensure that users would be able to connect without any issues while the 2003 server was down.

 

My test ran successfully, so I went to restart the Information Store service on the Exchange 2003 server. It refused to start and several errors were added to the Application event log, including Event ID 1121 with the description “Error 0x8004010f connecting to the Microsoft Active Directory.” 0x8004010f translates to MAPI_E_NOT_FOUND. It was well after midnight the Saturday night of the mailbox moves, and I was both stumped and tired, while also not believing my luck that the Information Store decided not to give me any problems until after I had moved all our mailboxes.
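For readers curious how a value such as 0x8004010f breaks down, the standard 32-bit HRESULT layout puts a failure flag in the top bit, a facility number in bits 16 through 26, and a facility-specific code in the low 16 bits. The Python sketch below decodes the value from the event log. The function name and the one-entry lookup table are illustrative only; the table holds nothing beyond the name quoted in this article.

```python
def decode_hresult(value: int) -> dict:
    """Split a 32-bit HRESULT into its severity, facility, and code fields."""
    value &= 0xFFFFFFFF                      # treat the input as unsigned 32-bit
    return {
        "failure": bool(value >> 31),        # bit 31: 1 = failure, 0 = success
        "facility": (value >> 16) & 0x7FF,   # bits 16-26: facility (4 = FACILITY_ITF)
        "code": value & 0xFFFF,              # bits 0-15: facility-specific code
    }


# Illustrative lookup table holding only the value quoted in this article.
KNOWN_NAMES = {0x8004010F: "MAPI_E_NOT_FOUND"}

if __name__ == "__main__":
    hresult = 0x8004010F
    print(decode_hresult(hresult), KNOWN_NAMES.get(hresult, "unknown"))
```

Here the value decodes to a failure in facility 4 (the interface-defined facility) with code 0x10F, which is the "not found" code that MAPI defines.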

 

After a brief moment of anger consisting of surfing the Lotus Domino and Novell GroupWise product sites, I tackled the problem and found my answer in the Microsoft article “Exchange Information Store does not start and events 1121 and 5000 are logged,” which shows why you should sometimes just stop working and get some sleep.

 

Exchange 2007 doesn’t require X.400 proxy addresses, but Exchange 2003 still does. After reading this, my first reaction was, “Oh. Whoops.” Followed by a glance back at the Application event log where I noticed that I also received a few errors related to the MTA service. My reaction changed to, “Duh, the MTA uses X.400. Idiot.”

 

Don’t remove the X.400 proxy address from the Default Address Policy if you still have legacy Exchange servers. If you do, you can simply add the X.400 proxy address back. After doing this, I was able to start the Exchange 2003 Information Store service successfully. I then immediately went to sleep before doing any more damage.

 

Entourage and Exprox.dll

As I mentioned earlier, our transition went well, including our users' first day on Exchange 2007 in production. One problem we did experience, which I’m thrilled to report is finally fixed, involved our one user who uses a Mac. From time to time, he complained that he couldn’t send mail through Entourage and had to restart the application. Around the same time, the Application event log on the CAS recorded a crash of w3wp.exe in exprox.dll. Shortly thereafter, the Application Pool that had crashed restarted, and our Mac user could connect with Entourage again. The crash also caused a brief interruption for users who were using OWA.

 

I opened a support case with Microsoft and received an Interim Update, which we applied to Exchange 2007 SP1 Update Rollup 4; it resolved the issue. The problem wasn’t finally put to bed, however, until the release of Update Rollup 5, which includes the fix. The relevant Microsoft article is “The W3wp.exe process may crash on an Exchange 2007 CAS server after you use Entourage to send a message that is larger than 48 KB.”

 

Edge Transport, Hub Transport, Exchange Hosted Services and IP Addresses

As I indicated earlier, our decision not to deploy the Edge Transport role caused a minor issue. (We skipped it because Exchange Hosted Services already provides the functionality that the Edge Transport role would give us.) On the Hub Transport server, I have configured a handful of Receive Connectors to support various tasks within our organization. One handles inbound email from the Internet and is restricted to Exchange Hosted Services, another handles inbound mail from internal web applications, and so on. As a result, the Hub Transport server is configured with several IP addresses.

 

This causes a problem with outbound mail. Although I’ve never seen the server select a source IP address for outbound mail other than the primary, there's a chance that it will one day begin sending outbound mail through one of the other IP addresses configured on the server. I therefore had to configure the same number of public IP addresses, use Network Address Translation (NAT) to get them through our perimeter router, and alert Exchange Hosted Services that mail might come from us with any of those source IP addresses.
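The bookkeeping behind that plan is simple but easy to get wrong: every address bound to the Hub Transport server needs a NAT mapping, and every resulting public address needs to be registered with the filtering service. The Python sketch below shows one way to check both. All addresses are placeholders from documentation and private ranges, and the function names are hypothetical.

```python
import ipaddress


def public_sources(nat_map: dict[str, str]) -> list[str]:
    """Return the sorted, de-duplicated public addresses that outbound
    mail could appear to come from, given a private-to-public NAT map."""
    publics = {ipaddress.ip_address(pub) for pub in nat_map.values()}
    return [str(ip) for ip in sorted(publics)]


def unmapped(server_ips: list[str], nat_map: dict[str, str]) -> list[str]:
    """Return server addresses that have no NAT mapping yet."""
    return [ip for ip in server_ips if ip not in nat_map]


if __name__ == "__main__":
    # Placeholder addresses: three bound to the server, two mapped so far.
    server_ips = ["10.0.0.10", "10.0.0.11", "10.0.0.12"]
    nat_map = {"10.0.0.10": "203.0.113.10", "10.0.0.11": "203.0.113.11"}
    print("Register with the filtering service:", public_sources(nat_map))
    print("Still missing a NAT mapping:", unmapped(server_ips, nat_map))
```

Rerunning a check like this whenever a Receive Connector gains a new IP address helps keep the registered list from drifting out of date.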

 

With an Edge Transport server, the -SourceIPAddress parameter on the New-SendConnector cmdlet allows a selection of source IP address. This parameter, however, is not valid on the Hub Transport server role. I spoke with Microsoft about the problem and learned that this behavior is due to the design of the Hub Transport server role. You need to be aware of this issue when planning your deployment, especially if you host your antispam solution in the cloud.

 

On with your Migration

Exchange 2007 offers a host of benefits that many organizations will be looking to take advantage of. Whether you’re just planning your transition or are already underway, I hope this article has highlighted some of the issues you might encounter and how to resolve them.

 

 
