More About Exchange Content Indexing and PDFs

The PDF indexing problem with Exchange 2010 might be even more complicated than previously reported, but a third-party IFilter is available that could provide a solution.

Paul Robichaux

July 22, 2010

In my last UPDATE column, I wrote about PDFs and Microsoft Exchange Server 2010's full-text indexing feature. This week, I'm following up on that topic to tell you a bit more about the troubleshooting process and what I've learned.

First, when I wrote, "Moving a mailbox from one database to another is the only way to force these formerly lost attachments to be re-indexed," I wasn't clear enough about what actually happens. That's what the Exchange documentation says, but the full-text indexing engine's real behavior is more subtle. If a message fails indexing because of a filter error, the message is queued for re-indexing, but on the second attempt its attachment is skipped. This behavior lets the store index the message content even when the attachments can't be read.
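The retry behavior described above can be modeled in a few lines of Python. This is a minimal sketch, not Exchange's actual code: `Message`, `FilterError`, `extract_text`, and `index_with_retry` are hypothetical names, and the stand-in "filter" simply fails on anything that isn't valid UTF-8, playing the role of a PDF filter error.

```python
from dataclasses import dataclass, field


class FilterError(Exception):
    """Raised by the hypothetical filter when it can't read an attachment."""


@dataclass
class Message:
    body: str
    attachments: list[bytes] = field(default_factory=list)


def extract_text(blob: bytes) -> str:
    # Hypothetical stand-in for an IFilter: only plain UTF-8 text is readable.
    try:
        return blob.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise FilterError("attachment could not be filtered") from exc


def index_message(msg: Message, skip_attachments: bool = False) -> str:
    # Always index the body; add attachment text unless told to skip it.
    parts = [msg.body]
    if not skip_attachments:
        parts.extend(extract_text(a) for a in msg.attachments)
    return " ".join(parts)


def index_with_retry(msg: Message) -> tuple[str, bool]:
    """Return (indexed_text, attachments_were_indexed)."""
    try:
        return index_message(msg), True
    except FilterError:
        # Second attempt: the requeued message is indexed without attachments.
        return index_message(msg, skip_attachments=True), False
```

Calling `index_with_retry(Message("quarterly report", [b"\xff\xfe bad pdf"]))` returns `("quarterly report", False)`: the body is searchable, but the unreadable attachment is silently dropped, which matches the symptom of attachments that never show up in search results.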

In my troubleshooting, I used the invaluable Process Monitor (procmon) to watch the full-text indexing processes. Two services cooperate to handle content indexing. The Exchange Search Indexer service is the main search component: it monitors for changes and gathers batches of messages, which it then passes to the actual indexer, known as MS-Search. MS-Search is responsible for loading the correct set of filters for a given message and then using those filters to do the indexing. It does this by spawning one or more indexing daemons, which appear in procmon as msfted.exe. Those are the processes to monitor if you want to see which filters are being loaded, or which ones aren't.

It turns out that MS-Search is multithreaded and can spawn multiple msfted instances at once. The Adobe PDF filter, sadly, is single-threaded, so in practice you get a new msfted instance for each PDF that your server tries to index. That makes troubleshooting trickier, because these processes start and exit so quickly that they're hard to catch.
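To make the process-per-PDF effect concrete, here is a toy Python model of how a dispatcher might group attachments into indexing processes when one filter isn't thread-safe. It's a sketch under stated assumptions: the `Filter` class, the `FILTERS` registry, and `plan_workers` are hypothetical, and MS-Search's real scheduling logic may differ. Each inner list stands in for one msfted.exe instance.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Filter:
    name: str
    thread_safe: bool


# Hypothetical filter registry keyed by file extension.
FILTERS = {
    ".docx": Filter("Office filter", thread_safe=True),
    ".txt": Filter("Plain-text filter", thread_safe=True),
    ".pdf": Filter("Adobe PDF IFilter", thread_safe=False),
}


def plan_workers(attachments: list[str]) -> list[list[str]]:
    """Group attachments into worker processes (stand-ins for msfted.exe)."""
    shared: list[str] = []          # files that can share one multithreaded worker
    isolated: list[list[str]] = []  # one worker per file for single-threaded filters
    for name in attachments:
        ext = os.path.splitext(name)[1].lower()
        flt = FILTERS.get(ext)
        if flt is None:
            continue  # no registered filter: the attachment isn't indexed
        if flt.thread_safe:
            shared.append(name)
        else:
            isolated.append([name])
    return ([shared] if shared else []) + isolated
```

For example, `plan_workers(["a.pdf", "b.docx", "c.pdf", "d.txt"])` returns `[["b.docx", "d.txt"], ["a.pdf"], ["c.pdf"]]`: one shared worker for the Office and text files, plus a separate short-lived worker for each PDF, which is why a PDF-heavy mailbox produces a stream of brief msfted processes.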

After the original article came out, I learned that Foxit Software, which makes the excellent Foxit Reader PDF toolset, also makes an IFilter for reading PDF files. I haven't tested it, but its claimed advantages include up to 300 percent better performance than Adobe's own IFilter, plus actual technical support, something sadly lacking from Adobe's free offering. Microsoft's Jie Li ran some off-the-cuff tests showing that Foxit's filter is up to 500 percent faster than Adobe's in some cases. This speed comes at a price, though, and I mean that literally: Foxit sells its IFilter rather than giving it away as Adobe does. I'm perfectly OK with paying for it if it works, and I plan to test it to see whether it's something I should be recommending.

Finally, I want to clarify a point from a past UPDATE about Exchange ActiveSync (EAS) and mobile device wipe. I wrote, "Another potential fix is to have the device wipe itself unless it receives a keep-alive message (over an authenticated connection) telling it not to do so." A few readers have asked me how to enable this setting in various versions of Exchange. Sadly, this capability isn't implemented in any current version of EAS, but we can always hope to see it in a future release.
