Exchange 2000 Storage Exposed, Part 1
Dig into Exchange 2000 to find out how its database engine stores and recovers information.
June 13, 2000
Microsoft Exchange 2000 Server contains technology advancements that will change the way you store, maintain, and recover mission-critical messaging and knowledge-management data. To get the most from Exchange 2000, you need to understand several new concepts and paradigms. As organizations plan their migration to Exchange 2000, systems administrators want to know how the program stores data and provides disaster recovery. This article discusses how Exchange 2000's data storage and database engine differ from similar functionality in previous versions. In Part 2, I'll discuss the program's backup and restore operations and the anticipated best practices for disaster recovery.
Exchange Server Database 101
Thorough knowledge of Exchange 2000's database engine is fundamental to storage and disaster-recovery planning. The database engine, called Joint Engine Technology (JET) in earlier versions of Exchange Server, evolved into the Extensible Storage Engine (ESE) in later versions. ESE is a solid relational database technology similar to that of Microsoft SQL Server or Oracle, although ESE's implementation is quite different. Exchange 2000's ESE, a transacted storage engine that works primarily with messaging and collaborative data, guarantees that all database operations meet the Atomicity, Consistency, Isolation, and Durability (ACID) properties. ACID properties for database engines ensure that you can roll back transactions in the event of unsuccessful completion or replay them in recovery. Microsoft uses ESE throughout Exchange 2000, in places such as the Key Management Server (KMS) and the Site Replication Service (SRS), as well as in Windows 2000's Active Directory (AD).
ESE stores data in a balanced tree (B-tree) structure, which is well suited for storing semistructured data, the kind of content that messaging and collaboration servers deal with. Database engines such as SQL Server and Oracle are better for storing structured data and indexes that are less dynamic. B-tree technology isn't new to Exchange 2000; it has existed for years and is one of several fundamental database structures. Exchange 2000 arranges database files into 4KB pages in a hierarchical tree structure that stores messages, properties, and attachment data. Exchange 2000 uses the B+tree variant, which keeps the tree shallow and balanced so that ESE can reach any data page in as few steps as possible. Because ESE's design favors messaging and collaboration data, SQL Server and Exchange 2000 probably won't use the same technology any time soon.
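To illustrate why a tree built from 4KB pages stays shallow, the following Python sketch estimates how many levels a B+tree needs to reach a given number of leaf pages. The fan-out of 200 child pointers per page and both function names are assumptions chosen for illustration; they are not ESE internals.

```python
PAGE_SIZE = 4 * 1024  # 4KB pages, as described in the article


def pages_in_file(file_bytes: int) -> int:
    """Number of 4KB pages needed to hold file_bytes, rounded up."""
    return -(-file_bytes // PAGE_SIZE)


def btree_levels(leaf_pages: int, fanout: int) -> int:
    """Levels (root included) needed to address leaf_pages leaves.

    fanout is a hypothetical number of child pointers per internal page.
    """
    if leaf_pages < 1 or fanout < 2:
        raise ValueError("need at least one leaf page and a fan-out of 2 or more")
    levels = 1
    reachable = 1  # leaves addressable with the current number of levels
    while reachable < leaf_pages:
        reachable *= fanout
        levels += 1
    return levels


if __name__ == "__main__":
    pages = pages_in_file(10 * 1024**3)  # a 10GB database file
    print(pages, btree_levels(pages, fanout=200))
```

Under these assumptions, a 10GB file holds about 2.6 million pages, yet a lookup touches only a handful of pages on its way from the root to a leaf.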
Database Files
Exchange 2000's database engine operates on several key files that are important to the way the program stores messaging and knowledge-management data. Figure 1 shows how these files relate to one another. One key file is the properties store (i.e., the .edb file). In Exchange Server 5.5, priv.edb and pub.edb hold the private and public Information Stores (ISs), respectively. In Exchange 2000, the .edb file operates similarly to the way it operates in Exchange Server 5.5, storing Messaging API (MAPI) client data in Rich Text Format (RTF), with properties and attachments. Exchange 2000's IS overlays the logical mapping of messages, folders, and tables on the B-tree structure of ESE.
Exchange 2000 adds a new database file called the streaming store (i.e., the .stm file) to handle streaming and native Internet content. Internet clients (i.e., those that connect over protocols such as POP3, IMAP, and HTTP) use the streaming store exclusively. In addition, when content arrives through SMTP, Exchange 2000 pipes the content directly into the streaming store, thus bypassing the resource-intensive IMAIL conversion process that previous versions of Exchange Server used to convert native Internet content to RTF.
MAPI clients, however, don't use the .stm file. If a MAPI client needs access to content in the .stm file, Exchange 2000 converts the content on demand. The .edb file still contains properties and headers for content stored in the .stm file. The .edb files utilize the B+tree structure, but .stm files store data pages in a clustered-run style similar to that of a file system such as NTFS and better suited to the sequential access requirements of streaming content.
The concept of a message database (MDB) also changes in Exchange 2000. An MDB in Exchange 2000 is a set of .edb and .stm files. Each MDB in a storage group (SG) consists of one .edb file and one .stm file. Both storage mechanisms must exist for the MDB to be consistent and complete. The .edb file stores all the message properties and headers, including properties and checksums for the .stm file pages. The combination of these storage mechanisms lets Exchange 2000 store all data formats in a manner best suited to their type. For MAPI clients that need RTF data, the .edb file is the most efficient storage mechanism. Even when MAPI clients such as Microsoft Outlook 2000 and Outlook 98 use HTML, the .edb file stores their data as rich text. For Internet clients that require MIME content, the .stm file provides the fastest and most efficient approach.
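Because an MDB is complete only when both of its files are present, a simple check can flag a database that has lost one half of the pair. The sketch below is a minimal illustration in Python; it assumes that the two files of an MDB share a base name (for example, a hypothetical priv1.edb and priv1.stm), which the text does not state explicitly.

```python
from pathlib import PureWindowsPath


def incomplete_mdbs(filenames):
    """Return base names that have an .edb without an .stm, or vice versa.

    Assumes the two files of one MDB share the same base name.
    """
    edb, stm = set(), set()
    for name in filenames:
        path = PureWindowsPath(name)
        suffix = path.suffix.lower()
        if suffix == ".edb":
            edb.add(path.stem.lower())
        elif suffix == ".stm":
            stm.add(path.stem.lower())
    # The symmetric difference holds names present in only one of the sets.
    return sorted(edb ^ stm)


if __name__ == "__main__":
    files = ["priv1.edb", "priv1.stm", "pub1.edb", "edb.log"]
    print(incomplete_mdbs(files))
```

In this example, pub1 would be reported because its .stm partner is missing; other file types such as .log are ignored.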
Transaction Logs and Checkpoint Files
Because Exchange 2000 provides flexible, high-performance transacted storage, several other key files are worth mentioning. The most important is the transaction log file (i.e., the .log file). ESE first writes operations to the transaction logs, then to in-memory buffers, and finally to the database. This procedure ensures that all operations maintain database integrity and are recoverable. Each database engine instance (i.e., SG) in Exchange 2000 maintains a set of transaction logs. All the databases within an SG share the same transaction log files. As in earlier versions of Exchange, ESE writes 5MB of transactions to each .log file, closes it, and creates a new file for subsequent transactions. ESE creates transaction log files in sequential order, and the most current log file is edb.log. Because an MDB consists of .edb and .stm files, Exchange 2000 records transactions involving both types of files in the same edb.log file. When the current .log file reaches 5MB, it closes and takes on a new name, edbxxxx.log (in which xxxx is the sequential serial number, or generation, of the .log file in hexadecimal format).
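The naming and rollover scheme can be sketched in a few lines of Python. This is a simplified model, not ESE's logic: it assumes that log records fill each 5MB file exactly, that generations start at 1, and that the hexadecimal serial number is four lowercase digits to match the edbxxxx.log pattern above. The function names are hypothetical.

```python
LOG_FILE_SIZE = 5 * 1024 * 1024  # each .log file holds 5MB of transactions


def closed_log_name(generation: int, prefix: str = "edb", digits: int = 4) -> str:
    """Name of a closed log file: prefix plus the generation in hexadecimal."""
    if generation < 1:
        raise ValueError("generation numbers start at 1 in this sketch")
    return f"{prefix}{generation:0{digits}x}.log"


def log_files_after(total_bytes_logged: int, first_generation: int = 1):
    """List the log files that exist after total_bytes_logged bytes of records.

    Every full 5MB becomes a closed, renamed file; edb.log is always
    the current file that receives new transactions.
    """
    closed = total_bytes_logged // LOG_FILE_SIZE
    names = [closed_log_name(first_generation + i) for i in range(closed)]
    return names + ["edb.log"]


if __name__ == "__main__":
    print(closed_log_name(26))
    print(log_files_after(12 * 1024 * 1024))
```

After 12MB of logged transactions, this model yields two closed generations plus the current edb.log.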
Checkpoint (.chk) files play an important role as Exchange 2000 writes transactions to the .edb and .stm files. The .chk file records the location in the .log files of the last complete transaction that ESE wrote to the database. Each SG maintains .chk files, a practice that aids database recovery. The .chk file lets the ESE know where to start when the database engine replays .log files during recovery, which makes recovery faster and more efficient.
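The following Python sketch shows how a checkpoint shortens recovery. It models the checkpoint as nothing more than a log generation number, which is a simplification (the real .chk file records a location within the logs), and the function name is hypothetical.

```python
def logs_to_replay(available_generations, checkpoint_generation):
    """Return the log generations to replay, oldest first.

    Only generations at or after the checkpoint are needed. The sketch
    also refuses to continue if a generation in that range is missing,
    because replay must proceed in unbroken sequence.
    """
    needed = sorted(g for g in available_generations if g >= checkpoint_generation)
    for previous, current in zip(needed, needed[1:]):
        if current != previous + 1:
            raise ValueError(f"missing log generation {previous + 1}")
    return needed


if __name__ == "__main__":
    # Generations 1 through 5 are on disk; the checkpoint points into 4.
    print(logs_to_replay([1, 2, 3, 4, 5], checkpoint_generation=4))
```

Without the checkpoint, recovery would have to start from the oldest log on disk; with it, the earlier generations are skipped.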
Patch Files
The Exchange 2000 database engine uses patch (.pat) files only during backup and recovery operations; it creates one .pat file for each .edb file during an online backup. As Exchange 2000 performs a full backup, it commits database pages to tape. But user transactions continue to occur during backup because Exchange 2000 supports an online backup. If a transaction occurs to an already-committed page in the database and a page split occurs, the .pat file records the page split. If such a transaction doesn't cause a page split, the .log file—not the .pat file—records the event.
An example of a page-split operation is a situation in which updated data in a 4KB page exceeds the page size. In this case, the page must split into two 4KB pages. Page splits apply only to the .edb files. Patch files don't maintain entries for .stm files because .stm file structure lets Exchange 2000 easily allocate additional data. In addition, Exchange 2000 copies .stm file data differently than .edb data during a typical backup.
The .pat files for each MDB play a key role during recovery. Before Exchange 2000 replays the transaction logs, the program uses the .pat files to apply the page splits to the database file.
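The rules in this section can be summarized as a small decision function. The Python sketch below is only a reading of the description above, with hypothetical names. One case is not spelled out in the text, a page split on a page that the backup has not yet copied; the sketch assumes that the .log file alone records it.

```python
# Order of recovery steps as described in the text.
RECOVERY_ORDER = ("apply .pat page splits", "replay transaction logs")


def backup_record_target(store_file: str, page_already_copied: bool,
                         causes_page_split: bool) -> str:
    """Which file records a change made while an online backup is running."""
    if store_file == ".stm":
        # Patch files keep no entries for streaming-store pages.
        return ".log"
    if store_file != ".edb":
        raise ValueError("expected '.edb' or '.stm'")
    if page_already_copied and causes_page_split:
        return ".pat"
    # Assumption: every other .edb change is recorded only in the log.
    return ".log"


if __name__ == "__main__":
    print(backup_record_target(".edb", True, True))
    print(backup_record_target(".edb", True, False))
```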
Integrity Ensured
Exchange Server's database engine has always ensured data integrity, and this essential feature lives on in Exchange 2000. When ESE writes a page, the program writes a checksum (a cyclic redundancy check, or CRC) and the page number into the header at the start of the page: the checksum occupies the first 4 bytes, and the page number occupies the 4 bytes that follow. During backup and online maintenance, the database engine recomputes the checksum for each page it reads and compares the page number and checksum with the values recorded in the page. If either doesn't match, Exchange 2000 alerts the administrator by logging a -1018 error in the event log. This error is a valuable early warning. Exchange 2000 also computes a checksum for each log record in transaction log files. This checksum ensures that every transaction log record is valid. Few database engine technologies provide this degree of integrity checking at both page-level and transaction log record-level granularity.
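A page-level check of this kind can be sketched in Python as follows. The sketch makes several assumptions: the layout is a 4-byte checksum followed by a 4-byte page number at the start of a 4KB page, both little-endian, and zlib.crc32 stands in for ESE's actual checksum algorithm, which the article doesn't describe. The helper names are hypothetical.

```python
import struct
import zlib

PAGE_SIZE = 4096
READ_VERIFY_FAILURE = -1018  # the error code the article mentions


def build_page(page_number: int, payload: bytes) -> bytes:
    """Assemble a page: checksum, page number, then zero-padded payload."""
    body = struct.pack("<I", page_number) + payload.ljust(PAGE_SIZE - 8, b"\0")
    if len(body) != PAGE_SIZE - 4:
        raise ValueError("payload too large for one page")
    # The checksum covers everything after the checksum field itself.
    return struct.pack("<I", zlib.crc32(body)) + body


def verify_page(page: bytes, expected_page_number: int) -> int:
    """Return 0 if the page is intact, or -1018 if either check fails."""
    stored_checksum, stored_number = struct.unpack_from("<II", page)
    if zlib.crc32(page[4:]) != stored_checksum:
        return READ_VERIFY_FAILURE
    if stored_number != expected_page_number:
        return READ_VERIFY_FAILURE
    return 0


if __name__ == "__main__":
    page = build_page(7, b"message properties")
    damaged = page[:100] + b"\xff" + page[101:]  # flip one padding byte
    print(verify_page(page, 7), verify_page(damaged, 7), verify_page(page, 8))
```

Checking the page number as well as the checksum catches a page that is internally valid but was written to, or read from, the wrong location.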
The best protection against -1018 errors is to deploy solid hardware platforms and to practice good configuration management. Except for the transaction record checksum process, Exchange 2000's warning system doesn't change from earlier versions of Exchange Server, and Microsoft believes that customers want to know as soon as the program finds a database corruption problem. Multiple SGs and databases complicate the issue in Exchange 2000, but they also help with database recovery.
New Paradigms in Storage Management
An SG in Exchange 2000 is essentially another instance of the ESE database engine running within the context of the store.exe process. In Exchange 2000, multiple SGs can run on a server, and each SG can contain as many as six MDBs. Figure 2 shows the relationship between MDBs and SGs. In Exchange Server 5.5, only one SG is available on a server. Only one instance of the JET database engine runs on the IS in Exchange Server 5.5 and earlier. That JET instance supports two MDBs—pub.edb and priv.edb. In Exchange 2000, an administrator has more flexibility with IS design and can partition or segment the server population over a structure of SGs and MDBs, depending on organizational and disaster-recovery needs. For example, an administrator can spread users' mailboxes over several MDBs instead of limiting all users to one database, as in Exchange Server 5.5 and earlier. Suppose 2000 users on an Exchange 2000 server require 100GB of storage. A systems administrator can partition users across ten 10GB databases (200 users per MDB), rather than place them all in a 100GB database.
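The arithmetic in this example generalizes easily. The Python sketch below divides users and storage evenly across a chosen number of MDBs and reports how many SGs are needed at six MDBs per SG. The function name and the even-split assumption are illustrative only.

```python
import math


def plan_partition(total_users: int, total_storage_gb: float,
                   database_count: int, max_mdbs_per_sg: int = 6) -> dict:
    """Spread users and storage evenly over database_count MDBs."""
    if database_count < 1 or max_mdbs_per_sg < 1:
        raise ValueError("counts must be positive")
    return {
        "users_per_mdb": math.ceil(total_users / database_count),
        "gb_per_mdb": total_storage_gb / database_count,
        "storage_groups": math.ceil(database_count / max_mdbs_per_sg),
    }


if __name__ == "__main__":
    # The article's example: 2000 users, 100GB, ten databases.
    print(plan_partition(2000, 100, 10))
```

For the article's figures, the result is 200 users and 10GB per MDB, spread over two SGs.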
Multiple SGs in Exchange 2000 provide flexibility, manageability, and security. An ISP can host multiple companies on one Exchange 2000 server by separating them into different SGs. Commercial enterprises, departments, groups, or individuals (e.g., the CEO) can have their own SGs—a feature that provides better security and manageability. SGs are also important for clustering scenarios because the typical failover unit is a virtual server configured with one or more SGs.
In Release Candidate 1 (RC1), Exchange 2000 technically supports 15 SGs (plus one reserved for backup and restore operations), with six MDBs per SG. I expect management and disaster-recovery planning requirements to bring the practical limit to five to seven SGs per server. At Exchange 2000's initial release, Microsoft will support only four SGs per server and five MDBs per SG. With clustered configurations, failover complexities might reduce the practical limit even more. Also, Exchange 2000 supports concurrent backup and restore operations on SGs. In this case, an Exchange 2000 server with multiple SGs will let you recover one or more SGs (or an MDB within an SG) while the other SGs are online servicing users. Whatever the scenario, 16 ESE instances are available on the Exchange 2000 server. When you reach that limit, you can't perform any more parallel operations.
New Methods of Access
Besides the standard methods by which messaging clients can access their data (e.g., POP3, IMAP, MAPI), Exchange 2000 offers new methods for accessing and storing data. You can address every item in the program's database with a unique URL. This gives Web clients using HTTP better performance and functionality than they received in earlier versions of Exchange Server. Microsoft calls this feature the Web Store, and it holds new possibilities for knowledge management and Web portal applications. (For more information about the Web Store, see Tony Redmond, "Web-Enabling Exchange 2000," February 2000.)
Exchange 2000 also makes every item in the IS accessible through Win32 API calls and the Server Message Block (SMB) protocol. Therefore, programmers can write applications that store data directly in the Exchange 2000 store, and users can map drives to their inboxes or favorite public folders. Exchange 2000 itself uses this method, which is based on Installable File System (IFS) technology, to access .stm files. The IFS driver (ExIFS) provides direct access for several Exchange 2000 components. The ability to address Exchange 2000 with URLs or Win32 APIs sets the stage for some killer applications from Microsoft and third parties.
Best Practices Become Complicated
Your possibilities for allocating and accessing data in Exchange 2000 are almost limitless. In addition, the best practices you use to manage your servers need to change. In Exchange Server 5.5, storage-design best practices dictate that you need to separate sequential from random I/O when you configure disk subsystems and allocate Exchange Server databases. The same holds true for Exchange 2000. However, because Exchange 2000 supports multiple SGs or database engine instances, you need to apply these best practices to each ESE instance. For example, because each SG has a set of transaction logs that one or more databases share, the best practice of separating sequential from random I/O still applies, and each SG's transaction log set needs to be on a separate volume. If you combine all transaction log sets for every SG onto one RAID 1 array, you lose the advantages of sequential access because a set of sequential patterns combines to create a random pattern.
Continuing with the example, you also need to separate each database onto a dedicated array (e.g., RAID 1 or RAID 5) for best performance. Finally, because of the .stm files' highly sequential organization, some environments might require separate arrays for the .stm files. Putting this example together for a server with six SGs, each hosting four databases, would create a fairly complex storage design. Until Exchange 2000 gains acceptance and widespread deployment, systems administrators can only anticipate the design possibilities. But managing storage for Exchange 2000 will be more complicated than for Exchange Server 5.5.
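To see how quickly such a design grows, the following Python sketch counts the arrays that the guidelines above would call for: one volume per SG for transaction logs, one array per database, and optionally one more per database for its .stm file. The function name is hypothetical, and the count assumes that every guideline is followed strictly.

```python
def array_count(storage_groups: int, databases_per_sg: int,
                separate_stm: bool = False) -> int:
    """Count dedicated volumes or arrays for a strict storage layout."""
    if storage_groups < 1 or databases_per_sg < 1:
        raise ValueError("counts must be positive")
    log_volumes = storage_groups  # one log set per SG
    database_arrays = storage_groups * databases_per_sg
    stm_arrays = database_arrays if separate_stm else 0
    return log_volumes + database_arrays + stm_arrays


if __name__ == "__main__":
    # The article's example: six SGs, each hosting four databases.
    print(array_count(6, 4), array_count(6, 4, separate_stm=True))
```

For six SGs with four databases each, that is 30 arrays, or 54 if the .stm files get their own, which shows why practical designs will have to compromise.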
The powerful new methods of storing data in Exchange 2000 are worth the additional complexities. Fundamentally, the underlying database engine hasn't changed much in Exchange 2000. You can expect the same performance, scalability, and reliability features as in Exchange Server 5.5. Exchange 2000 takes messaging data storage to the next level. The ability to store all the semistructured data within an enterprise is a driving force behind Exchange 2000 storage. Microsoft will build many future products on Exchange 2000, such as the forthcoming Tahoe product. The term Web Store will gain wider use than simply in the context of a messaging system. Also, Exchange 2000 has a new target market—ISPs and application service providers (ASPs). These markets have different storage and scalability requirements than most corporate messaging systems do. Exchange 2000 storage is well positioned for ISPs, ASPs, and corporate messaging systems.