Can Backup Checksums Be Used as a Substitute for DBCC CHECKDB?
August 14, 2010
Q: I’m having trouble finding a time when I can run DBCC CHECKDB on my databases because of the resources it consumes. I’m using the CHECKSUM option on all of my backups—is this good enough for me to be able to stop running DBCC CHECKDB regularly?
A: This is a question I’m asked quite frequently, and the answer, unfortunately, is no. It’s a well-known fact that DBCC CHECKDB (or any of the derivative DBCC commands) uses a lot of CPU, memory, I/O, and tempdb resources—it’s just the nature of what DBCC CHECKDB has to do to consistency check a database as fast as possible.
As databases are getting larger, many people are finding that they don’t have a maintenance window in which they can accommodate the performance degradation that can occur while DBCC CHECKDB is running, so they’re looking for alternatives.
When you perform a backup using the CHECKSUM option, the backup system will check the page checksums that are present on the data file pages being backed up. If a page checksum error is found, the backup will terminate with an error (by default—this can be overridden using the CONTINUE_AFTER_ERROR option). This, of course, is sufficient to check whether an I/O subsystem problem has caused corruption in the database, but it can’t detect corruptions caused by other problems.
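As a minimal sketch, the two behaviors look like this (the database name and backup path here are placeholders, not from a real system):

```sql
-- Validate existing page checksums while the backup is taken;
-- by default the backup fails on the first bad checksum.
BACKUP DATABASE SalesDB
TO DISK = N'D:\Backups\SalesDB.bak'
WITH CHECKSUM;

-- Override the default so the backup completes despite a bad
-- page checksum (the error is still reported).
BACKUP DATABASE SalesDB
TO DISK = N'D:\Backups\SalesDB.bak'
WITH CHECKSUM, CONTINUE_AFTER_ERROR;
```

Note that CONTINUE_AFTER_ERROR gives you a backup that is known to contain a corrupt page, so it should be used deliberately, not as a routine setting.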
The reason that using backup checksums isn’t a substitute for running DBCC CHECKDB is that a data file page could have been corrupted in memory (by a faulty memory chip, for example) and then the corrupt page written out to disk with a new page checksum. Although this is a rare occurrence, corruptions like this do happen. Simply checking that the page checksum is still correct isn’t enough to detect this kind of corruption—a proper consistency check must still be performed.
The buffer pool itself will try to catch such corruptions before they make it to disk. The lazy writer background process, which is responsible for keeping free buffer pool space available so pages can be read in from disk, also checks pages in the buffer pool that have a page checksum and haven’t been changed in memory since they were last read from or written to disk. If such a page is found to have an incorrect page checksum, high-severity error 832 is written to the error log.
Backup checksums should always be used, but the best way to offload the consistency-checking workload from your production system is to take a full database backup, restore it on another SQL Server instance, and run the consistency checks there. This procedure also validates the backup you just performed. For more information, see my blog post “CHECKDB From Every Angle: Consistency Checking Options for a VLDB.”
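A sketch of that offloading procedure might look like the following, run on the separate, non-production instance (the database name, file paths, and logical file names are all hypothetical):

```sql
-- Restore the production backup under a different name;
-- WITH CHECKSUM re-verifies the page checksums during the restore.
RESTORE DATABASE SalesDB_Check
FROM DISK = N'D:\Backups\SalesDB.bak'
WITH CHECKSUM,
     MOVE N'SalesDB'     TO N'E:\Data\SalesDB_Check.mdf',
     MOVE N'SalesDB_log' TO N'E:\Data\SalesDB_Check.ldf';

-- Run the full consistency checks against the restored copy,
-- suppressing informational messages and reporting all errors.
DBCC CHECKDB (N'SalesDB_Check') WITH NO_INFOMSGS, ALL_ERRORMSGS;
```

If DBCC CHECKDB on the restored copy comes back clean, you’ve verified both the backup and the state of the database at the time the backup was taken; if it reports corruption, you’ll still need to investigate on the production system.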