Choosing Default Sizes for Your Data and Log Files
December 13, 2010
Q: When you create a new database in SQL Server 2008 and SQL Server 2005, the data and log files have the following characteristics by default:
Data: initial size 2MB; Autogrowth: by 1MB, unrestricted growth
Log: initial size 1MB; Autogrowth: by 10 percent, unrestricted growth
These default values can easily be changed, and in a typical development environment these settings probably don’t mean much. However, having worked as a consultant at various companies, I can’t tell you how many organizations I've seen with high-volume production databases left with these default settings for years, so that they’ve grown to many gigabytes in size.
Wouldn’t these settings create an extremely fragmented database over time, with many Virtual Log Files (VLFs) that would heavily impact database performance? Why does Microsoft include these settings in SQL Server, and why isn’t this issue obvious to most DBAs?
A: First and foremost, I agree. I’ve never liked these defaults, and I wouldn’t let a production database be created with them. But when I sit down and think about it, I struggle to come up with better generic numbers. I can’t expect Microsoft to use 1GB as the default growth amount, even though that would often be much better. See, this is one of those annoying cases in which “it really, really depends” on your specific environment.
So let me start by saying I just wish Microsoft wouldn’t set any defaults at all, and that the UI were set up to ask the database creator questions about the database so that these settings would be specific to the environment. However, there are a few factors that complicate things. To be more specific, I’ll break this down by file type, because the properties, settings, and behaviors of the data portion and the log portion of the database are very different.
The Data Portion of the Database
The data portion of the database, which can consist of one or more filegroups, each with one or more files, holds everything from your system tables to your user tables. You can specifically place objects within filegroups, and therefore on specific storage. All of this sounds good, but how should you figure out what the placement should be? My general recommendations are as follows:
Do a minimal amount of capacity planning to help define your initial size and a maximum size. This capacity planning doesn’t have to get you to the number of bytes you’re going to store, but it’s better to know something about your current size, expected growth, and the length of time that you expect this hardware to store your data without replacement.
Don’t let autogrowth happen in your database because it leads to fragmentation. However, I would leave the autogrowth feature on so that when the numbers you came up with during capacity planning turn out to be a bit off from what actually happens, you minimize downtime by letting autogrowth kick in. You should also make sure that you’re monitoring growth so that you don’t find your database at the maximum with no disk space available.
Set a maximum size that isn’t unlimited. I’d rather get an error and have to change only the maximum rather than find out that I have no disk space available.
Set a reasonable growth size. Prior to SQL Server 2005, data files had to zero-initialize all the disk space they allocated. If a database is large, this process can take a considerable amount of time. SQL Server 2005 introduced instant initialization, which makes file creation or file growth for the data portion extremely fast (essentially instant). However, it’s only allowed when the SQL Server service account has been granted the Windows Perform Volume Maintenance Tasks permission. If your SQL Server instances aren’t running under the context of a local administrator (and I hope they aren’t), then you must explicitly grant this permission. With this great performance gain comes a very minor security risk, which I’ve explained in my blog post “Instant Initialization—What, Why and How?”
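One quick way to check whether instant initialization is actually in effect for an instance is to temporarily enable trace flags 3004 and 3605 (which send file-zeroing messages to the error log), create a small throwaway database, and look at the error log. This is just a minimal sketch using undocumented commands; the database name is a placeholder, and the trace flags should be turned off when you’re done:

DBCC TRACEON (3004, 3605, -1);   -- surface zeroing messages in the error log
GO
CREATE DATABASE IFI_Test;        -- small throwaway database (name is a placeholder)
GO
-- If the error log shows "Zeroing ... IFI_Test.mdf", instant initialization is NOT enabled
-- (the log file is always zeroed, so a "Zeroing" entry for the .ldf alone is expected)
EXEC xp_readerrorlog 0, 1, N'Zeroing';
GO
DROP DATABASE IFI_Test;          -- clean up
DBCC TRACEOFF (3004, 3605, -1);
GO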
All in all, there’s really no great default here because all of this hinges on the initial size of the database, as well as whether you can leverage instant initialization. The following are reasonable numbers for data file properties:
Initial size: 50GB
Growth rate: 5GB (with instant initialization on; without it, you might want to set this between 1GB and 2GB)
Max size: 100GB
Note that if your database has Transparent Data Encryption (TDE) enabled, you can’t use instant initialization.
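To make those numbers concrete, here’s a minimal sketch of creating a database with all of the file properties set explicitly. The database name, logical file names, and drive paths are purely hypothetical and would need to match your environment; the log file numbers are explained in the next section:

CREATE DATABASE SalesDB
ON PRIMARY
    ( NAME = N'SalesDB_Data',
      FILENAME = N'D:\SQLData\SalesDB_Data.mdf',
      SIZE = 50GB,          -- initial size from capacity planning
      FILEGROWTH = 5GB,     -- reasonable with instant initialization; 1GB-2GB without it
      MAXSIZE = 100GB )     -- capped, so you get an error rather than a full disk
LOG ON
    ( NAME = N'SalesDB_Log',
      FILENAME = N'L:\SQLLogs\SalesDB_Log.ldf',
      SIZE = 8000MB,        -- see the log sizing discussion below
      FILEGROWTH = 8000MB,
      MAXSIZE = 208000MB );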
The Log Portion of the Database
The log portion of the database, which can consist of one or more files but doesn’t use file groups, holds all of your database’s changes until SQL Server no longer needs them (i.e., they aren’t required to be kept around for something such as a log backup, replication, or database mirroring). In essence, this really translates into the following two options:
You decide that you want to manage your transaction log yourself and that you want the maximum number of recovery options. You’ve chosen to do log backups and to stay in the FULL and/or BULK_LOGGED recovery model. This is preferred if you want to minimize downtime and data loss. If you’re trying to maintain 24 × 7 availability, you might stay in the FULL recovery model and never allow changes to BULK_LOGGED. If you have a maintenance window, then you might allow temporary switches to the BULK_LOGGED recovery model to improve the performance of bulk (i.e., batch) operations. However, switching between recovery models isn’t something you should do without fully understanding the ramifications and requirements; that’s well beyond the scope of this article, but it gives you a few things to think about.
If you decide that you don’t want to do log management and that some data loss is acceptable, then you can use the SIMPLE recovery model. This model lets SQL Server clear the inactive portion of the transaction log after a checkpoint occurs, which helps keep the transaction log smaller (overall) and reduces maintenance. However, this model has the highest potential for data loss because you’re only as good as your last full (or differential) database backup.
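As a quick illustration of that choice, here’s a minimal sketch for checking and setting the recovery model (the database name is a placeholder):

-- Check the current recovery model for every database
SELECT name, recovery_model_desc
FROM sys.databases;

-- Manage the log yourself: FULL (or BULK_LOGGED during a maintenance window)
ALTER DATABASE SalesDB SET RECOVERY FULL;

-- Or accept potential data loss and skip log management: SIMPLE
-- ALTER DATABASE SalesDB SET RECOVERY SIMPLE;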
Once you’ve chosen how you’re going to manage your transaction log, you can do capacity planning. Part of the reason you need to decide how you’re going to manage your log first is that the recovery model defines how much activity is stored in the log before it can be cleared. From there, you can perform transaction log backups to back up and clear the inactive portion of the log, as sketched below. If you back up frequently, then theoretically you won’t need a large transaction log. However, remember that SQL Server can clear only the inactive portion of the log; if there are long-running active transactions, the log might not be able to clear immediately, which means SQL Server will require a larger transaction log.
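For example, a frequent transaction log backup is as simple as the following sketch. The database name, backup path, and file name are assumptions, and in practice each backup would go to a uniquely named file from a scheduled job (e.g., every 15 to 30 minutes) so the log backup chain stays intact:

-- Back up (and thereby allow SQL Server to clear) the inactive portion of the log
BACKUP LOG SalesDB
TO DISK = N'B:\SQLBackups\SalesDB_Log_20101213_1030.trn';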
Once you fully understand how the log works, my general recommendations are as follows:
Preallocate the transaction log to a reasonable initial size such that autogrowths are reduced or eliminated. When creating the transaction log, size it in chunks of 8000MB (not 8GB, because there’s a bug that affects log growth in multiples of 4GB). For more information about the bug, read Paul’s blog post, “Bug: log file growth broken for multiples of 4GB.” So, if you’ve decided that you want a 100GB log, I’d create it at 8000MB, then alter it to 16000MB, then 24000MB, and so forth, stopping at 104000MB, as sketched after this list.
Set the maximum size to something reasonable, such as 208000MB (for the log described in the first bullet), with a growth rate of 8000MB. Realize, though, that the transaction log always has to be zero initialized, so a growth of 8000MB might take from a few seconds to a few minutes, but only when the log is actually growing. Once it grows, if you really do need that space, do not shrink the log.
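Here’s a minimal sketch of that growth pattern for a roughly 100GB log; the database and logical file names are placeholders:

-- The first 8000MB chunk is the log's initial size at creation; then grow it in
-- 8000MB steps rather than jumping straight to ~100GB, which keeps the VLFs
-- reasonably sized and avoids the 4GB-multiple growth bug
ALTER DATABASE SalesDB MODIFY FILE (NAME = N'SalesDB_Log', SIZE = 16000MB);
ALTER DATABASE SalesDB MODIFY FILE (NAME = N'SalesDB_Log', SIZE = 24000MB);
-- ...continue in 8000MB increments...
ALTER DATABASE SalesDB MODIFY FILE (NAME = N'SalesDB_Log', SIZE = 104000MB);

-- Then cap the maximum size and set the growth increment
ALTER DATABASE SalesDB MODIFY FILE (NAME = N'SalesDB_Log', MAXSIZE = 208000MB);
ALTER DATABASE SalesDB MODIFY FILE (NAME = N'SalesDB_Log', FILEGROWTH = 8000MB);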
The end result is that not everyone needs a transaction log measured in tens or hundreds of gigabytes. So a default of 8000MB would be too high for many databases, yet for larger databases anything smaller would create a lot of log fragmentation (too many VLFs) as the log grows.
Sizing and optimizing the transaction log isn’t trivial, and you want to get it right; otherwise, you might end up with fragmentation in the log (e.g., too many or too few VLFs). A quick way to check your VLF count is sketched below. Although this is no longer as big a problem as it was in releases prior to SQL Server 2005, you should still size the log appropriately. Coming up with defaults that aren’t database specific is really hard; there just aren’t better generic numbers that Microsoft could use. I wish there were no defaults and that all databases had to have their settings explicitly set at creation. Additionally, I wish there were a wizard that explained many of these things (at least briefly) so that people would make better decisions earlier in the process. As you stated in your question, what these defaults have led to is inefficient databases with settings that don’t scale.
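One quick (if undocumented) way to see how fragmented your log already is: DBCC LOGINFO returns one row per VLF, so the row count is your VLF count. A minimal sketch, with the database name as a placeholder:

USE SalesDB;
GO
-- One row per VLF in this database's log; a very high row count (hundreds or
-- thousands) usually means the log grew in many small increments
DBCC LOGINFO;
GO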
In the end, I think we can agree that the defaults stink! And I hope that I’ve given you a few things to think about when sizing and setting up your database files appropriately.