Practical Data Compression in SQL Server
How to take advantage of SQL Server's data compression feature
April 18, 2014
By taking advantage of SQL Server's data compression feature, you can improve SQL Server performance without changing any application code. Compressing data reduces database storage, which leads to fewer I/O reads and writes. However, SQL Server consumes more CPU resources when it compresses and decompresses data.
SQL Server currently supports two types of data compression: row compression and page compression. Row compression changes the data storage format. For example, it changes integers and decimals to the variable-length format instead of their native fixed-length format. It also changes fixed-length character strings to the variable-length format by eliminating blank spaces. Page compression implements row compression and two other compression strategies (prefix compression and dictionary compression). You can find more details about page compression in the Page Compression Implementation section of SQL Server Books Online (BOL).
Data compression is currently supported in the Enterprise, Developer, and Evaluation editions of SQL Server 2008 and later. I'll show you how to implement it in SQL Server 2012. I'll also show you how to use a script that can help you determine whether to use row compression or page compression based on your SQL Server environment's data activities and storage.
Implementing Data Compression
SQL Server Management Studio (SSMS) provides the Data Compression Wizard to help implement data compression. To launch the wizard, simply right-click a table name or index name, select Storage, and choose Manage Compression.
On the wizard's welcome page, click Next to go to the Select Compression Type page. On this page, you use the Compression type drop-down list (which is set to None by default) to select the type of data compression you want to use. You can get an estimate of the amount of space you'll save with that type of compression by clicking the Calculate button and comparing the values in the Current space and Requested compressed space columns. In the example shown in Figure 1, row compression will result in about 20 percent space savings.
Figure 1: Selecting the Type of Data Compression to Use
It's important to know that both the Current space and Requested compressed space values represent the total space used by all indexes, not just the table or index you selected when launching the wizard. This is evident if you run the sp_estimate_data_compression_savings stored procedure, which gives you the estimated savings for the clustered index (IndexID = 1). For example, running the code
EXEC sp_estimate_data_compression_savings 'Production','Product',1,null,'ROW'
gives the results shown in Figure 2.
Figure 2: Examining Sample Results from sp_estimate_data_compression_savings
Notice that this result is different from the result learned using the Data Compression Wizard. That's because the wizard shows the result of compressing all indexes, whereas the sp_estimate_data_compression_savings stored procedure only takes the single index or table that was specified into consideration.
After you decide which type of compression to use, click Next to go to the Select an Output Option page. The Data Compression Wizard creates scripts to compress and decompress data. Figure 3 shows the default scripting options. Leave those options selected and click Next.
Figure 3: Leaving the Output Options at Their Defaults
On the Data Compression Wizard Summary page, which Figure 4 shows, review the selections you made. Assuming everything is okay, click Finish.
Figure 4: Reviewing the Summary Page
The Compression Wizard Progress page tracks the progress of the wizard as it creates the script, as Figure 5 shows. When you receive the Success message, click Close.
Figure 5: Tracking the Progress of the Wizard
After the Data Compression Wizard closes, the generated script will be displayed in a new query window in SSMS. For this example, the script is:
USE [AdventureWorks2012]ALTER INDEX [PK_Product_ProductID] ON [Production].[Product] REBUILD PARTITION = ALLWITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON,ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = ROW)
The last step of implementing data compression is to run the generated script.
Automating Data Compression
You can implement an automated data compression process by using the sp_estimate_data_compression_savings stored procedure and T-SQL's ALTER INDEX or ALTER TABLE statements. You also need a proper strategy for determining when to compress data. Fortunately, the Microsoft white paper "Data Compression: Strategy, Capacity Planning and Best Practices" provides an excellent strategy. Basically, it recommends using page compression for large tables and indexes that meet two criteria:
The table's or index's scan operations account for more than 75 percent of its total operations.
The table's or index's update operations account for less than 20 percent of its total operations.
Armed with this strategy, I created a script that performs the following steps:
The script scans all the indexes in the current database, selecting those indexes that meet three criteria:
The index has more than 100 pages.
The index has SELECT operations that account for more than 75 percent the total index operations.
The index has UPDATE operations that account for less than 20 percent of the total index operations.
For each selected index, the script estimates the amount of space that would be saved with row compression as well as the amount of space that would be saved with page compression. Both estimates are presented as percentages.
The script compares the row compression and page compression percentages calculated in step 2, then makes recommendations. It recommends page compression for indexes if they have no UPDATE operations or if their estimated space savings is greater than 10 percent. It recommends row compression for the rest of the indexes.
The script outputs the results in two sections. The first section lists the data compression candidates. The second section lists the recommended compression method and estimated space savings. It also includes the code you can execute if you want to follow the recommendations. The script won't execute the code because it should be verified by someone first, which is a good practice to follow.
The script is too large to include here, but you can download it by clicking the Download the Code button near the top of the page.
To test this script, I executed it in one of my databases, which has more than 1,300 tables. The script finished within two minutes. As Figure 6 shows, it identified eight indexes as potential data compression candidates and gave seven recommendations.
Figure 6: Examining Sample Results from the Script
Why is one index not recommended for data compression even though it seems to be a good candidate? The reason is quite interesting: The sp_estimate_data_compression_savings stored procedure estimates that the data space will expand after applying data compression. Let's look into this situation more closely.
When Data Compression Hurts
Data compression is a tool to improve query performance by reducing physical I/O reads and writes. It can only reduce I/O reads and writes when it saves space, and it usually does so. I say "usually" because, just like other performance tools, it can hurt performance in some cases.
Both row and page compression will consume additional space when they're implemented. This is referred to as the metadata overhead. The final space savings is the difference between the space saved by compressing data and the metadata overhead. Therefore, when the metadata overhead is greater than the space saved, implementing data compression will actually expand the storage space.
To demonstrate the growth of data space after implementing data compression, I engineered the following scenario. First, I created a database and table using the code:
CREATE DATABASE DCTest;GOUSE DCTest;CREATE TABLE table1 (msg char(36));GO
As you can see, this table has only one column whose data type is char. Row compression typically saves data space by eliminating blank spaces. However, in this case, I intentionally left no spaces to remove.
Next, I populated the table with 1 million rows of fixed-length random data:
INSERT INTO table1 SELECT NEWID()GO 1000000
With the table populated, I ran the sp_estimate_data_compression_savings stored procedure twice, once with row compression and once with page compression:
EXEC sp_estimate_data_compression_savings 'dbo','table1',null,null,'row'EXEC sp_estimate_data_compression_savings 'dbo','table1',null,null,'page'
As you can see in the first two result sets in Figure 7, the stored procedure's estimates show space growth.
Figure 7: Exploring When Data Compression Hurts Rather Than Helps
For demonstration purposes, let's ignore the estimates and implement both types of data compression:
--Get a baseline.EXEC sp_spaceused table1--Test row compression.ALTER TABLE table1 REBUILD WITH (data_compression = row)EXEC sp_spaceused table1--Test page compression.ALTER TABLE table1 REBUILD WITH (data_compression = page)EXEC sp_spaceused table1
As you can see in the last three result sets in Figure 7, both the row-compressed and page-compressed tables consume more space than the non-compressed table.
As the last step, I ran some cleanup code:
DROP TABLE table1;USE master;DROP DATABASE DCTest;
Evaluate Your Data Before Compressing It
In many cases, data compression is a very helpful tool, but it has a few limitations. For this reason, you need to evaluate your data before compressing it. To perform this evaluation, you can use the sp_estimate_data_compression_savings stored procedure or the script I provided. After you compress your data, you should monitor it periodically, adjusting your strategy when needed. This will help keep your data in good shape.
*************************************************************************************
Louis Li is a SQL Server consultant, a Microsoft Certified Solutions Master and Microsoft Certified Trainer. He loves data and enjoys reading when he’s not working with SQL Server.
Email: [email protected]
*************************************************************************************
About the Author
You May Also Like