Not All Data Compression Is Created Equal
While we like to say that storage is cheap, the reality is that cheap is relative. If you don’t have budget for more storage, suddenly storage isn’t very cheap. Because of the absolutely massive amount of CPU power that we can pack into a server for relatively little money these days, storage compression is coming more and more the reality for more and more DBAs.
August 23, 2011
While we like to say that storage is cheap, but the reality is that cheap is relative. If you don’t have budget for more storage, suddenly storage isn’t very cheap. Because of the absolutely massive amount of CPU power that we can pack into a server for relatively little money these days (again this is relative, but faster servers are much cheaper than SAN storage upgrades), storage compression is coming more and more the reality for more and more DBAs.
Related: Compression in SQL Server 2008
One big question is should I select ROW or PAGE level compression for my database tables. While I can’t answer that question for you in a simple blog post, as the best way to figure that out is to try one then the other on your dev, test, and QA systems, I can give you a very helpful piece of advice. When considering data compression one very key thing to remember is that out of row data (LOB data) isn’t compressed. If the LOB data is stored in row, then it will be compressed, but only when you compress the table with PAGE level compression. If however you use ROW level compression then the LOB data will never be compressed, no matter if it is stored in row or if it is stored out of row.
We can see this if we do a little work with DBCC PAGE to look into the physical pages on the disk. First we create a table using CREATE TABLE which isn’t compressed and insert a row into the table.
CREATE TABLE dbo.CompressTest1(ID int primary key,StringValue varchar(max))goinsert into CompressTest1select 1, 'Hello World!'go 1000
From where we look into a row on the first page and see that the data isn’t compressed.
Slot 4 Offset 0x0 Length 0 Length (physical) 0KeyHashValue = (59855d342c69) Slot 5 Offset 0x123 Length 39Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP VARIABLE_COLUMNSRecord Size = 39 Memory Dump @0x0000000017B0A1230000000000000000: 30000800 06000000 0200c801 00270048 0065006c †0.........È..'.H.e.l0000000000000014: 006c006f 00200057 006f0072 006c0064 002100††††.l.o. .W.o.r.l.d.!.
Now we can use the ALTER TABLE statement to compress the table, and then look at the contents of the page again (don’t forget that the page number will change when you rebuild the table to a different compression level). First we will use ROW compression. If we scroll down the output from the page the rows all appear a little different but we can clearly see that no compression is being used as we can read the output of the row (I’ve included the output from a couple of the rows below).
Slot 11 Offset 0x1d6 Length 34Record Type = (COMPRESSED) PRIMARY_RECORD Record attributes = LONG DATA REGIONRecord size = 34 CD ArrayCD array entry = Column 1 (cluster 0, CD array offset 0): 0x03 (TWO_BYTE_SHORT)CD array entry = Column 2 (cluster 0, CD array offset 0): 0x0a (LONG) Record Memory Dump000000001192A1D6: 2102a380 f0010100 18004800 65006c00 6c006f00 †!.£.ð.....H.e.l.l.o.000000001192A1EA: 20005700 6f007200 6c006400 2100††††††††††††††† .W.o.r.l.d.!.Record Type = (COMPRESSED) PRIMARY_RECORD Record attributes = LONG DATA REGIONRecord size = 34 CD ArrayCD array entry = Column 1 (cluster 0, CD array offset 0): 0x03 (TWO_BYTE_SHORT)CD array entry = Column 2 (cluster 0, CD array offset 0): 0x0a (LONG) Record Memory Dump000000001192A2C4: 2102a380 f7010100 18004800 65006c00 6c006f00 †!.£.÷.....H.e.l.l.o.000000001192A2D8: 20005700 6f007200 6c006400 2100††††††††††††††† .W.o.r.l.d.!.Slot 27 Offset 0x3f6 Length 34Record Type = (COMPRESSED) PRIMARY_RECORD Record attributes = LONG DATA REGIONRecord size = 34 CD ArrayCD array entry = Column 1 (cluster 0, CD array offset 0): 0x03 (TWO_BYTE_SHORT)CD array entry = Column 2 (cluster 0, CD array offset 0): 0x0a (LONG) Record Memory Dump000000001192A3F6: 2102a381 00010100 18004800 65006c00 6c006f00 †!.£.......H.e.l.l.o.000000001192A40A: 20005700 6f007200 6c006400 2100††††††††††††††† .W.o.r.l.d.!.
Now we rebuild the table again using PAGE compression and look at the contents of the page again. The first row of the page looks pretty normal.
AnchorRecord @0x000000001972A065Record Type = (COMPRESSED) PRIMARY_RECORD Record attributes = LONG DATA REGIONRecord size = 32 CD ArrayCD array entry = Column 1 (cluster 0, CD array offset 0): 0x00 (NULL) CD array entry = Column 2 (cluster 0, CD array offset 0): 0x0a (LONG) Record Memory Dump000000001972A065: 2102a001 01001800 48006500 6c006c00 6f002000 †!. .....H.e.l.l.o. .000000001972A079: 57006f00 72006c00 64002100 †††††††††††††††††††W.o.r.l.d.!.
But as we move down through the page everything else looks a little "different."
Slot 0 Offset 0x85 Length 9Record Type = (COMPRESSED) PRIMARY_RECORD Record size = 9CD ArrayCD array entry = Column 1 (cluster 0, CD array offset 0): 0x02 (ONE_BYTE_SHORT)CD array entry = Column 2 (cluster 0, CD array offset 0): 0x01 (EMPTY) Record Memory Dump000000001972A085: 01021281 20002800 31††††††††††††††††††††††††††.... .(.1
Here’s another row from the page.
Slot 3 Offset 0xa0 Length 9Record Type = (COMPRESSED) PRIMARY_RECORD Record size = 9CD ArrayCD array entry = Column 1 (cluster 0, CD array offset 0): 0x02 (ONE_BYTE_SHORT)CD array entry = Column 2 (cluster 0, CD array offset 0): 0x01 (EMPTY) Record Memory Dump000000001972A0A0: 01021284 0049004e 00††††††††††††††††††††††††††....I.N.
And here’s a third row from the same page.
Slot 5 Offset 0x0 Length 0 Length (physical) 0KeyHashValue = (74a6cc4021ee) Slot 6 Offset 0xbb Length 9Record Type = (COMPRESSED) PRIMARY_RECORD Record size = 9CD ArrayCD array entry = Column 1 (cluster 0, CD array offset 0): 0x02 (ONE_BYTE_SHORT)CD array entry = Column 2 (cluster 0, CD array offset 0): 0x01 (EMPTY) Record Memory Dump000000001972A0BB: 01021287 20002000 20††††††††††††††††††††††††††... . .
As we can see the rows are clearly compressed. Now let’s overflow the values so that the values are stored off page. We’ll leave the table with the same PAGE compression we’ll just truncate it and populate it with a large string. The string that I’m using is simply “Hello World! - “ repeated a few thousand times.
After the script has run and we look at the normal page we can see that the row is in fact compressed (I’m included some header information here that I didn’t include in the captures above).
Slot 0 Offset 0x60 Length 57Record Type = (COMPRESSED) PRIMARY_RECORD Record attributes = LONG DATA REGIONRecord size = 57 CD ArrayCD array entry = Column 1 (cluster 0, CD array offset 0): 0x02 (ONE_BYTE_SHORT)CD array entry = Column 2 (cluster 0, CD array offset 0): 0x0a (LONG) Record Memory Dump0000000017B0A060: 2102a281 01010030 80040000 11010000 005f6500 †!.¢....0........._e.0000000017B0A074: 00681f00 00230100 00010000 00d03e00 00240100 †.h...#.......Ð>..$..0000000017B0A088: 00010000 00f84c00 00210100 00010000 00††††††††.....øL..!.......
When we look in the LOB pages for that row (pages 291, 292 and 289 according to DBCC PAGE on my system) we see that indeed the information stored on the LOB page is not compressed.
Blob row at: Page (1:291) Slot 0 Length: 8054 Type: 3 (DATA)Blob Id:1700724736000000001972A06E: 00650048 006c006c 0020006f 006f0057 H.e.l.l.o. .W.o.000000001972A07E: 006c0072 00210064 002d0020 00480020 r.l.d.!. .-. .H.000000001972A08E: 006c0065 006f006c 00570020 0072006f e.l.l.o. .W.o.r.000000001972A09E: 0064006c 00200021 0020002d 00650048 l.d.!. .-. .H.e.000000001972A0AE: 006c006c 0020006f 006f0057 006c0072 l.l.o. .W.o.r.l.... Snipped for space000000001972BF8E: 0064006c 00200021 0020002d 00650048 l.d.!. .-. .H.e.000000001972BF9E: 006c006c 0020006f 006f0057 006c0072 l.l.o. .W.o.r.l.000000001972BFAE: 00210064 002d0020 00480020 006c0065 d.!. .-. .H.e.l.000000001972BFBE: 006f006c 00570020 0072006f 0064006c l.o. .W.o.r.l.d.000000001972BFCE: 00200021 0020002d !. .-. .
You can also look at the number of pages which are returned by DBCC IND (which is the command that you use to see what pages are used by a specific table) to see this work in action. When my table isn’t compressed the table contains 8 pages (6 are in row data pages). When the table is ROW compressed the table is 7 pages (6 are in row data pages). When the table is PAGE compressed the table is 4 pages (2 are in row data pages).
Another item of importance to note about page level compression is that the page must be full before the page is compressed. This really makes sense as you wouldn’t want to be compressing the data in the page over and over again every time it is written to.
Hopefully through reading this post I’ve shown you the power of compression as well as some potential sticking points if you use LOB values in the tables which you plan on compressing data. Data compression can save you real money on storage when used correctly. And in today’s economy saving real money is a very good thing.
About the Author
You May Also Like