Time to Defrag Your Indexes?
There's more to consider than you might think
September 27, 2009
A question I hear regularly from the SQL Server user community is, “How do I know when it's time to defrag my indexes?” Most semi-experienced SQL Server DBAs use the avg_fragmentation_in_percent column of the dynamic management function (DMF) called sys.dm_db_index_physical_stats to determine when an index should be completely rebuilt (ALTER INDEX REBUILD), merely defragged (ALTER INDEX REORG), or left alone. This column reveals the average fragmentation of a given index—the higher the value, the worse the fragmentation level is.
Related: To Defrag or Not To Defrag?
But how much fragmentation is too much? Is it 20 percent, or maybe 30 percent? Even more? Unfortunately, there's no magic number, and average fragmentation percentage isn't the only factor to consider when determining whether you need to rebuild or reorganize an index. You also need to consider the fullness of the pages, as reported in the avg_page_space_used_in_percent column of the same DMF. There's more to consider than you might think, so in this column I'll discuss some of those considerations, and I'll provide an example and a helpful piece of code that illuminates the effect of a fragmented index.
Be Logical About It
First, you need to understand two key metrics. The fragmentation reported by the avg_fragmentation_in_percent column shows when the pointer at the leaf (i.e., bottom-most) level of an index points to a page that's logically next in line based on the order of the index expression but isn't physically next to it within the data file. This situation might occur when a row has been inserted into a full page, resulting in a page split. Such a split forces half the rows to be moved to a new page that most likely isn't physically next to the page that split. The avg_page_space_used_in_percent column simply reports how full (on average) the pages in the index are.
What’s the Big Deal?
Now that you know what these metrics mean, you're good to go, right? Unfortunately, it's not that simple. Yes, in general, the more fragmentation or unused space you have on a page, the more likely the system's performance will suffer. The real question is, "How much is performance really affected?" In reality, 20 percent fragmentation might affect performance dramatically or negligibly, depending on several factors—including how much data you have and how (and how often) you access that data. So, there's no magic number or threshold. Every situation is unique, so it's crucial that you understand your system's data-access patterns so that you can make the right index-maintenance decisions. This is especially true for systems with limited maintenance windows and high performance demands.
Let’s see how out-of-order pages (fragmentation) can affect performance. First, keep in mind that this type of fragmentation has much more effect when reading and writing from the physical disk than it does with data that's in SQL Server’s memory cache; disk access is inherently slower due to physical movement of the disk drive heads. For example, if you need to read a range of data from disk, and the pages aren't physically together and in the correct order, the disk heads would have to constantly move from sector to sector for each fragment instead of gliding along in a contiguous movement. Each movement of the head costs time and thus affects performance.
Another factor in a fragmentation scenario is that you'll read more pages than you would in a non-fragmented scenario, simply because the data is now dispersed more. The result is more physical and logical I/O and the utilization of more CPU cycles. (An example later in the article clarifies this point further.) If your fragmentation results from page splits, you typically end up with many pages that are less full than if they didn't split. This is where the Page Free Space metric comes into play.
Imagine a situation in which you needed to read 1,000 rows, and they were spread over 20 pages that were 100 percent full. If those same pages became fragmented, and now the 1,000 rows were spread over 40 pages that were only 50 percent full, you'd have to do twice the amount of work to read the same amount of data. So, the goal is to keep the pages as full as possible to reduce the reads—yet to leave enough room for inserts and updates to minimize page splits. (This is where a proper Fill Factor helps, but that's a topic for another day.)
Where’s the Catch?
Both types of fragmentation that I've discussed have a far greater impact on performance when the indexes are used as part of a scanning process than if they're used for individual seeks. If you look up any given single row via an index, it will take approximately the same amount of CPU and I/O regardless of whether the index is fragmented. You need to read only a small number of pages for a seek, and it will be the same for any given row. Most well tuned online transaction processing (OLTP) queries work this way. However, both types of fragmentation can significantly hinder a full or partial index scan. For clustered indexes, a partial or range scan is common for queries that search for a range of values in the WHERE clause. In this case, the extra reads due to the out-of-order and less full pages can add up quickly, as the following example shows.
Example
For my example, I've built a table with 10,000 rows with about 12 rows to a page. It has a clustered index on an integer column, with values ranging from 1 to 10,000, and a non-clustered primary key (PK) on the integer Identity column.
If you run the code in Listing 1 (below) down to step 1, you can see that it takes 21 reads to read the range of values between 499 and 700 and three reads for the singular value of 500, with the clustered index having 834 total pages. Now, run to step 2, and you can see that a single-row insert caused a page split, as witnessed by the total pages increasing by one, and another read for the range scan has been added. Next, 10 more rows are inserted, increasing the page count by 10. But the range scan reads have increased to 30, a difference of over 30 percent, yet the singular query stayed at three reads.
You can see how just a few changes can affect partial scans in a dramatic way while leaving the seek virtually unaffected. If you try this with a Fill Factor of 80 percent for the clustered index, you can virtually eliminate the page splits, but you get 1,000 total pages instead of the original 834. The result is increased sizes of indexes on disk and less fitting into cache.
It's Up to You
It's up to you to decide when the extra CPU and I/O caused by fragmentation become problematic in your application. Usually, increased CPU consumption and longer durations are the first signs of trouble. Larger indexes generally fragment faster, and indexes smaller than several hundred pages can often be ignored. Due in part to their increased size, clustered indexes might need more attention than non-clustered ones, too. But usage is also a key factor in deciding when to deal with fragmentation.
Listing 1: Fragmentation Example
USE tempdbSET NOCOUNT ONGO-- Create a table for testingIF OBJECT_ID('dbo.TestFrag') IS NOT NULLDROP TABLE dbo.TestFrag ;GOCREATE TABLE dbo.TestFrag(R_ID int NOT NULL IDENTITY (1, 1),CL_Col int NULL, Filler varchar(8000) NULL) ON \[PRIMARY\]GO-- Populate the tableINSERT INTO dbo.TestFrag (\[CL_Col\],\[Filler\]) VALUES (0,REPLICATE('x',625)) ;GO 10000-- Set the values in the column in which we will build the Clustered IndexUPDATE dbo.TestFrag SET CL_Col = R_ID-- Create the clustered indexCREATE CLUSTERED INDEX IX_TestFrag_CL_Col ON dbo.TestFrag(CL_Col) WITH (FILLFACTOR = 80) ON \[PRIMARY\]GO-- Create the PK as a non-clustered indexALTER TABLE dbo.TestFrag ADD CONSTRAINTPK_TestFrag PRIMARY KEY NONCLUSTERED(R_ID) ON \[PRIMARY\]GO-- Look at the fragmentation level before any changesSELECT index_type_desc, record_count, page_count, avg_fragmentation_in_percent, avg_page_space_used_in_percent, fragment_count, avg_fragment_size_in_pagesFROM sys.dm_db_index_physical_stats(DB_ID(),OBJECT_ID('dbo.TestFrag'),NULL,NULL,'DETAILED')WHERE index_level = 0SET STATISTICS IO ONSELECT COUNT(*) FROM dbo.TestFrag WHERE CL_Col BETWEEN 499 AND 700 ;SELECT COUNT(*) FROM dbo.TestFrag WHERE CL_Col = 500 ;SET STATISTICS IO OFF-- Step 1-- Add a single row with a value that is not at the beginning or the endINSERT INTO dbo.TestFrag (\[CL_Col\],\[Filler\]) VALUES (500,REPLICATE('x',625)) ;-- Look at the fragmentation level after the changesSELECT index_type_desc, record_count, page_count, avg_fragmentation_in_percent, avg_page_space_used_in_percent, fragment_count, avg_fragment_size_in_pagesFROM sys.dm_db_index_physical_stats(DB_ID(),OBJECT_ID('dbo.TestFrag'),NULL,NULL,'DETAILED')WHERE index_level = 0SET STATISTICS IO ONSELECT COUNT(*) FROM dbo.TestFrag WHERE CL_Col BETWEEN 499 AND 700 ;SELECT COUNT(*) FROM dbo.TestFrag WHERE CL_Col = 500 ;SET STATISTICS IO OFF-- Step 2-- Add 10 more rows. Each row added will cause a page splitINSERT INTO dbo.TestFrag (\[CL_Col\],\[Filler\]) VALUES (520,REPLICATE('x',625)) ;INSERT INTO dbo.TestFrag (\[CL_Col\],\[Filler\]) VALUES (540,REPLICATE('x',625)) ;INSERT INTO dbo.TestFrag (\[CL_Col\],\[Filler\]) VALUES (560,REPLICATE('x',625)) ;INSERT INTO dbo.TestFrag (\[CL_Col\],\[Filler\]) VALUES (580,REPLICATE('x',625)) ;INSERT INTO dbo.TestFrag (\[CL_Col\],\[Filler\]) VALUES (600,REPLICATE('x',625)) ;INSERT INTO dbo.TestFrag (\[CL_Col\],\[Filler\]) VALUES (620,REPLICATE('x',625)) ;INSERT INTO dbo.TestFrag (\[CL_Col\],\[Filler\]) VALUES (640,REPLICATE('x',625)) ;INSERT INTO dbo.TestFrag (\[CL_Col\],\[Filler\]) VALUES (660,REPLICATE('x',625)) ;INSERT INTO dbo.TestFrag (\[CL_Col\],\[Filler\]) VALUES (680,REPLICATE('x',625)) ;INSERT INTO dbo.TestFrag (\[CL_Col\],\[Filler\]) VALUES (700,REPLICATE('x',625)) ;-- Look at the fragmentation level after the changesSELECT index_type_desc, record_count, page_count, avg_fragmentation_in_percent, avg_page_space_used_in_percent, fragment_count, avg_fragment_size_in_pagesFROM sys.dm_db_index_physical_stats(DB_ID(),OBJECT_ID('dbo.TestFrag'),NULL,NULL,'DETAILED')WHERE index_level = 0SET STATISTICS IO ONSELECT COUNT(*) FROM dbo.TestFrag WHERE CL_Col BETWEEN 500 AND 700 ;SELECT COUNT(*) FROM dbo.TestFrag WHERE CL_Col = 500 ;SET STATISTICS IO OFF
About the Author
You May Also Like