Update the Statistics on Recently Modified Tables Only
Stored procedure saves time and resources
March 25, 2010
I work with multiple customer databases in several SQL Server 2008 and SQL Server 2005 production database servers. The data in our production environment constantly changes. Updating the statistics on the tables where a data modification took place is important because it helps the queries that rely on those tables run faster.
Our databases range in size from 50GB to more than 1TB, so running the UPDATE STATISTICS statement on an entire database is time-consuming. Although some data changes involve many tables (and hence millions or even billions of rows) in a given database, other data changes involve only a handful of tables. Why run UPDATE STATISTICS on all the tables when only a handful of tables had data modifications?
To save time and resources, I wrote a stored procedure, sp_DBARunUpdateStats, that updates statistics only on those tables where the data has been recently changed by INSERT, DELETE, or UPDATE statements. (I also wrote a related stored procedure, sp_DBAGetUpdateStats, that reports on the last time the UPDATE STATISTICS statement was run against all the table indexes in a database. You can read about this stored procedure in the sidebar "Stored Procedure Provides UPDATE STATISTICS Report.") After I show you how to run sp_DBARunUpdateStats, I'll explain how it works.
How to Run sp_DBARunUpdateStats
The sp_DBARunUpdateStats stored procedure is designed to reside in the master database. That way, you can invoke this stored procedure from the Query Analyzer regardless of the database the current session is using. You can also invoke sp_DBARunUpdateStats from a SQL Server Agent job. (If desired, you can store sp_DBARunUpdateStats in another database as long as the call to the stored procedure is fully qualified.)
The stored procedure takes four parameters:
@DBName. You use this mandatory first parameter to specify the name of the database that contains the tables you want to update.
@ModifiedTableOnly. You use this optional second parameter to indicate the scope of the update. You specify 'Y' when you want to update the statistics on only those user tables where the data was recently modified (i.e., modified since the last time UPDATE STATISTICS was run).You specify 'N' when you want to update the statistics on all the user tables. The default value is 'N'.
@RunSPRecompile. You use this optional third parameter to indicate whether you want to run the sp_recompile system stored procedure on the updated tables ('Y') or not ('N'). The default value is 'Y'.
@DisplayOutput. You use this optional fourth parameter to indicate whether you want the output displayed ('Y') or not ('N'). The default value is 'Y'.
If you don't specify an optional parameter, its default value is used. So, for example, if you want to update the statistics on all the tables in the AdventureWorks database, run sp_recompile on those updated tables, and display the output, you'd use the command
EXEC sp_DBARunUpdateStats @DBName = 'AdventureWorks'
If you want to update the statistics on only those AdventureWorks tables that had their data recently modified, run sp_recompile on those updated tables, and display the output, you'd use the command
EXEC sp_DBARunUpdateStats @DBName = 'Adventureworks', @ModifiedTableOnly = 'Y',
If you want to update the statistics on only those AdventureWorks tables that had their data recently modified, but not run sp_recompile or display the output, you'd use the command
EXEC sp_DBARunUpdateStats @DBName = 'Adventureworks', @ModifiedTableOnly = 'Y', @RunSPRecompile = 'N', @DisplayOutput = 'N'
How sp_DBARunUpdateStats Works
Here's the overall process that sp_DBARunUpdateStats uses to update statistics:
The stored procedure creates two temporary tables.
The stored procedure populates the first temporary table with the names of all the tables and their indexes in the specified database.
The stored procedure checks to see whether you specified that you wanted to update the statistics on only those tables where the data was recently modified. If you did, it populates the second temporary table with the names of the tables where the data has changed recently, then deletes any entries from the first temporary table that don't have a match in the second temporary table.
The stored procedure runs the UPDATE STATISTICS statement against each table listed in the first temporary table.
To achieve step 3, sp_DBARunUpdateStats takes advantage of two features that have been available since SQL Server 2005: the STATS_DATE function and the sys.dm_db_index_usage_stats dynamic management view (DMV).
The STATS_DATE Function
To determine the last time UPDATE STATISTICS was run on a table, the stored procedure uses the STATS_DATE function, which returns the date and time of the most recent statistics update for a table. This function's syntax is
STATS_DATE (object_id, stats_id)
where object_id is the ID of the target table and stats_id is the ID of the statistics object. You can obtain the statistics object's ID from the index_id column in SQL Server's sys.indexes catalog view for the specified table.
In sys.indexes, a table will have a row for each of its indexes and a row representing the table itself (assuming it's stored in a heap). The index_id column will contain a value of 0 (index_id = 0) when it represents the table. The index_id column will contain a value greater than 0 (index_id > 0) when it represents an index. So, for example, a table with three indexes will have four rows: one row that reads index_id = 0 and three rows that read index_id > 0.
For each index_id > 0, you can have the STATS_DATE function get the date and time when the last UPDATE STATISTICS was run for that particular index. Therefore, if you run a statement such as
'SELECT STATS_DATE (object_id('dbo.MyTable'), 1)'
the returned datetime value (e.g., 2009-08-16 16:46:29.197) represents the last time UPDATE STATISTICS was run on that table.
The sys.dm_db_index_usage_stats DMV
The sys.dm_db_index_usage_stats DMV contains index usage information for user tables. To access this information for all tables and their indexes in a database, you'd use a query such as
SELECT * FROM sys.dm_db_index_usage_stats WHERE database_id = db_id('MyDatabaseName')
The result set would include counts of different types of index operations and the time each type of operation was last performed. In other words, the result set would indicate how many times a particular index was used by queries. It would also indicate the last time an index was modified by an INSERT, DELETE, or UPDATE statement. (Note that the DMV statistics might not be complete because they can be lost when SQL Server stops or when other events occur, such as the dropping and re-creation of an object.)
The DMV's last_user_update column specifies when (date and time) the last data modification took place on an index. So, you can narrow the query by specifying that column with code such as
SELECT last_user_update FROM sys.dm_db_index_usage_stats WHERE database_id = db_id('MyDatabaseName')
If you want to check this information for a specific table, you can further refine the query by adding the table's ID, as in
SELECT last_user_update FROM sys.dm_db_index_usage_stats WHERE database_id = db_id('MyDatabaseName') AND object_id = object_id('dbo.MyTable')
The returned datetime value represents the last time that table's index was modified.
How sp_DBARunUpdateStats Uses the Features
Here's how sp_DBARunUpdateStats uses the STATS_DATE function and the sys.dm_db_index_usage_stats DMV. For every table index, sp_DBARunUpdateStats first uses the STATS_DATE function to find out the last time the UPDATE STATISTICS statement was run. Next, the stored procedure accesses the DMV's last_user_update column to find out the last time that index was modified.
The stored procedure then compares the two retrieved datetime values. If the datetime value returned by the STATS_DATE function is less than the datetime value found in the last_user_update column, the index needs to have its statistics updated. If that isn't the case, the statistics don't need to be updated for that index because no data has changed since the last time UPDATE STATISTICS was run.
This process not only determines whether UPDATE STATISTICS needs to be run but also has an added bonus: The stored procedure qualifies those tables and their indexes to which the data is loaded for the very first time.
Give It a Try
If you'd like to try sp_DBARunUpdateStats, you can download it by clicking the Download the Code Here button near the top of the page. This stored procedure works on SQL Server 2008 and SQL Server 2005. On tables that have 100 million or more rows, the stored procedure will sample only 10 million rows to see whether UPDATE STATISTICS has been run recently. Tables that have less than 100 million rows, the stored procedure runs update statistics without using any sample option.
About the Author
You May Also Like