Optimize the Size of Your Indexed Integer Columns
Early assumptions can prove wrong
April 9, 2006
When you set up databases, you often have to choose the columns to be indexed, the order of the columns inside the index, and the data type of each indexed column. I've found that columns containing integers most often participate in indexes. With integers, you need to specify the amount of space reserved for those values by specifying the data type of bigint (8 bytes), int (4 bytes), smallint (2 bytes), or tinyint (1 byte). Optimally, you should select a data type that takes up the least amount of space yet is suitable to hold all possible values. That way, the index is smaller and more efficient. However, as time goes by, the early assumptions about the data types of the indexed integer columns might prove wrong. For that reason, I designed a process that searches for indexed integer columns in a database and determines whether the optimal data type is used for those columns. The process I designed uses two procedures: suggestIntType and scanIntIndex.
The suggestIntType Procedure
As Listing 1 shows, suggestIntType is a simple T-SQL procedure. It needs only two input parameters: the table name (sysname) and the column name (sysname).
The suggestIntType procedure generates three parameters as output: isIndexed, isOptimized, and optType. The isIndexed parameter is a bit that's set to 1 when the column is indexed and 0 when the column isn't indexed. The isOptimized parameter is a bit that's set to 1 when the procedure finds a better data type for the column and 0 when the procedure doesn't find a better data type. When the isOptimized parameter is 1, the optType parameter is set to the suggested data type. When the isOptimized parameter is 0, the optType value is set to the column's current data type.
The suggestIntType procedure determines whether there's a better data type by checking the indexed integer column's maximum value (i.e., the largest integer in the column) against the following criteria:
If the maximum value is greater than 2^32, the bigint type is suitable.
If the maximum value is greater than 2^16 but less that 2^32, the int type is suitable.
If the maximum value is greater than 2^8 but less than 2^16, the smallint type is suitable.
If the maximum value is less than 2^8, tinyint type is suitable.
Let's look at how you might use the suggestIntType procedure by itself. Suppose you want to check whether the data type for the ProductId column in the Products table in the Northwind database can be optimized. If you run the code
DECLARE @isIndexed bit DECLARE @isOptimized bit DECLARE @type sysname EXEC dbo. suggestIntType 'products', 'productId', @isIndexed output, @isOptimized output, @type output SELECT @isIndexed, @isOptimized, @type
the suggestIntType procedure returns
1 1 tinyint
which means that ProductId is an indexed integer column and there's a more suitable data type than the data type currently being used. The more suitable integer type is tinyint.
Although this information is useful, it would be more helpful to have all the columns scanned and have the information recorded in a user-friendly format for the administrator. That's where the scanIntIndex procedure comes into play.
The scanIntIndex Procedure
Listing 2 shows the scanIntIndex procedure. As you can see, it doesn't have any parameters.
The scanIntIndex procedure scans the INFORMATION_SCHEMA.columns view for all indexed integer columns. It uses the suggestIntType procedure to determine the data type information for those columns, as callout A in Listing 2 shows. The scanIntIndex procedure then creates a temporary table that contains those integer columns that use an index and are candidates for optimization. The administrator can then decide whether those columns should be altered in order to optimize the index size.
To use the scanIntIndex procedure, you simply use code such as
USE northwind GO
followed by the scanIntIndex procedure's code.
Give Them a Try
Now that you know how the scanIntIndex and suggestIntType procedures work, give them a try. The procedures should be executed by the database owner (DBO) user or a user that has the db_owner database role or sysAdmin server role. The procedures should be created inside that user's database. The procedures assume that the tables being scanned are owned by the DBO schema. I wrote and tested the procedures on a Windows XP machine running SQL Server 2000 Personal Edition Service Pack 1 (SP1) and SQL Server 2005 Express Edition.
About the Author
You May Also Like