10 More Performance-Enhancing Ideas for SQL Server
Make your powerful, reliable database applications fast, too.
March 31, 1997
COAX EVEN MORE SPEED OUT OF YOUR SQL SERVER DATABASES
Microsoft's SQL Server lets you build powerful and reliable database applications quickly, but making those applications perform their best can be tricky. Luckily, database administrators and developers can use several simple techniques to coax more speed out of a SQL Server database. In October 1996 ("10 Easy Tips for Better SQL Server Performance"), I discussed some performance-boosting tactics. Here are 10 more ideas that you can easily apply to your SQL Server database applications.
Tip 1: Use Stored Procedures Where Possible
You can use procedural language (Transact-SQL) and SQL to create functions that are stored in the database engine, instead of in application code or libraries. Such stored procedures have several advantages. These procedures eliminate runtime parsing, because SQL Server parses them when you create them. You can designate certain stored procedures to run with database administrator (DBA) privileges, even if the user running them does not have this level of security. This feature lets you combine a high level of data access with tight security. With stored procedures, you can easily create libraries of functions, thereby reducing the amount of original code that your programmers must write. Stored procedures greatly reduce the amount of work needed to perform an upgrade because you can move application logic onto the server instead of distributing new versions of software to each client in an organization. When you store procedures, the SQL Server engine can read the procedures from the memory buffer instead of from the disk, thus reducing the overall amount of costly disk I/O operations. Finally, in a distributed environment, stored procedures let you cut the amount of information traveling between the front end (client) and the back end (server). This reduction can save time, especially if the client and server are far apart. Another way to reduce traffic between the client and server during stored procedures is to set the NOCOUNT option. NOCOUNT disables SQL Server's DONE_IN_PROC messages, which show the number of rows a particular operation affects.
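For example, here's a minimal sketch of a stored procedure that sets the NOCOUNT option, using the Customer_master table that appears later in Table 1 (the procedure name is hypothetical):

create procedure usp_get_balance
    @account_number int
as
    /* suppress DONE_IN_PROC row-count messages to cut client/server traffic */
    set nocount on
    select account_balance
    from customer_master
    where account_number = @account_number
go

After you create the procedure, a client needs only one call (for example, exec usp_get_balance 344484454), and SQL Server can reuse the parsed plan from the procedure cache on subsequent executions.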
Tip 2: Select the Best Read-Ahead Configuration Values
One SQL Server feature that can dramatically improve query performance is read-ahead (RA, or parallel data scan). When SQL Server detects certain queries, such as table scans and other searches that return large quantities of data sequentially, it allocates a background thread to read ahead in the table. The result is that by the time your program requests this information, SQL Server might have already fetched the data into the buffer pool.
For example, suppose you're running a long report that pulls information from a large customer table. If you're reading large blocks of data sequentially, SQL Server can anticipate the next series of information that you want and read these rows into memory while you're still processing the first batch of data. This action can lead to substantial performance gains, because your program may now be able to find what it needs in memory, rather than on disk.
Let's look at how you can set the parameters listed in the SQL Server Configuration/Options dialog box to best take advantage of RA. Don't forget that changes to any RA parameters will affect database access for all SQL Server-based applications running on your system. Therefore, make changes to these parameters carefully, because an alteration can produce unintended results.
RA cache hit limit. Sometimes the RA manager attempts to locate data from the disk but finds the data still in the buffer pool. In this case, RA is not very effective. You can set the hit-limit parameter to restrict the number of buffer pool hits that an RA query encounters before the manager abandons the RA strategy. The valid range of values is between 1 and 255 hits; the default value is 4. Do not set the value excessively high; if the RA manager finds pages already in the buffer pool, the manager has no reason to continue reading ahead.
RA cache miss limit. SQL Server uses the RA cache miss limit to determine when to start reading ahead. For example, if you've set the RA cache miss limit to 5, SQL Server starts reading ahead after not finding five pages in the buffer pool. The valid range of values is between 1 and 255; the default is 3.
Setting this value too low means that SQL Server tries to read ahead on most queries; setting the value too high causes SQL Server to avoid a potentially beneficial strategy. So, if you use your system primarily for reporting and other operations that usually fetch large batches of information, set the value on the low side; a low setting tells SQL Server that you usually want to start RA operations as quickly as possible. Don't set it to 1, however: that setting means SQL Server will always issue an RA request, even when retrieving only one data page from disk, which will negatively affect performance in most cases. Conversely, if your system works as an OnLine Transaction Processing (OLTP) environment, with very few batch, sequential operations, raise this value because you want SQL Server to avoid RA in all but the most obvious situations.
RA delay. SQL Server uses the RA delay parameter to determine how long to wait before starting to read ahead. This value is necessary because some time always elapses between when the RA manager starts and when it can service requests. The valid range of values is between 0 and 500 milliseconds; the default is 15. The default suffices for most environments, but if you're running SQL Server on a multiprocessor machine, set it lower. Setting this parameter too high means that SQL Server can delay too long before embarking on an RA.
RA pre-fetches. You can use the RA pre-fetches measure to tell SQL Server how many extents you want it to pre-fetch during RA operations. The valid range of values is between 1 and 1000, with a default value of 3. If your applications perform primarily large sequential operations, set this value to a higher number to tell SQL Server to bring larger amounts of data into the buffer pool during each RA operation. If you set this number too high, however, you can displace other users' buffer pool pages with your data. Consequently, be careful when you experiment with this number; increase the value gradually. Try raising the value by 5 percent each time, and keep a record of overall system response between changes. Find out whether performance gains for one application degrade the performance of other applications.
RA worker threads. Threads process RA operations. The RA worker threads parameter controls the number of threads that SQL Server allocates to service RA requests. Each configured thread then supports a number of individual RA requests. The RA worker threads setting can range between 0 and 255; the default is 3. Set this option to the maximum number of concurrent users that you expect to access SQL Server. If you set this parameter too low, you might not have enough threads to service the volume of RA requests. If you set it too high, you'll start too many RA threads. SQL Server logs an error if the number of RA worker threads exceeds the number of RA slots.
RA slots per thread. The RA slots per thread parameter configures the number of RA requests that each thread manages. The valid range is between 1 and 255 operations; the default is 5. If you set this value too high, SQL Server can overload your RA threads; each thread can spend more time switching between different RA requests than servicing the requests. A low value can lead to idle threads. Usually, the default is fine.
One final note regarding tuning RA parameters: do not experiment with these numbers until you are comfortable with both the architecture of SQL Server and the specific performance characteristics of your system. Even when you decide to experiment, remember to change only one parameter at a time. Changing multiple parameters at once degrades performance without giving you much information about why response has become worse.
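When you're ready to experiment, here's a sketch of how you might adjust a single RA parameter with sp_configure. I'm assuming the option names match those shown in the Configuration/Options dialog box, and the show advanced options step may not be necessary on your system:

exec sp_configure 'show advanced options', 1
reconfigure with override
go
/* raise RA worker threads toward the expected number of concurrent users */
exec sp_configure 'RA worker threads', 20
reconfigure with override
go

Record a baseline of system response before the change, and compare it with response afterward before you touch the next parameter.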
Tip 3: Use Wide Indexes Correctly
A wide index is one that contains a large amount of data--either many small columns or several big columns. For example, an index of one char(255) column is wider than an index of five char(4) columns.
Narrow indexes are generally better. When you create an index, SQL Server stores the index's key values and record locations on index pages. With narrow indexes, SQL Server can fit more key values and pointers on each index page. This structure lets the optimizer find your data more quickly, because it has to read fewer index pages before encountering your data. In addition, when more index keys and pointers are on a page, the optimizer can churn through this information more effectively. Conversely, when your index keys are wide, the engine can fit only a few key values and data pointers on each page. The index structure also tends to be deeper when keys are wide, so the optimizer has to carry out even more computations.
In the Customer_master table in Table 1, suppose you place a composite index on the last_name, first_name, street, and city columns. This index is wide, because it contains a relatively high number of columns and a great deal of data.
TABLE 1: Customer_master
account_number | last_name  | first_name | street      | city | ... | account_balance
---------------|------------|------------|-------------|------|-----|----------------
344484454      | Bockwinkle | Terry      | Jeeves Way  |      |     | 24998
344484455      | Okerlund   | Nick       | Jacques St. |      |     | 105660
344484456      | Blassie    | George     | Mariner Rd. |      |     | 3004
...            | ...        | ...        | ...         | ...  | ... | ...
You need to analyze why you create wide indexes. Do the users want to search on all these fields together; do they want to sort on all these fields? More likely, this index represents a kitchen sink approach, whereby you put as many columns in the index as possible.
This approach isn't as effective as you might think. The performance obstacle becomes painfully obvious when you ask the optimizer to search on only one or two of the index's columns. In Table 1, if you want to find all rows that have a last_name value between Hank and Hendrix, the optimizer has to use the composite index because no other indexes exist. Unfortunately, the optimizer now needs to read potentially hundreds of index pages to locate the appropriate information because the index's width means that the index is also deep. However, if you have one index on last_name, you can fit more keys on each index page, and the optimizer can find the correct information more quickly. A wide index also can cause your sort order to force the engine into a sequential scan, in spite of the index's existence, because of the way SQL Server uses a composite index to satisfy searches and sorts.
Despite these warnings, wide indexes are effective if you request only columns from a particular nonclustered index. (In a nonclustered index, the engine does not reorder the table data to match the index.) SQL Server can bypass reading the data and get its information from the index keys. For example, the query
select last_name, first_name
from customer_master
where last_name between 'Zlotnick' and 'Zotnick'

asks for only the last_name and first_name columns from the customer_master table. SQL Server can satisfy this request without reading the table's data pages, because both columns are in the index and the index is nonclustered. If you retrieve large blocks of data, this shortcut can improve performance, especially because disk I/O can add a great deal of overhead to a query. So, wide indexes are not always performance-wreckers.
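For instance, here's a sketch of a nonclustered index that covers the query above (the index name is hypothetical):

create nonclustered index ndx_cust_name
on customer_master (last_name, first_name)

Because both requested columns are index keys, SQL Server can answer the query from the index pages alone.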
Tip 4: Determine the Right Size for the Transaction Log
To correctly size your system's transaction log, start by allocating about 15 percent to 25 percent of your total database disk storage to the transaction log. Then consider some factors that affect usage of the transaction log.
An application that performs primarily read-only data access is not likely to require a large transaction log, because the application uses the log only when it modifies information. Conversely, if your application performs millions of modifications each day, you can count on needing a larger transaction log. However, if your application performs frequent but small data modifications, you might not need an enormous transaction log.
The recovery interval parameter helps control how often checkpoints occur. During a checkpoint, SQL Server synchronizes the contents of the transaction log and the disk. In most cases, the longer the interval between checkpoints, the larger the transaction log you need.
If you infrequently save the transaction log to media, instead of automatically truncating the log, you need to create a larger transaction log. Saving the log frequently is a tactic that lets you avoid creating a large transaction log. Also, remember that because you can't restore transactions after you truncate the log, automatically truncating the transaction log means that you're willing to lose transactions between data backups if your system fails.
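For example, here's a sketch of both approaches, assuming a database named acquisition and a previously defined dump device named logdump1:

/* save the log to a dump device, then truncate it */
dump transaction acquisition to logdump1

/* or discard the log records without saving them; you can't restore them later */
dump transaction acquisition with truncate_only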
If you use the CREATE DATABASE statement, specify log size with the following syntax:

[LOG ON database_device [= size]
    [, database_device [= size]]...]
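For example, this sketch creates a database with a 200MB data segment and a separate 50MB log, assuming that devices named datadevice1 and logdevice5 already exist:

create database acquisition
on datadevice1 = 200
log on logdevice5 = 50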
If you use the SQL Enterprise Manager, specify log size with the dialog box shown in Screen 1.
If the data and the transaction log are on the same database device, you can leave the transaction log size value blank, because the transaction log will consume only as much space as necessary. However, for most production systems, place the transaction log on a separate device. For this example, assume that you use different database devices to hold your data and transaction log.
Start with a small transaction log. You can easily add space if you find that you've made the transaction log too small, but you can't make the log smaller without a great deal of work.
You can increase the size in two ways. First, you can use the SQL Enterprise Manager's Edit Database screen. This screen lets you pick a preallocated device and then assign extra storage to the transaction log. The other way to increase the size of the transaction log is to use the ALTER DATABASE command and the sp_logdevice stored procedure. For example, if you want to add 50MB to a database called acquisition, the syntax is
alter database acquisition on logdevice5 = 50
sp_logdevice acquisition, logdevice5
If you increase the transaction log's size, back up the master database before and after the alteration.
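A sketch of that backup, assuming a dump device named masterdump1:

dump database master to masterdump1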
Tip 5: Put TEMPDB in RAM
In certain circumstances, you can improve system throughput by storing the TEMP database (TEMPDB) in RAM. This database is where SQL Server builds temporary tables and performs much of its internal sorting.
First, this technique is appropriate only when your system has enough memory to meet SQL Server's basic caching needs. If your system doesn't have enough memory to begin with, taking more for TEMPDB can decrease overall performance. Second, having TEMPDB in RAM is beneficial only when your TEMPDB operations fit into the space you've allocated. For example, if you've allocated 2MB for TEMPDB RAM and each instance of your application routinely creates 10MB work tables, having TEMPDB in RAM won't make much difference, because you won't have enough space to satisfy all user demands.
Placing TEMPDB in RAM improves performance when your users and applications use TEMPDB heavily. If you don't access TEMPDB often, placing it into RAM may worsen performance, because TEMPDB now occupies valuable RAM storage.
You can tell how heavily you're using TEMPDB by running the SHOWPLAN command against your queries. If you see frequent implicit work tables, chances are you're hitting TEMPDB quite often. But if most of your queries don't require the engine to create work tables, storing TEMPDB in RAM probably isn't necessary.
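For example, you can check a query from a query window; the columns here come from Table 1:

set showplan on
go
select city, sum(account_balance)
from customer_master
group by city
go
set showplan off
go

If the plan output mentions a worktable (GROUP BY and ORDER BY operations often create one), the query is hitting TEMPDB.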
Another circumstance in which placing TEMPDB in RAM can help performance is when your applications don't access cached data frequently; in these cases, your programs are continually accessing the disk drive to locate data, rather than finding it in memory. For example, in applications where individual users are looking at vastly different data records, one user has little chance of finding data that another user has already cached.
All in all, using RAM for caching data and index pages is probably better than placing TEMPDB in RAM. However, if you decide to place TEMPDB in RAM, record your performance before and after you put TEMPDB in RAM; if you don't experience significantly better throughput, keep your TEMPDB on disk.
To get the best performance from storing TEMPDB in RAM, restart the engine after changing the storage. If you change it while the engine is running, SQL Server may use noncontiguous memory to satisfy your request. When you restart the engine, TEMPDB's memory will be contiguous. Via Perfmon, SQL Server 6.5 now lets you track the maximum amount of used storage in your TEMPDB.
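Here's a sketch of the change, assuming the SQL Server 6.5 option name tempdb in ram (MB):

exec sp_configure 'tempdb in ram (MB)', 20
reconfigure with override
go
/* restart SQL Server so the allocation comes from contiguous memory */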
Tip 6: Avoid Transactions for Certain Operations
Use transactions only if your program modifies the database. Don't usetransactions for queries, reports, work tables, or bulk operations.
If you're running a query and it doesn't have any INSERT/UPDATE/DELETE statements, you can omit transactions, either by selecting this option from your application development tool or by not issuing a BEGIN TRANSACTION statement. Many programmers open a transaction at the start of a report, but running a report inside a transaction doesn't contribute anything to the report (unless you are also updating tables). An open transaction during a report can even slow your system's performance.
Sometimes you need to create temporary work tables, but the traditional concepts of database integrity and transactional control might not apply when you're using work tables. Thus, you can often avoid using transactions when populating your work tables or modifying information.
Bulk operations are procedures that make substantial changes to database tables. Often, you don't need to record these events in the transaction log. You can disable transactions for bulk operations by setting a database configuration flag that bypasses transactions during bulk inserts and SELECT INTO statements. Or you can modify your application code to omit transactions when you perform a bulk operation.
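For example, here's a sketch of setting that flag for the hypothetical acquisition database:

exec sp_dboption acquisition, 'select into/bulkcopy', true
go

With the flag on, SELECT INTO and bulk inserts bypass row-by-row logging; dump the database after the bulk operation, because the transaction log alone can no longer recover it.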
You sometimes have to use transactions, however, to avoid unwanted data modifications during an operation. For example, during a long query or report, you often want to freeze the underlying data until the report has finished. To do so, you may need to request that the optimizer lock all rows until the operation has finished; locking rows requires that you begin a transaction.
Tip 7: Allocate the Correct Amount of Memory for Stored Procedure Caching
Memory caching is a key component of SQL Server's architecture. The memory cache is divided between storage for data (the data cache) and storage for stored procedures (the stored procedure buffer). Just as SQL Server uses the data cache to reduce the amount of disk I/O necessary to retrieve data, SQL Server uses the stored procedure buffer to locate procedures in memory, rather than reading them from disk.
When you try to execute a previously built stored procedure, SQL Server first looks in the procedure cache to see whether the procedure is already in memory. If it is, the engine uses the memory-based version of the stored procedure. If not, the engine reads the procedure from disk and places it in the stored procedure buffer, consuming as many 2KB memory pages as necessary. When you create or compile a stored procedure, SQL Server also uses the stored procedure buffer to cache this information for later users. However, the engine doesn't support multiple users working with the same query plan at the same time; therefore, stored procedures are reusable, not re-entrant.
You allocate the total amount of memory for the engine by setting the Memory parameter. After SQL Server has started and defined all necessary internal memory structures, it assigns leftover memory to the stored procedure cache and the data cache.
The Procedure Cache parameter tells the engine what percentage of this leftover memory to assign to the procedure cache; the remainder goes to the data cache. The default value for the Procedure Cache parameter is 30 percent. You can raise or lower this value according to how much you think your application will use stored procedures.
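For example, here's a sketch that raises the percentage to 40 for an application that leans heavily on stored procedures:

exec sp_configure 'procedure cache', 40
reconfigure with override
go

The new value takes effect the next time you start SQL Server.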
You can use the statistics from the SQL Server Procedure Cache object to monitor stored procedure cache activity. Pay particular attention to the maximum value parameters, such as
Maximum procedure buffers active percentage
Maximum procedure buffers used percentage
Maximum procedure cache active percentage
Maximum procedure cache used percentage
These values represent the high-water marks since you last started the engine.
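You can also sample similar information from a query window with DBCC PROCCACHE, which I believe reports the counters behind the Procedure Cache object:

dbcc proccache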
Tip 8: Use Joins Instead of Sequential SELECT Statements
Many programmers, especially those with backgrounds in mainframe COBOL, dBASE, or indexed sequential access method (ISAM) files, avoid joining between tables and instead process database records sequentially. This approach doesn't use the full power of the database and the SQL programming language.
For example, suppose you're developing a system to track sales calls by your company's representatives, and you want to produce a report that shows all sales calls an employee has made between January 1 and December 31, 1997. To write the report, you work with two tables, Personnel (Table 2) and Sales_calls (Table 3). The Personnel table contains information about each representative, and the Sales_calls table tracks individual sales call statistics. Your output from the report must look like this:
Employee number: nnnnn
Employee name : last, first
Sales call date Company name...
--------------------------------
mm/dd/yyyy xxxxxxxxxxxxxx
mm/dd/yyyy xxxxxxxxxxxxxx
If you process the report without joins, your pseudocode looks like this:

declare cursor to get records from personnel table
open cursor
foreach cursor into host variable structure (personnel)
    print employee name and number and header info
    declare cursor to get records from sales_calls table
    open cursor using the contents of personnel host variable as key into sales_calls table
    foreach cursor into host variable structure (sales_calls)
        print sales call information
    end foreach
    close cursor
end foreach
close cursor
This process contains extra engine work. For every employee found in the Personnel table, you declare, open, and close a cursor to get data from the Sales_calls table. If you have thousands of employees, a better approach is to let the database engine join records between the Personnel and Sales_calls tables. You can do this by using the personnel_id column as a foreign key. Your new pseudocode looks like this:
declare a cursor to get records from personnel and sales_calls tables
open cursor
foreach cursor into host variable structure (personnel + sales_calls)
    print employee name, number and header info (only once)
    print sales call information
end foreach
close cursor
Alternatively, you can use a server-based stored procedure and return only appropriate rows to the client.
If you want the report to include employees who made no sales calls during this period, you can still use a SQL join-based method. Instead of performing an ordinary join between the tables, use an OUTER join, with the Personnel table as primary and the Sales_calls table as outer; in Transact-SQL, the *= operator marks the primary side:
select *
from personnel, sales_calls
where personnel.personnel_id *= sales_calls.personnel_id
By joining between tables, you take better advantage of database engine performance features.
Tip 9: Don't Create Too Many Indexes
Some database administrators try to anticipate every possible sort and search combination by creating indexes on practically every column in every table. Having too many indexes impedes your system in many ways. Every time you perform an insert or delete, you have to modify the indexes and the data. When you update an indexed column, the SQL Server engine updates all affected indexes, an action that can have the added undesirable effect of causing the engine to restructure the index trees. This update operation can impede performance for all applications accessing the table and can even briefly degrade response across the entire system. You have no way of knowing whether the engine has restructured the index trees. Extra indexes also consume more disk space. Finally, when confronted with too many indexes, the optimizer may not choose the best-qualified index, and your database operation may run more slowly than if you had fewer indexes.
The best way to know whether you have too many indexes is to test your database operations with SHOWPLAN. Enable SHOWPLAN, simulate a typical work day, and then review the output. (Remember to remove the SHOWPLAN command from any procedures or code that you modified.) You can quickly determine which indexes SQL Server is using and then remove any indexes that the engine doesn't reference often.
Sometimes you need additional indexes to handle specific, easily identifiable tasks, such as an end-of-month processing suite. In these cases, create the indexes immediately before you need them and drop them as soon as you finish. At other times, you need to run large batch update operations, which can be time-consuming if you have too many indexes to update. You can benefit from creating a stored procedure to drop some indexes, perform the operation, and then re-create the indexes. The overall time to do this can be less than if you let the batch update operation alter the extra indexes.
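Whether you wrap the steps in a stored procedure or run them as a batch, the sequence looks like this sketch; the index name and the company_name column are hypothetical:

/* drop the extra index so the batch update doesn't have to maintain it */
drop index sales_calls.ndx_company_name
go
update sales_calls
set company_name = upper(company_name)
go
/* re-create the index for normal query traffic */
create index ndx_company_name on sales_calls (company_name)
go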
Tip 10: Use the Multiple Table DELETE Option
Traditional SQL limits your delete operations to one table at a time. Transact-SQL has a multi-table delete capability that reduces the number of individual engine calls. For example, to delete rows in two tables, resources and parts, you can issue two SQL statements:
delete from resources where resource_cost > 5000
delete from parts where part_cost > 5000
Or you can use Transact-SQL's multiple table DELETE extension:
delete from resources
from parts
where resources.resource_cost = parts.part_cost
and resources.resource_cost > 5000
This approach is not portable, so you can't run your application against different databases. But if you work only with SQL Server, the multi-table DELETE is a convenient shortcut. You can use the UPDATE statement with a similar FROM extension to modify a table based on values in other tables, as the sketch below shows.
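For example, here's a sketch of the UPDATE form, reusing the resources and parts tables from above:

update resources
set resource_cost = resource_cost * 1.1
from resources, parts
where resources.resource_cost = parts.part_cost

Like the multiple table DELETE, UPDATE ... FROM is a Transact-SQL extension and isn't portable.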
A Last Word
As you experiment with these tips, change only one parameter at a time so you know which change produces which effect. And be sure you back up your database before and after each change.
For more information about these tips and other suggestions for enhancing SQL Server's performance, read my book, Microsoft SQL Server: Planning and Building a High-Performance Database.
Microsoft SQL Server: Planning and Building a High-Performance Database
Author: Robert D. Schneider
Publisher: Prentice Hall PTR
Upper Saddle River, N.J., 1997