10 Easy Tips for Better SQL Server Performance
Accelerate your SQL Server database applications.
September 30, 1996
MICROSOFT SQL SERVER lets you quickly build powerful and reliable database applications, but making those apps perform their best is tricky. Luckily, database administrators and developers can use several simple techniques to coax more speed out of a SQL Server database. Here are 10 such performance-boosting tactics you can quickly and easily apply to your SQL Server database applications.
Tip 1: Assign Numeric Data Types to Numeric Columns
Database administrators and developers often assign the CHAR (character) data type to columns containing only numeric information. Character data columns can hold anything and are a good catch-all. However, specifying CHAR for numeric columns can decrease performance with filters and join conditions and can increase storage requirements. For example, suppose you're developing an application that tracks equipment stored at several sites. Tables 1 and 2 show the application's inventory_header and inventory_detail tables. As you design your application's tables, you notice that the part_number column always contains a number between 1 and 12,000. However, when you create the tables, you define all these columns with a char(20) type and place an index on these columns in each table.
A commonly used report in your application prints a listing of the inventory-item counts at all locations. The following statement makes this report printout possible:
select b.location, a.part_number, a.part_desc, b.part_count
from inventory_header a, inventory_detail b
where a.part_number = b.part_number
order by b.location, a.part_number
If you define the part_number columns in Tables 1 and 2 as char(20), the join operation requires the engine to compare the values in each table byte by byte, up to 20 times per row. Because you know the part_number column is always numeric and between 1 and 12,000, you can redefine the part_number column's data type in both tables as smallint. This data type can hold values between -32,768 and +32,767.
This data type helps your application perform better in several ways. First, because SQL Server stores smallint values in only 2 bytes of data, you save a significant amount of CPU work during join operations or filter conditions, especially if you are processing many rows. Instead of comparing up to 20 bytes to see whether two rows are logical join candidates, the engine can finish its work after comparing only 2 bytes. Thus, using smallint instead of char(20) saves you 18 bytes per row per table for all indexes that use this column.
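A minimal sketch of the revised table definitions follows. The part_desc length, location length, and index names are assumptions for illustration; only the smallint part_number columns come from the discussion above.

```sql
-- part_number redefined as smallint (2 bytes) instead of char(20)
create table inventory_header (
    part_number smallint    not null,
    part_desc   varchar(40) null
)

create table inventory_detail (
    location    varchar(20) not null,
    part_number smallint    not null,
    part_count  int         not null
)

-- The indexes on the join column now compare 2 bytes instead of up to 20
create index hdr_part_ix on inventory_header (part_number)
create index dtl_part_ix on inventory_detail (part_number)
```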
Tip 2: Prevent Substring Searches
When search criteria in a SQL statement start at the leftmost position of a column and move right, searching through the indexed character data is inherently easier than when your search criteria begin in the middle of the data--that is, in a substring. For example, suppose you have to locate two types of information in a phone book: all last names starting with a capital L and all entries with a lowercase l as the third letter in the last name.
The first request is easy because the phone book is already indexed (last name, first name). You simply turn to the L section of the book and read through the names until you find the first entry with a last name starting with M.
This type of processing is analogous to running the SQL statement
select * from names where last_name >= 'L' and last_name < 'M'
If an index is on the last_name column in the names table, the optimizer uses the index to help process this query.
For both machines and humans, the second request is harder to process than the first. With the phone book, you have to read the first three letters of every entry from cover to cover and mark all entries that have l as the third letter.
This type of processing is analogous to running a SQL statement that searches for a substring within the last_name column:
select * from names
where substring(last_name, 3, 1) = 'l'
Although an index is on last_name, the optimizer can't use it because you're not beginning your search from the leftmost byte of the last_name column. To satisfy your request, the optimizer has to operate sequentially on what could be an enormous table, which definitely hurts performance.
TABLE 1: Inventory_header Table
| part_number | part_desc |
|---|---|
| 78 | Monochrome ASCII terminal |
TABLE 2: Inventory_detail Table
| location | part_number | part_count |
|---|---|---|
| Phoenix | 78 | 0 |
| Detroit | 78 | 25 |
| New York | 78 | 148 |
TABLE 3: Bonus Program Participation
| account_number | participated |
|---|---|
| 55095543 | Y |
| 55095544 | Y |
| 55095545 | N |
| 55095546 | Y |
| 55095547 | N |
| 55095548 | Y |
If you must sometimes search or sort on a noninitial substring, consider revising your database design. Try to divide the column in question into several smaller columns that you can index. This revision will help accelerate searching, joining, and sorting.
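One way to apply this advice, sketched with a hypothetical part_code column whose middle segment is searched often (the table, column names, and segment widths are all illustrative):

```sql
-- Instead of one column searched with a substring, as in
--   where substring(part_code, 4, 3) = '123'
-- split the code into its segments, each independently indexable.
create table parts (
    code_prefix char(2)     not null,  -- was bytes 1-2 of part_code
    code_middle char(3)     not null,  -- was bytes 4-6 of part_code
    code_suffix char(3)     not null,  -- was bytes 8-10 of part_code
    part_desc   varchar(40) null
)

create index parts_middle_ix on parts (code_middle)

-- The search now starts at the leftmost byte of an indexed column,
-- so the optimizer can use the index
select * from parts where code_middle = '123'
```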
Tip 3: Don't Assign Too Much Memory
As part of installation and administration, the SQL Enterprise Manager lets you set the amount of memory dedicated to SQL Server. If you allocate too much memory to SQL Server, you increase the burden on NT by denying it valuable memory resources necessary to service other tasks, such as other applications and the operating system. This burden can result in swapping, which occurs when the OS runs out of memory and starts resorting to the disk to make up the shortfall. In fact, too high a memory value can make booting SQL Server difficult, if not impossible. Swapping can quickly negate any performance gains you obtain by increasing the cache hit ratio. (The cache hit ratio is a measure of the percentage of time that SQL Server finds information in memory, rather than on disk. Disk I/O is expensive, so reducing it is a good idea whenever possible. A high cache hit ratio means that the engine can avoid this costly effort.)
The most logical approach to allotting memory to SQL Server is to raise the value gradually and monitor performance carefully between changes. If caching rates don't improve after an increase, you've reached the optimal memory value. Also keep track of the swap rate, because raising the memory value can inadvertently increase swapping.
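You can also adjust this setting from isql with sp_configure. A sketch follows; the 8,192-page figure is purely illustrative, and it assumes the SQL Server 6.x convention of expressing the memory option in 2KB pages (so 8,192 pages is 16MB). Verify the unit on your release before relying on it.

```sql
-- Raise SQL Server's dedicated memory one step (value in 2KB pages),
-- then monitor the cache hit ratio and the NT swap rate before the
-- next increase
exec sp_configure 'memory', 8192
reconfigure
```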
Tip 4: Avoid Highly Duplicate Indexes
A highly duplicate index contains a limited number of unique values. For example, if you place an index on a column that contains only two or three values, that index is highly duplicate.
Having a highly duplicate index on a column is worse than having no index at all. The database engine must work hard to process a highly duplicate index during important operations such as inserting, updating, or deleting data. For example, suppose you have a table that tracks customers' participation in a special bonus program, as you see in Table 3. If you place an index on the participated column, you have only two possible values for that index: Y or N. When you add a new row to this table, the engine must add the new value to the existing index.
This addition can cause the undesired side effect of internal index restructuring to handle the new value. Deleting, querying, or sorting based on the index also makes the engine work harder, thus adding many unnecessary I/O operations to your system and likely degrading performance. Instead of using the highly duplicate index, having the engine go directly to the data and read it sequentially is probably faster and simpler.
You can determine whether an index is highly duplicate by running the Database Consistency Checker (DBCC) show_statistics command or selecting the Index Distribution Statistics option on the Index Information screen in SQL Enterprise Manager. Along with other information, the statistics utility reports on the index's selectivity (as you see in Screen 1)--the relative number of distinct values in the index. In this case, only two values are possible, which yields a poor degree of selectivity.
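The command form is a one-liner; the table and index names here are illustrative:

```sql
-- Report key distribution and selectivity for one index on the
-- bonus table (participated_ix is a hypothetical index name)
dbcc show_statistics (bonus, participated_ix)
```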
When confronted with such a poor degree of selectivity, SQL Server may not even use the highly duplicate index to retrieve information. However, the highly duplicate index still causes extra work for SQL Server during inserts, updates, and deletes.
To avoid performance problems when you must sort or search on a highly duplicate value, combine the column in question with a column containing more selective values. This step creates a composite index. In the previous example, the correct composite index is
create index bonus_acct_participated_ix on bonus (account_number, participated)
Because the account_number column is more selective than the participated column, combining these two columns yields a more selective and efficient index. You can now use this index to search for and sort by account_number for all customers who have participated in the bonus plan. (Don't forget that to use a composite index in filtering or sorting data, your SQL statement must reference the index's columns from left to right, beginning with the leading column.)
Tip 5: Use the update statistics Command
To process indexed database operations as efficiently as possible, SQL Server keeps track of index key values, including the location of index key values and the values themselves. To do so, SQL Server runs the update statistics command when you create or re-create an index.
If you expect your key values to change over time, periodically rerun this important command. If your key values change and you don't refresh the key distribution statistics by running update statistics, the optimizer can make a wrong query-processing decision based on this old information. Although old key distribution statistics won't affect the data the engine returns, they can substantially slow system performance.
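Rerunning the command is straightforward; the index name below is an assumption carried over from the Tip 1 example:

```sql
-- Refresh key distribution statistics for every index on a table
update statistics inventory_detail

-- Or refresh a single index by naming it
update statistics inventory_detail dtl_part_ix
```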
Tip 6: Prevent Long Transactions
SQL Server uses the transaction log to track alterations to data. In addition to protecting your data, the transaction log has an important effect on database speed. One crucial transaction log performance issue is the length of your transactions. Keep your transactions as brief as possible because long, open transactions can wreak havoc on database application performance.
Suppose you've developed a client-support application for a large insurance company. This application lets customer service reps give customers information about their policies over the phone. One day, a rep is working with a customer who has reported a billing error. The rep pulls up the application's payment record screen to modify the customer's record. After the phone call, the rep goes to lunch, leaving the customer's record on screen to complete it later. If the client-support application resembles the following pseudocode example, you have a potential performance problem.
get the customer's name and account number
begin transaction
build up a query
retrieve the record
display the record
prompt the user for changes
update the record
check for errors
if no errors
    commit transaction
else
    rollback transaction
If the rep takes a break after pulling up the record but before changing it, a long transaction can result. Because SQL Server can reclaim transaction log space only for completed transactions, the space this transaction holds can't be freed until the transaction finishes. If the log fills, all data modification operations will probably come to a grinding halt until free log space is restored.
Modifying the above example to avoid the long-transaction problem produces the following pseudocode:
get the customer's name and account number
build up a query
retrieve the record
display the record
prompt the user for changes
begin transaction
update the record
check for errors
if no errors
    commit transaction
else
    rollback transaction
By beginning the transaction only after a user enters changes to the record, you eliminate the potential for a long transaction. You can add some logic to the code to ensure that another user doesn't change the record's contents between the time you select the record and when you update it.
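One way to add that check is an optimistic test built on SQL Server's timestamp data type. This sketch assumes a hypothetical payments table with a timestamp column named ts; the account number and amount are illustrative.

```sql
declare @saved_ts timestamp

-- Read the row (and its timestamp) outside any transaction
select @saved_ts = ts
from payments
where account_number = 55095543

/* ... the rep edits the record on screen, for however long ... */

-- Update inside a short transaction, but only if the row is unchanged
begin transaction
update payments
set payment_amount = 100.00
where account_number = 55095543
  and ts = @saved_ts
if @@rowcount = 0
    rollback transaction   -- another user changed the row; re-read it
else
    commit transaction
```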
Tip 7: Place the Transaction Log on a Separate Device
SQL Server lets you choose where to store the transaction log. For most read-and-write applications, the transaction log is a heavily accessed disk structure because it is at the core of SQL Server's data integrity architecture. Because I/O is often the bottleneck on today's fast systems, spreading the I/O load among as many devices as possible is a good idea. To accelerate I/O performance, consider placing the transaction log on a different physical device from the database. Another reason to put the transaction log on a different device is so that you can periodically perform transaction log dumps without having to back up the data.
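A sketch of the separation, using SQL Server 6.x device syntax. The device names, paths, virtual device numbers, and sizes are all illustrative assumptions; the point is only that the two physname paths sit on different physical disks.

```sql
-- Create one device on each physical disk
-- (size is in 2KB pages: 51,200 pages = 100MB)
disk init name = 'salesdata',
    physname = 'c:\mssql\data\salesdata.dat', vdevno = 10, size = 51200
disk init name = 'saleslog',
    physname = 'd:\mssql\data\saleslog.dat',  vdevno = 11, size = 12800

-- Place the data and the log on separate devices
-- (sizes here are in MB)
create database sales
    on salesdata = 100
    log on saleslog = 25
```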
Tip 8: Use One Large Database
SQL Server developers and database administrators frequently make the mistake of dividing information into several small databases, rather than storing the data in one large database. Why is this a problem? One way SQL Server speeds processing is to use its memory buffers (i.e., cache) to reduce disk I/O. These memory buffers hold data, indexes, and stored procedures. When a SQL Server application must access one of these resources, the engine first looks in the appropriate memory cache area. On a busy system, many such requests are satisfied from memory rather than from disk. Reducing the number of times the engine must make a costly disk inquiry accelerates access.
If you divide your data into many small databases, chances are users can't share much of the information stored in memory, which leads to degraded performance. For example, suppose you maintain a manufacturing system that supports your firm's worldwide operations and resides on one large server. You divide your data into separate databases, according to geography--for example, WestEurope, EastEurope, WestNorthAmerica--to let users access data for only their region. As part of this strategy, you duplicate many lookup tables in the small databases.
The combination of separate, small databases and duplicated lookup tables greatly reduces the potential for memory caching. Although users can be examining identical copies of lookup tables in each database, the database engine can't take advantage of memory caching across databases.
For example, what happens when Ted looks for part number 20993 in the WestNorthAmerica parts table and Alex must find the same part in the EastEurope database? Ted pulls the appropriate page into memory, yet Alex won't see that page: Although Alex is reading an identical copy of the row, he retrieves it from a different database. If one large database held this data, Alex's process wouldn't have to issue a disk read request because the page would already be in memory. If you multiply the number of unnecessary read requests by the number of users, you can see how performance can suffer.
Lost caching opportunities also affect stored procedures. Besides data, users can share stored procedures (although the engine must generate additional query plans if a procedure is already in use).
Although you sometimes need multiple small databases, try to use one large database. To limit data access to appropriate users, you can choose among several methods, such as key design and views.
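The view approach can be sketched as follows, reusing the regional example above. The view name, column list, region column, and group name are assumptions for illustration:

```sql
-- One shared parts table for all regions; a view restricts each
-- group of users to its own region's rows
create view west_na_parts
as
select part_number, part_desc, part_count
from parts
where region = 'WestNorthAmerica'

-- Users see only the view, not the base table
grant select on west_na_parts to west_na_users
revoke select on parts from west_na_users
```

Because every region reads the same underlying table, all regions share the same cached pages.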
Tip 9: Index Temporary Tables
Temporary tables (work tables) are transitory storage locations that SQL Server uses to satisfy processing demands. Both SQL Server and you can create these tables.
Often the engine builds an implicit temporary table. For example, SQL Server can create temporary tables when it can't use available indexes to process information. When SQL Server automatically creates work tables, you don't have much control over them aside from reworking your queries or altering the database index structure. For example, you can't create indexes on them, although SQL Server often builds indexes on the key columns of implicitly created work tables itself.
You can use the create table command to build explicit temporary tables to hold data until the application finishes. When you explicitly create a temporary table, you largely control how that table is indexed, if at all. If you expect the work table to contain large amounts of data, you can create indexes where appropriate. Doing so helps reduce the time needed to locate information in your temporary table, which should lead to better application response.
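A sketch of an explicit, indexed temporary table, reusing the inventory tables from Tip 1 (the temporary table's name, columns, and index name are illustrative):

```sql
-- Build an explicit work table of per-part totals
create table #part_totals (
    part_number smallint not null,
    total_count int      not null
)

insert #part_totals
select part_number, sum(part_count)
from inventory_detail
group by part_number

-- Index the work table before joining against it
create index part_totals_ix on #part_totals (part_number)

select h.part_desc, t.total_count
from inventory_header h, #part_totals t
where h.part_number = t.part_number
```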
Tip 10: Control the Amount of Published Data
SQL Server provides powerful replication features as part of its core capabilities. When you use these features, following good replication techniques is important. For example, just because you can replicate a specific table doesn't mean you should. Sometimes, administrators replicate unnecessary tables. This replication increases the work for all machines in your environment, adds to your network's burdens, and enlarges the amount of disk space necessary on subscription machines.
Before you add a table to a publication, decide whether to replicate the table at all. If you do, you then have to decide whether to replicate it horizontally or vertically. Either operation will reduce the amount of information SQL Server passes among systems.
For example, to replicate an account table that has several million rows, you must first decide whether this table is a good candidate for replication: Do multiple systems need to see the table at the same time? If the answer is yes, you have to decide whether all subscription servers will want all the information in the table. Chances are you can subdivide this data horizontally (across rows) or vertically (across columns). Taking these extra analysis steps can significantly improve performance, because you can sharply reduce the amount of information flowing among your systems.
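However you define the article, the two partitions amount to a column list (vertical) and a restriction clause (horizontal). Expressed as a plain query for illustration only, with hypothetical table and column names (in practice you define these restrictions on the article itself, for example through SQL Enterprise Manager):

```sql
-- Vertical partition: publish only the columns subscribers need
-- Horizontal partition: publish only one region's rows
select account_number, balance, region   -- three of many columns
from accounts
where region = 'EastEurope'              -- one region's rows
```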
Further Reading
You can immediately start improving SQL Server performance by applying any of these techniques. For in-depth information about the tips in this article and other performance-enhancing ideas, see the forthcoming Microsoft SQL Server: Designing and Building a High Performance Database (Prentice Hall, Upper Saddle River, NJ, November 1996, ISBN 0-13-266222-1).