Performance Effects of Using GUIDs as Primary Keys
Are globally unique identifiers (GUIDs) good candidates for non-clustered primary keys? Find out!
June 23, 2005
I get asked time and again whether globally unique identifiers (GUIDs) are good candidates for non-clustered primary keys. Is there a performance difference between using GUIDs and integer-identity columns as non-clustered primary keys? I have readers who've had no problems using GUIDs as primary keys in tables that have millions of rows. I know that I ask for trouble by attempting a short answer to a question that has many facets, but I'll jump in with both feet and say that I avoid using GUIDs as primary keys and tell my customers to avoid it as well.
The most compelling reason to use a GUID as a primary key is that it's extremely simple from a programming and design perspective. Simple is often the best answer, but I don't think so in this case. We'll ignore how frustrating it can be to work with a GUID when you're debugging and writing code. And I won't discuss the relative value of a surrogate key that has some meaning and can be exposed to a user, because I don't want to broach two heavily controversial topics in a single week, so let's pretend I didn't even bring it up. I'm a performance-tuning geek at heart, so I'll focus on that aspect.
Compared to integers, GUIDs are big, fat data types that have several performance characteristics that make them sub-optimal for indexes. To avoid delving too deep here, I'd like to point you to Jimmy Nillson's article "The Cost of GUIDs as Primary Keys." This article points out some significant performance penalties that you might experience when creating indexes on a GUID. The article also discusses the pros and cons and offers helpful suggestions for creating an alternative surrogate-key strategy, based in part on GUIDs, that doesn't suffer the same performance penalties you get when you use a native GUID.
I'll also point you to Gert Drapers's published resources about this topic. Drapers is a long-time member of Microsoft's SQL Server team and is one of the most knowledgeable SQL Server professionals in the world. Drapers publishes some material on his own Web site rather than as an official Microsoft resource. I usually avoid recommending code like this, but I make an exception for Drapers because his material is always top notch. Drapers's set of procedures and functions, XPGUID, provides GUID-related procedures to "improve the usage of GUIDs inside SQL Server 2000." In addition, the code on the site provides a procedure for generating sequential GUIDs and a procedure for testing whether a string represents a valid GUID without generating an error that will abort your T-SQL batch. I hope you find the material useful. Of course, as with any untried resource, you should never use third-party resources on your production systems.
SQL Server 2005 will make this whole question easier to answer. I've never been a big fan of the tools that Microsoft provides for automatically creating surrogate keys for the following reason: I want something with decent performance--the IDENTITY() function for example--but with the ease of use that a GUID provides. SQL Server 2005 addresses many of the current performance problems associated with using a GUID-based index by adding the newsequentialid() function. Newsequentialid() addresses some of the performance problems inherent in the current implementation of newid() by generating a GUID that will sort in an increasing order, making it more suitable for indexing. I haven't had a chance to do performance testing with this function, but I hope that it will help us resolve the question about whether to use GUIDs as primary keys.
About the Author
You May Also Like