Why You Should Be Using DBCC CLONEDATABASE for Performance Tuning Queries
Performance tuning queries on production databases happens, but it absolutely, positively shouldn't. Here's a safer, more efficient path to database optimization.
Performance tuning queries on a large database comes with considerable overhead. First of all, while you can tune against a production database, you shouldn't. Let me reiterate that in case I was a bit vague: You should never--never ever--performance tune against a production database.
Why? Well, the reasons are many, but I’ll focus on the only ones that truly matter. (Honestly, only the first one really matters.)
First of all, this is production: It's the livelihood of your company, or of your client's company if you're an independent contractor. Even if you're simply tuning SELECT statements--those that perform reads but don't manipulate data--you're still placing overhead on production systems. Second, this is production: You're possibly taking locks on rows, pages, or tables and blocking production traffic. Third, if you're performance tuning queries, you're likely doing one of two things: altering query code or stored procedures to see if your changes perform better than the existing, poorly performing code, or creating copies of stored procedures and thereby polluting the structure of the database.
With all that said, I understand as well as any data professional that the apparent alternative is not much better: creating a copy of the database. Creating a copy of a small database to run query tuning against can be simple enough: a basic backup/restore to a development server is entry-level DBA territory. However, now more than ever there are concerns about data privacy. A copy of a production database likely includes personal, financial, or health-related information subject to countless regulations, with strong penalties should you let that data fall into hands it's not intended for. Large and very large databases carry the same data privacy concerns but add the overhead of allocating significant storage to environments that may not really need it. That's not just a capacity issue but a financial one--particularly in the age of cloud computing and paying for consumption.
What is a modern DBA to do?
What if you could create a SQL Server database that behaved like a large database? What if you could build a belly full of records and the associated statistics that at a metadata level explain the distribution of the data values in each table--which is vital for the query optimizer's ability to create execution plans for analysis key to performance tuning--but without taking up space?
Black magic, you say?
DBCC CLONEDATABASE, I say.
I remember many years ago sitting in a classroom at a local community college in Portland, Ore., for a SQLSaturday event where a friend and fellow Microsoft Data Platform MVP was presenting on how to script out the objects of a database and then, after using that script to create a “shell copy” of the database, import the statistics into the shell database from the source database. It was a genius play, allowing administrators to performance-tune queries without consuming space in the process or placing sensitive data into the hands of developers or other staff members who should not hold custody of sensitive data. The DBCC CLONEDATABASE command performs that same task but with a single command.
Using DBCC CLONEDATABASE is simple and straightforward. Here is the basic syntax:
DBCC CLONEDATABASE (SourceDatabaseName, TargetDatabaseName);
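On SQL Server 2016 SP2 and later, the command also accepts a handful of WITH options; the exact set available depends on your version and build, so treat the following as a sketch rather than a complete reference:

```sql
-- Clone the schema only, without statistics or Query Store data.
-- (NO_STATISTICS defeats the purpose for query tuning, but is handy
-- when all you want is a schema-only copy of the database.)
DBCC CLONEDATABASE (SourceDB, TargetDB_SchemaOnly)
    WITH NO_STATISTICS, NO_QUERYSTORE;

-- VERIFY_CLONEDB runs consistency checks against the clone as part
-- of the cloning operation.
DBCC CLONEDATABASE (SourceDB, TargetDB_Verified)
    WITH VERIFY_CLONEDB;
```

For the query tuning scenario in this article, the default behavior--statistics included--is exactly what you want.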
It’s really as simple as that. Let’s look at it in action with a database from my testing environment. First I’m going to create an empty database with a single table. I’ll create a clustered index on the empty table’s identity column, as well as a non-clustered index on a date column to be populated eventually. Finally, so I have a quick second table to join to in this example, I’ll execute a SELECT … INTO query.
CREATE DATABASE [SourceDB];
GO
ALTER DATABASE [SourceDB] SET RECOVERY SIMPLE
GO
USE [SourceDB]
GO
CREATE TABLE [dbo].[JunkDrawer](
[ID] [int] IDENTITY(1,1) NOT NULL,
[theDate] [datetime] NOT NULL,
[theValue] [int] NOT NULL,
[someGUID] [uniqueidentifier] NOT NULL
) ON [PRIMARY]
GO
CREATE UNIQUE CLUSTERED INDEX [MyAmazingClusteredIndex] ON [dbo].[JunkDrawer]
(
[ID] ASC
)
WITH
(
PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [NotSoAmazingNonClusteredIndex] ON [dbo].[JunkDrawer]
(
[theDate] ASC
)
WITH
(
PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON,
FILLFACTOR = 90
) ON [PRIMARY]
GO
I’ll then add a modest number of records--1,200, to be exact--and run a SELECT … INTO statement to create a new table, tblFOO, with the same records but none of the indexes:
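The original population script isn't shown here, so this is a minimal sketch of one way to load 1,200 rows; the specific values are my own assumptions--all that matters for the demo is the row count and a reasonable spread of dates for the statistics histogram:

```sql
-- Hypothetical population script: loads 1,200 sample rows into
-- dbo.JunkDrawer. The identity column fills itself; we supply a
-- date, an integer, and a GUID for the remaining columns.
SET NOCOUNT ON;

DECLARE @i int = 1;
WHILE @i <= 1200
BEGIN
    INSERT INTO dbo.JunkDrawer (theDate, theValue, someGUID)
    VALUES (DATEADD(DAY, -@i, GETDATE()), @i * 10, NEWID());
    SET @i += 1;
END
GO
```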
SELECT * INTO tblFOO FROM dbo.JunkDrawer;
Running the following query in SourceDB produces the execution plan presented after it. I’m pulling the graphical execution plan into SentryOne Plan Explorer so it displays the record counts that become pertinent when we look at the cloned database later on.
SELECT S.ID, S.theDate, F.someGUID
FROM SourceDB.dbo.JunkDrawer AS S
INNER JOIN SourceDB.dbo.tblFOO AS F
ON S.ID = F.ID
ORDER BY S.theDate
We get an Index Scan operation on the non-clustered index and a Table Scan against the heap in tblFOO, since we created that table through a SELECT … INTO statement. Note the record counts listed next to the operations.
Now it’s time to run DBCC CLONEDATABASE.
Executing the following command will give me a new database called TargetDB:
DBCC CLONEDATABASE(SourceDB, TargetDB);
The command returns progress messages as well as a disclaimer:
Database cloning for 'SourceDB' has started with target as 'TargetDB'.
Database cloning for 'SourceDB' has finished. Cloned database is 'TargetDB'.
Database 'TargetDB' is a cloned database. A cloned database should be used for diagnostic purposes only and is not supported for use in a production environment.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
At this point we can query TargetDB.dbo.JunkDrawer and we should find it empty:
SELECT COUNT(ID)
FROM TargetDB.dbo.JunkDrawer;
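To see where the upcoming plans come from, you can confirm that the statistics objects were carried over into the clone. A quick way to check, assuming your version exposes sys.dm_db_stats_properties (the index name below matches the one created earlier):

```sql
USE TargetDB;
GO
-- List the statistics objects the clone inherited from SourceDB
SELECT s.name, sp.last_updated, sp.rows
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.JunkDrawer');

-- Inspect the histogram behind one of the indexes directly
DBCC SHOW_STATISTICS ('dbo.JunkDrawer', 'NotSoAmazingNonClusteredIndex');
```

Even though the table itself is empty, the rows column should reflect the source's row counts--which is exactly what drives the estimated plans below.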
However, if we run the same query as above against the clone, it should now return a graphical execution plan as though the tables were fully populated, just as in SourceDB:
SELECT S.ID, S.theDate, F.someGUID
FROM TargetDB.dbo.JunkDrawer AS S
INNER JOIN TargetDB.dbo.tblFOO AS F
ON S.ID = F.ID
ORDER BY S.theDate
The execution plan is identical for TargetDB, even though it contains no records and--in real-world use--would be much smaller than the source database. You’ll note the record counts in the TargetDB version. While the database knows there are no records, the execution plans are built from the cloned distribution statistics.
By using DBCC CLONEDATABASE, you’re able to troubleshoot various query tuning concerns without taking up space on your server or placing possibly sensitive data in areas that may not be as tightly controlled as your production databases. The cloned database is sized to the same specification as the model database, not the source database.
It's also important to note that if you’ve added any user objects to the model database, DBCC CLONEDATABASE will fail. This is because the object_id values of user objects created in model could collide with object_id values in your source user database. The process also creates a hidden database snapshot to build the clone; if, for any reason, that snapshot can’t be created, the clone process will fail. Finally, you can’t create a clone if the target database name you specify already exists.
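One practical note: in recent SQL Server builds the clone is created read-only, so if your tuning involves altering procedures or creating test objects inside the clone, you may first need to make it writable--and you'll want to drop it when you're done. A sketch:

```sql
-- DATABASEPROPERTYEX exposes an IsClone property in newer versions;
-- it should return 1 for a database created by DBCC CLONEDATABASE
SELECT DATABASEPROPERTYEX('TargetDB', 'IsClone') AS IsClone;

-- The clone comes up read-only in recent builds; make it writable
-- before altering procedures or creating copies of objects in it
ALTER DATABASE [TargetDB] SET READ_WRITE WITH ROLLBACK IMMEDIATE;

-- When tuning is complete, remove the clone
DROP DATABASE [TargetDB];
```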
This is a much quicker route to run through the process I observed years ago. It takes seconds and doesn’t require understanding of how to migrate statistics objects between databases, which is an advanced topic for many DBAs.