With SQL server 2014 SP 2 Microsoft added a new feature that sounds pretty sweet for troubleshooting some performance problems. It is called Clone Database. In a Nutshell Clone database gives you a copy of the specified databases without any data in it, but with the statistics and heuristics of the database as if it did have data in it.

A script containing the code referenced in this post can be found here (dbcc-clone-db).

 

Some Quick Research


To start we will use AdventureWorks2014 database and find the largest table.

sqr_1_1 sqr_1_2

SalesOrderDetail has the most rows. It has enough rows to illustrate what I want to show in this article and we do not need to load more rows into it.


Let’s do a quick review the of indexes on the table to see what we have to work with

sqr_2_1

sqr_2_2

We can see we have indexes on rowguid, another on ProductId, and one using SalesOrderId and SalesOrderDetailId.


What statistics do we have on this table right now? There are a few ways to review this information.

sqr_3_1

sqr_3_2

We can see there are statistics for each of our indexes and there are also some autocreated statistics on SpecialOfferId.


For our tests we will use ProductId. Let’s review its distribution.

sqr_4_1

sqr_4_2

As Erin Stellato stated in her blog on statistics (https://www.simple-talk.com/sql/performance/managing-sql-server-statistics/)

“The Updated column in the statistics header displays the last time statistic was updated, whether it was via an automatic or manual update. We can use the Rows and Rows_Sampled columns to determine whether the sample rate was 100% (FULLSCAN) or something less.

The density vector, which is the second set of information, provides detail regarding the uniqueness of the columns. The optimizer can use the density value for the left-based column in the key to estimate how many rows would return for any given value in the table. Density is calculated as “1/unique number of values in the column”.

The final section of the output, the histogram, provides detailed information, also used by the optimizer. A histogram can have a maximum of 200 steps, and the histogram displays the number of rows equal to each step, as well as number of rows between s

The teps, number of distinct values in the step, and the average number of rows per value in the step.”

 Looking at our density vector we have 0.003759399 or 265.999964 distinct values which we can check by running

SELECT DISTINCT ProductId FROM Sales.SalesOrderDetail –266 distinct ProductIds

Looking at our histogram we can see the RANGE_HI_KEY matches our ProductIds and EQ_ROWS matches the number of records corresponding to that ID

SELECT ProductId, COUNT(*) as RecordCount FROM Sales.SalesOrderDetail GROUP BY ProductId ORDER BY 1 ASC


 

The Setup


Now that we have researched some basic information. Let’s run some queries.

First we will run a query using ProductId in the where clause. Since ProductId is also the only column in our select list or joins from the SalesOrderDetail table we expect this will give us an execution plan using a seek operation.

ts_1_1

ts_1_2

We can see the optimizer did indeed perform a seek using the IX_SalesOrderDetail_ProductId index.


We will now add the OrderQty field to our requested output. Since this field is not in any indexes the optimizer will have to get it by either scanning an index or by doing a bookmark lookup against our clustered index. Since there are only 121317 records in the SalesOrderDetail table. We expect it will most likely choose to perform a clustered index scan. Let’s find out.

ts_2_1

ts_2_2

We did indeed get a clustered index scan.


From our research above when we ran SELECT ProductId, COUNT(*) as RecordCount FROM Sales.SalesOrderDetail GROUP BY ProductId ORDER BY 1 ASC we found that ProductId 744 only had 13 records associated to it. With this small amount of records we should be able to see if we can get a seek and a lookup that is less costly than an index scan.

ts_3_1

ts_3_2

We did. Excellent.


 

In Comes Clone Database


That was a lot of setup. Now let’s bring in DBCC CloneDatabase and see what we can do what we can do with a database without any data in it.

Clone our AdventureWorks2014 database using the following command.

ccd_1_1


We can see it created our clone in readonly mode.

ccd_2_1


We can also see that it is significantly smaller.

ccd_3_1 ccd_3_2


Rerunning our query Some Quick Research # 1 over the clone copy we see we still get the same row counts for the tables since the query is based off of metadata.

ccd_4_1

Running a direct query from the table shows we do not actually have data however.

SELECT ProductId, COUNT(*) as RecordCount FROM Sales.SalesOrderDetail GROUP BY ProductId ORDER BY 1 ASC

ccd_4_2


Now let’s check our execution plans over the clone database. We expect the same plans.

ccd_5_1

ccd_5_2

The same Index Seek…

ccd_5_3

ccd_5_4

The same Clustered Index Scan…

ccd_5_5

The same index seek with bookmark lookup…

We can see that we got the same execution plans. Let’s try to create a covering index for query 2 to see if we can change it from a clustered index scan to an index seek. Remember there is no data in this database.


I want to create a covering index to test the results. Since we cannot create or alter an index on a database in read_only mode. Let’s see if it allows us to change that.

ccd_6_1

ccd_6_2


Now create a new covering index that includes both ProductId and OrderQty. Note: normally we would probably just add the OrderQty to the already existing ProductId index.

ccd_7_1


Now that the index is created lets run our query one last time to see if we get a different execution plan.

ccd_8_1

ccd_8_2

We can see the optimizer did use our new covering index to perform an index seek instead of just determining the clustered index scan was the most efficient way to go.


I can see clone database being a great tool in the toolbox.

Can you see fitting this into your troubleshooting routine?

Let us know in the comments below.