First Look at System-Versioned Temporal Tables-Part 2: Querying Data and Optimization Considerations
Here we extend the exploration of SQL Server 2016's new temporal tables feature, including code and sample data to get you started.
July 6, 2015
This article is the second part in a two-part series about system-versioned temporal tables—a new feature introduced in Microsoft SQL Server 2016. Part 1 covered what system-versioned temporal tables are, how to create them and how to modify data in them. This article focuses on querying data and optimization considerations.
I’d like to thank several people who participated in discussions on the topic for their insights. From Microsoft: Borko Novakovic (the feature’s PM), Carl Rabeler and Conor Cunningham. Fellow SQL Server MVPs: Hugo Kornelis, Simon Sabin, Matija Lah, Phil Brammer, Louis Davidson, Jeffrey Moden and Erland Sommarskog.
Note: At the date of this writing the latest available public build of SQL Server 2016 is CTP2. Make sure you check the official product documentation for information about any changes and additions in later builds.
Recreating sample data from Part 1
If you want to run the queries from the article and get the same results as in my examples, you need your tables to have the same sample data as in mine. Use the code in Listing 1 to create the current and history tables (Employees and EmployeesHistory), and populate them with the sample data.
Listing 1: Create sample data
-- Create TemporalDB database and drop tables if existSET NOCOUNT ON;IF DB_ID(N'TemporalDB') IS NULL CREATE DATABASE TemporalDB;GOUSE TemporalDB;GOIF OBJECT_ID(N'dbo.Employees', N'U') IS NOT NULLBEGIN IF OBJECTPROPERTY(OBJECT_ID(N'dbo.Employees', N'U'), N'TableTemporalType') = 2 ALTER TABLE dbo.Employees SET ( SYSTEM_VERSIONING = OFF ); IF OBJECT_ID(N'dbo.EmployeesHistory', N'U') IS NOT NULL DROP TABLE dbo.EmployeesHistory; DROP TABLE dbo.Employees;END;GO-- Create and populate Employees tableCREATE TABLE dbo.Employees( empid INT NOT NULL CONSTRAINT PK_Employees PRIMARY KEY NONCLUSTERED, mgrid INT NULL CONSTRAINT FK_Employees_mgr_emp REFERENCES dbo.Employees, empname VARCHAR(25) NOT NULL, sysstart DATETIME2(0) NOT NULL, sysend DATETIME2(0) NOT NULL);CREATE UNIQUE CLUSTERED INDEX ix_Employees ON dbo.Employees(empid, sysstart, sysend);INSERT INTO dbo.Employees(empid, mgrid, empname, sysstart, sysend) VALUES (1 , NULL, 'David' , '2015-06-01 19:54:04', '9999-12-31 23:59:59'), (2 , 1 , 'Eitan' , '2015-06-01 19:54:04', '9999-12-31 23:59:59'), (3 , 1 , 'Ina' , '2015-06-01 20:01:41', '9999-12-31 23:59:59'), (4 , 2 , 'Seraph' , '2015-06-01 19:54:20', '9999-12-31 23:59:59'), (5 , 2 , 'Jiru' , '2015-06-01 19:54:20', '9999-12-31 23:59:59'), (6 , 3 , 'Steve' , '2015-06-01 21:32:20', '9999-12-31 23:59:59'), (7 , 4 , 'Aaron' , '2015-06-01 21:32:20', '9999-12-31 23:59:59'), (8 , 5 , 'Lilach' , '2015-06-01 20:01:41', '9999-12-31 23:59:59'), (9 , 4 , 'Rita' , '2015-06-01 21:32:20', '9999-12-31 23:59:59'), (10, 5 , 'Sean' , '2015-06-01 20:01:41', '9999-12-31 23:59:59'), (11, 6 , 'Gabriel', '2015-06-01 21:32:20', '9999-12-31 23:59:59');-- Create and populate EmployeesHistory tableCREATE TABLE dbo.EmployeesHistory( empid INT NOT NULL, mgrid INT NULL, empname VARCHAR(25) NOT NULL, sysstart DATETIME2(0) NOT NULL, sysend DATETIME2(0) NOT NULL);CREATE CLUSTERED INDEX ix_EmployeesHistory ON dbo.EmployeesHistory(empid, sysstart, sysend) WITH (DATA_COMPRESSION = PAGE);INSERT INTO dbo.EmployeesHistory(empid, mgrid, empname, sysstart, sysend) VALUES(6 , 2, 'Steve' , '2015-06-01 19:54:20', '2015-06-01 21:32:20'),(7 , 3, 'Aaron' , '2015-06-01 20:01:41', '2015-06-01 21:32:20'),(9 , 7, 'Rita' , '2015-06-01 20:01:41', '2015-06-01 20:11:01'),(9 , 3, 'Rita' , '2015-06-01 20:11:01', '2015-06-01 21:32:20'),(11, 7, 'Gabriel', '2015-06-01 20:01:41', '2015-06-01 20:11:01'),(11, 3, 'Gabriel', '2015-06-01 20:11:01', '2015-06-01 21:32:20'),(12, 9, 'Emilia' , '2015-06-01 20:01:41', '2015-06-01 21:32:20'),(13, 9, 'Michael', '2015-06-01 20:01:41', '2015-06-01 20:11:01'),(14, 9, 'Didi' , '2015-06-01 20:01:41', '2015-06-01 20:11:01');-- Enable system versioningALTER TABLE dbo.Employees ADD PERIOD FOR SYSTEM_TIME (sysstart, sysend);ALTER TABLE dbo.Employees SET ( SYSTEM_VERSIONING = ON ( HISTORY_TABLE = dbo.EmployeesHistory ) );
Run the following code to show the contents of the current table:
SELECT *FROM dbo.Employees;
Table 1 shows the output of this query.
Table 1: Contents of Employees table
empid mgrid empname sysstart sysend---------- ---------- -------- ---------------------- ----------------------1 NULL David 2015-06-01 19:54:04 9999-12-31 23:59:592 1 Eitan 2015-06-01 19:54:04 9999-12-31 23:59:593 1 Ina 2015-06-01 20:01:41 9999-12-31 23:59:594 2 Seraph 2015-06-01 19:54:20 9999-12-31 23:59:595 2 Jiru 2015-06-01 19:54:20 9999-12-31 23:59:596 3 Steve 2015-06-01 21:32:20 9999-12-31 23:59:597 4 Aaron 2015-06-01 21:32:20 9999-12-31 23:59:598 5 Lilach 2015-06-01 20:01:41 9999-12-31 23:59:599 4 Rita 2015-06-01 21:32:20 9999-12-31 23:59:5910 5 Sean 2015-06-01 20:01:41 9999-12-31 23:59:5911 6 Gabriel 2015-06-01 21:32:20 9999-12-31 23:59:59
Run the following code to show the contents of the history table:
SELECT *FROM dbo.EmployeesHistory;
Table 2 shows the output of this query.
Table 2: Contents of EmployeesHistory table
empid mgrid empname sysstart sysend---------- ---------- -------- ---------------------- ----------------------6 2 Steve 2015-06-01 19:54:20 2015-06-01 21:32:207 3 Aaron 2015-06-01 20:01:41 2015-06-01 21:32:209 7 Rita 2015-06-01 20:01:41 2015-06-01 20:11:019 3 Rita 2015-06-01 20:11:01 2015-06-01 21:32:2011 7 Gabriel 2015-06-01 20:01:41 2015-06-01 20:11:0111 3 Gabriel 2015-06-01 20:11:01 2015-06-01 21:32:2012 9 Emilia 2015-06-01 20:01:41 2015-06-01 21:32:2013 9 Michael 2015-06-01 20:01:41 2015-06-01 20:11:0114 9 Didi 2015-06-01 20:01:41 2015-06-01 20:11:01
Now that your tables have the same sample data as mine, we can get started with querying them. Just keep in mind that since I executed the modification examples in Part 1 of the series over a short period of time (a couple of hours), the sample data accordingly spans such a short period. In reality, such changes would normally take place over multiple years.
Querying data and optimization considerations
The data of a temporal table is stored in two tables (current and history), but this fact can be transparent to the application. The application should interact with only one table. If you need to query only the current data, you just query the current table, without using any special clauses. For example, suppose you want to query the current state of all employees. You issue the following query:
SELECT *FROM dbo.Employees;
The plan for this query is shown in Figure 1.
Figure 1: Plan for query without the FOR SYSTEM_TIME clause
The plan scans the clustered index of the current table, and, of course, doesn’t need to involve the history table.
Suppose, though, that you need to query the state of the employees as it was at a past point in time provided in a parameter called @datetime, or even a period defined by the inputs @start and @end. You could technically issue two queries: one against the current table and another against the history table, with the appropriate filter predicates to capture the right versions, and unify the results with a UNION ALL operator. But, instead, you get an elegant clause called FOR SYSTEM_TIME that you specify in a query against a temporal table, or a view based on temporal tables, and in it indicate the validity time or period that you’re interested in. SQL Server then translates your short and elegant query to more complex queries with the right predicates against the current and history tables behind the scenes.
Before I demonstrate using the FOR SYSTEM_TIME clause, I want to make sure you have an accurate understanding of the interval that the system period start and end columns represent. The interval they represent is what’s called in mathematics a closed-open interval and is represented as [sysstart, sysend). A square bracket means a closed edge (inclusive), and an angle bracket represents an open edge (exclusive). So in our case, sysstart is included and sysend isn’t included.
You specify the FOR SYSTEM_TIME clause right after the table or view name, before any table alias, as in:
SELECT ... FROM FOR SYSTEM_TIME AS ;
The subclause you will use most often is the AS OF subclause. You pass a date and time literal, variable or parameter as input, such as FOR SYSTEM_TIME AS OF @datetime. You get back the rows that were considered valid at the input point in time, keeping in mind the fact that the period columns represent a closed-open interval. With our Employees temporal table, qualifying rows are ones satisfying the predicates: sysstart <= @datetime AND sysend > @datetime. For example, the following query returns the state of the employee rows that were valid as of 2015-06-01 20:11:01:
DECLARE @datetime AS DATETIME2(0) = '2015-06-01 20:11:01';SELECT *FROM dbo.Employees FOR SYSTEM_TIME AS OF @datetime;
Feel free to look at Table 1 and Table 2, which were provided earlier, and try to figure out which rows are supposed to be returned before you run the query.
Like I said, you can achieve the same with two queries and a UNION ALL operator, like so:
SELECT *FROM dbo.EmployeesWHERE sysstart <= @datetime AND sysend > @datetime UNION ALLSELECT *FROM dbo.EmployeesHistoryWHERE sysstart <= @datetime AND sysend > @datetime;
You get the same meaning, and in fact the same query plan, as shown in Figure 2.
Figure 2: Plan for query with FOR SYSTEM_TIME AS OF @datetime
The code (either solution) generates the following output (see if you figured out correctly which rows should be returned):
empid mgrid empname sysstart sysend---------- ---------- ---------- ---------------------- ----------------------1 NULL David 2015-06-01 19:54:04 9999-12-31 23:59:5910 5 Sean 2015-06-01 20:01:41 9999-12-31 23:59:592 1 Eitan 2015-06-01 19:54:04 9999-12-31 23:59:593 1 Ina 2015-06-01 20:01:41 9999-12-31 23:59:594 2 Seraph 2015-06-01 19:54:20 9999-12-31 23:59:595 2 Jiru 2015-06-01 19:54:20 9999-12-31 23:59:598 5 Lilach 2015-06-01 20:01:41 9999-12-31 23:59:5911 3 Gabriel 2015-06-01 20:11:01 2015-06-01 21:32:2012 9 Emilia 2015-06-01 20:01:41 2015-06-01 21:32:206 2 Steve 2015-06-01 19:54:20 2015-06-01 21:32:207 3 Aaron 2015-06-01 20:01:41 2015-06-01 21:32:209 3 Rita 2015-06-01 20:11:01 2015-06-01 21:32:20
Clearly, using the FOR SYSTEM_TIME clause allows a much simpler and more natural solution compared to not using it. This can go a long way toward improving the readability and maintainability of your code.
Note: The following discussion is relevant only if you’re relying on rowstore indexing.
If you’re wondering why the plan performs scans of the clustered indexes instead of seeks, that’s because if you recall from Part 1, the current clustered indexes on the tables have the empid column as the leading key, followed by systart and sysend, and our query doesn’t filter by the empid column. Here’s an example where you filter a specific employee:
DECLARE @datetime AS DATETIME2(0) = '2015-06-01 20:11:01', @empid AS INT = 9;SELECT *FROM dbo.Employees FOR SYSTEM_TIME AS OF @datetimeWHERE empid = @empid;
The plan for this query is shown in Figure 3.
Figure 3: Plan with seeks
This time the plan does show seeks in the indexes. If you expect to frequently issue queries without filtering a specific primary key value, you may want to consider also creating an index on the system period columns without the leading primary key column.
Another important thing to note regarding indexing is that when using multiple range predicates, only one of them can be used as a seek predicate (see Seek Predicates property in the plan); the rest are used as residual predicates (see Predicate property in the plan). This means that for optimal performance (to scan fewer pages in the index leaf), it’s advisable that you form the index key list like so: (, , ). With the current table it’s fine to place the sysstart column before the sysend column in the key list since sysstart will appear in the more selective range predicate (all rows will satisfy the predicate sysend > @datetime, whereas a subset of the rows will satisfy the predicate sysstart <= @datetime). But with the history table, if you tend to query recent data, the range predicate based on the sysend column will tend to be more selective than the one based on the sysstart column. Therefore, especially if you’re expecting many versions for each row, you want to make sure that when you do performance testing, you try an index with sysend before sysstart in the key list.
In case you need more information about such indexing considerations, I explain those in detail in the articles: Optimization Tips for Multiple Range Predicates, Part 1 and Optimization Tips for Multiple Range Predicates, Part 2.
FOR SYSTEM_TIME subclauses
The AS OF subclause is only one of four subclauses that the FOR SYSTEM_TIME clause supports. Here’s the full list of supported subclauses:
AS OF @datetime
FROM @start TO @end
BETWEEN @start AND @end
CONTAINED IN(@start, @end)
The inputs can be literals, variables or parameters. The first three subclauses are standard, whereas the fourth is a Microsoft extension to the standard.
Table 3 provides the predicates that represent the qualifying rows for the different subclauses.
Table 3: Qualifying rows for FOR SYSTEM_TIME subclauses
Figure 4 provides illustrations that depict the qualifying rows for the different subclauses graphically.
Figure 4: Illustrations of FOR SYSTEM_TIME subclauses
The green blocks represent the input date and time values in the FOR SYSTEM_TIME subclauses based on the type’s precision. Our system period columns and input variables use the type DATETIME2(0) and hence have a precision of one second. In the AS OF case, the green block represents the input @datetime value. In the rest of the subclauses, the leftmost green block represents the input @start value and the rightmost green block represents the input @end value.
Naturally, there may be additional supported date and time values between @start and @end, and those are represented by the green block in the middle. Using a block to represent a supported value in the type makes it easy for me to convey when a value is or isn’t included. The red and blue arrows represent which ranges of values in the systart and sysend columns, respectively, would make a row a qualifying one. Also keep in mind that there’s an implied condition, which is enforced by SQL Server, that says that sysend >= systart.
Let’s go over examples for all four cases. Starting with the AS OF @datetime subclause, the following query returns the row for the employee with the employee ID stored in @empid that was valid as of the date and time value stored in @datetime (qualifying row: sysstart <= @datetime AND sysend > @datetime):
DECLARE @datetime AS DATETIME2(0) = '2015-06-01 20:11:01', @empid AS INT = 9;-- also try with @datetime = '2015-06-01 20:11:00'SELECT *FROM dbo.Employees FOR SYSTEM_TIME AS OF @datetimeWHERE empid = @empid;
DECLARE @datetime AS DATETIME2(0) = '2015-06-01 20:11:01', @empid AS INT = 9;
-- also try with @datetime = '2015-06-01 20:11:00'
SELECT *
FROM dbo.Employees FOR SYSTEM_TIME AS OF @datetime
WHERE empid = @empid;
SELECT *FROM dbo.EmployeesWHERE empid = @empid AND sysstart <= @datetime AND sysend > @datetimeUNION ALLSELECT *FROM dbo.EmployeesHistoryWHERE empid = @empid AND sysstart <= @datetime AND sysend > @datetime;
Here’s the output that you got for @datetime = '2015-06-01 20:11:01':
empid mgrid empname sysstart sysend---------- ---------- -------- ---------------------- ----------------------9 3 Rita 2015-06-01 20:11:01 2015-06-01 21:32:20
And here’s the output that you get for @datetime = '2015-06-01 20:11:00':
empid mgrid empname sysstart sysend---------- ---------- -------- ---------------------- ----------------------9 7 Rita 2015-06-01 20:01:41 2015-06-01 20:11:01
The FOR SYSTEM_TIME FROM @start TO @end subclause gives you all row versions where sysstart is before the input @end, and sysend is after the input @start (qualifying rows: sysstart < @end AND sysend > @start). Consider the following example:
DECLARE @start AS DATETIME2(0) = '2015-06-01 19:00:00', @end AS DATETIME2(0) = '2015-06-01 21:32:20', @empid AS INT = 9;SELECT *FROM dbo.Employees FOR SYSTEM_TIME FROM @start TO @endWHERE empid = @empid;
This query is equivalent to the following:
SELECT *FROM dbo.EmployeesWHERE empid = @empid AND sysstart < @end AND sysend > @startUNION ALLSELECT *FROM dbo.EmployeesHistoryWHERE empid = @empid AND sysstart < @end AND sysend > @start;
You get the following output showing two row versions for the same employee during the period of interest:
empid mgrid empname sysstart sysend---------- ---------- -------- ---------------------- ----------------------9 7 Rita 2015-06-01 20:01:41 2015-06-01 20:11:019 3 Rita 2015-06-01 20:11:01 2015-06-01 21:32:20
The FOR SYSTEM_TIME BETWEEN @start AND @end subclause is very similar to the previous, only the predicate against systart is less than or equal to the input @end (as opposed to being just less than). The qualifying rows are the ones satisfying the conjunction of predicates: sysstart <= @end AND sysend > @start. Here’s an example using this subclause:
DECLARE @start AS DATETIME2(0) = '2015-06-01 19:00:00', @end AS DATETIME2(0) = '2015-06-01 21:32:20', @empid AS INT = 9;SELECT *FROM dbo.Employees FOR SYSTEM_TIME BETWEEN @start AND @endWHERE empid = @empid;
This query is equivalent to the following code:
SELECT *FROM dbo.EmployeesWHERE empid = @empid AND sysstart <= @end AND sysend > @startUNION ALLSELECT *FROM dbo.EmployeesHistoryWHERE empid = @empid AND sysstart <= @end AND sysend > @start;
You get the following output, which this time includes three row versions for the employee, compared to two row versions that you got with the previous subclause using the same inputs:
empid mgrid empname sysstart sysend---------- ---------- -------- ---------------------- ----------------------9 4 Rita 2015-06-01 21:32:20 9999-12-31 23:59:599 7 Rita 2015-06-01 20:01:41 2015-06-01 20:11:019 3 Rita 2015-06-01 20:11:01 2015-06-01 21:32:20
Finally, the fourth subclause is FOR SYSTEM_TIME CONTAINED IN(@start, @end). Unlike the others, this one is a Microsoft extension to the standard. It considers as qualifying rows ones where sysstart is on or after the input @start and sysend is on or before the input @end (qualifying rows: sysstart >= @start AND sysend <= @end). In other words, the system period has to be contained in the input period. Here’s an example using this subclause:
DECLARE @start AS DATETIME2(0) = '2015-06-01 20:01:41', @end AS DATETIME2(0) = '2015-06-01 21:32:20', @empid AS INT = 9;SELECT *FROM dbo.Employees FOR SYSTEM_TIME CONTAINED IN (@start, @end)WHERE empid = @empid;
This query is equivalent to the following code:
SELECT *FROM dbo.EmployeesWHERE empid = @empid AND sysstart >= @start AND sysend <= @endUNION ALLSELECT *FROM dbo.EmployeesHistoryWHERE empid = @empid AND sysstart >= @start AND sysend <= @end;
The query generates the following output, showing two qualifying row versions for the input employee:
empid mgrid empname sysstart sysend---------- ---------- -------- ---------------------- ----------------------9 7 Rita 2015-06-01 20:01:41 2015-06-01 20:11:019 3 Rita 2015-06-01 20:11:01 2015-06-01 21:32:20
Querying table expressions
As mentioned, you’re allowed to use the FOR SYSTEM_TIME clause against a system-versioned temporal table and against a view. (SQL Server will propagate the clause to inner system-versioned temporal tables.) As of SQL Server 2016 CTP2 (may change later), SQL Server doesn’t support the clause against other types of table expressions (derived tables, CTEs and inline table-valued functions). With those, you will need to specify the clause in the inner references to the temporal tables.
For example, suppose that you create an inline table-valued function (TVF) called ShowHierarchy, which uses a recursive CTE to return the hierarchy of employees with ancestor paths and level indication. You want to be able to query the function requesting to capture the hierarchy at a given point in time, like so:
DECLARE @datetime AS DATETIME2(0) = '2015-06-01 20:11:01';SELECT *FROM dbo.ShowHierarchy() FOR SYSTEM_TIME AS OF @datetime AS F;
But currently SQL Server doesn’t support specifying the FOR SYSTEM_TIME clause against the inline TVF and propagating it to the underlying tables like it does with views. For now, as a workaround, you can pass the date and time value as an input to the function, and specify it as input to the FOR SYSTEM_TIME AS OF clause directly in the inner queries against the Employees table. Here’s an example for such a solution defining a function called ShowHierarchyAt:
IF OBJECT_ID(N'dbo.ShowHierarchyAt', 'IF') IS NOT NULL DROP FUNCTION dbo.ShowHierarchyAt;GOCREATE FUNCTION dbo.ShowHierarchyAt(@datetime AS DATETIME2(0)) RETURNS TABLEASRETURN WITH EmpsCTE AS ( SELECT empid, mgrid, empname, 0 AS lvl, CAST('.' + CAST(empid AS VARCHAR(10)) + '.' AS VARCHAR(900)) AS path FROM dbo.Employees FOR SYSTEM_TIME AS OF @datetime WHERE mgrid IS NULLUNION ALLSELECT S.empid, S.mgrid, S.empname, M.lvl + 1 AS lvl, CAST(M.path + CAST(S.empid AS VARCHAR(10)) + '.' AS VARCHAR(900)) AS path FROM EmpsCTE AS M INNER JOIN dbo.Employees FOR SYSTEM_TIME AS OF @datetime AS S ON S.mgrid = M.empid ) SELECT empid, mgrid, empname, lvl, path FROM EmpsCTE;GO
Use the following query against the function, passing the date and time value 2015-06-01 20:01:41 as input:
SELECT REPLICATE(' | ', lvl) + '(' + CAST(empid AS VARCHAR(10)) + ') ' + empname AS empFROM dbo.ShowHierarchyAt('2015-06-01 20:01:41') AS FORDER BY path;
You will get the following state of the hierarchy as it was correct at the requested point in time:
(1) David | (2) Eitan | | (4) Seraph | | (5) Jiru | | | (10) Sean | | | (8) Lilach | | (6) Steve | (3) Ina | | (7) Aaron | | | (11) Gabriel | | | (9) Rita | | | | (12) Emilia | | | | (13) Michael | | | | (14) Didi
Show the hierarchy at 2015-06-01 20:11:01:
SELECT REPLICATE(' | ', lvl) + '(' + CAST(empid AS VARCHAR(10)) + ') ' + empname AS empFROM dbo.ShowHierarchyAt('2015-06-01 20:11:01') AS FORDER BY path;
You will get the following output:
(1) David | (2) Eitan | | (4) Seraph | | (5) Jiru | | | (10) Sean | | | (8) Lilach | | (6) Steve | (3) Ina | | (11) Gabriel | | (7) Aaron | | (9) Rita | | | (12) Emilia
Show the hierarchy at 2015-06-01 20:11:01:
SELECT REPLICATE(' | ', lvl) + '(' + CAST(empid AS VARCHAR(10)) + ') ' + empname AS empFROM dbo.ShowHierarchyAt('2015-06-01 21:32:20') AS FORDER BY path;
You will get the following output:
(1) David | (2) Eitan | | (4) Seraph | | | (7) Aaron | | | (9) Rita | | (5) Jiru | | | (10) Sean | | | (8) Lilach | (3) Ina | | (6) Steve | | | (11) Gabriel
Another capability that is not yet available is to specify a direct correlation in the FOR SYSTEM_TIME clause as an input. This could be handy if you want to return multiple states of the data using the APPLY operator (or a correlated subquery) based on input date and time values stored in a table or a table valued parameter (TVP). Here’s an example demonstrating this need attempting to return the managers of an employee at different points in time. (Don’t actually run this since this code is not supported, yet.):
DECLARE @PointsInTime AS TABLE( p VARCHAR(10) NOT NULL PRIMARY KEY, dt DATETIME2(0) NOT NULL UNIQUE);INSERT INTO @PointsInTime(p, dt) VALUES('T1', '2015-06-01 19:54:04'), ('T2', '2015-06-01 19:54:20'), ('T3', '2015-06-01 20:01:41'), ('T4', '2015-06-01 20:11:01'), ('T5', '2015-06-01 21:32:20');SELECT P.p, P.dt, E.mgridFROM @PointsInTime AS P OUTER APPLY ( SELECT * FROM dbo.Employees FOR SYSTEM_TIME AS OF P.dt AS E WHERE E.empid = 9 ) AS E;
For now there is a fairly simple workaround. Create an inline TVF that accepts the date and time value as an input parameter, queries the temporal table, and specifies the input parameter in the FOR SYSTEM_TIME clause, like so:
IF OBJECT_ID(N'dbo.EmployeeAt', 'IF') IS NOT NULL DROP FUNCTION dbo.EmployeeAt;GOCREATE FUNCTION dbo.EmployeeAt(@empid AS INT, @datetime AS DATETIME2(0)) RETURNS TABLEASRETURN SELECT empid, mgrid, empname, sysstart, sysend FROM dbo.Employees FOR SYSTEM_TIME AS OF @datetime WHERE empid = @empid;GO
Then instead of using a query with a correlation as the applied table expression, use the inline TVF and pass the date and time column from the TVP as input to the function, like so:
DECLARE @PointsInTime AS TABLE( p VARCHAR(10) NOT NULL PRIMARY KEY, dt DATETIME2(0) NOT NULL UNIQUE);INSERT INTO @PointsInTime(p, dt) VALUES('T1', '2015-06-01 19:54:04'), ('T2', '2015-06-01 19:54:20'), ('T3', '2015-06-01 20:01:41'), ('T4', '2015-06-01 20:11:01'), ('T5', '2015-06-01 21:32:20');SELECT P.p, P.dt, E.mgridFROM @PointsInTime AS P OUTER APPLY dbo.EmployeeAt( 9, P.dt ) AS E;
This execution of this code generates the following output:
p dt mgrid---------- --------------------------- ----------T1 2015-06-01 19:54:04 NULLT2 2015-06-01 19:54:20 NULLT3 2015-06-01 20:01:41 7T4 2015-06-01 20:11:01 3T5 2015-06-01 21:32:20 4
For fun, here’s code showing the data pivoted into one row:
WITH C AS( SELECT P.p, E.mgrid FROM @PointsInTime AS P OUTER APPLY dbo.EmployeeAt( 9, P.dt ) AS E)SELECT *FROM C PIVOT( MAX(mgrid) FOR p IN (T1, T2, T3, T4, T5) ) AS PVT;
This code generates the following output:
T1 T2 T3 T4 T5---------- ---------- ---------- ---------- ----------NULL NULL 7 3 4
Conclusion
SQL Server 2016 introduces support for system-versioned temporal tables, which allow you to capture current and history states of the data based on the time modifications are executed. Hopefully in the future we’ll also see support for application-time period tables to allow the application (not the change time) to define the effective periods of the data, including recording future effective periods.
Since the feature is so new, we haven’t yet gathered a lot of performance-related information. Microsoft does provide some indexing guidelines, but there are currently no specialized indexes specifically for temporal data. (For details on what such indexes might look like, see: Interval Queries in SQL Server.) Depending on how many versions you will keep per row and what your queries will look like, perhaps the generic columnstore and rowstore indexes will do well. For example, columnstore segment elimination may end up doing a decent job at eliminating most of the noninteresting data. With time we’ll gain more experience and see.
Another small feature that perhaps we’ll see in the future is the ability to define the system period columns as implicitly hidden. DB2, for example, currently supports this feature. The idea is that when you issue a SELECT * against the temporal table, those columns won’t be returned, rather only when explicitly referred to. I also mentioned some other small things that are missing related to table expressions and correlations. I don’t want to focus on the half-empty glass and seem unappreciative of the efforts made by Microsoft. It is truly great to see the beginning of an investment in SQL Server in such an important area as temporal data.
About the Author
You May Also Like