TSQL Challenge: Packing Date and Time Intervals
Test your T-SQL querying skills by trying to solve a fundamental problem in the relational model known as packing intervals.
January 16, 2011
The relational model deals with date and time intervals and defines various operations on those. One of the fundamental operations is packing date and time intervals. Packing of intervals means that you want to merge all intervals that overlap into one contiguous interval. SQL Server doesn’t support a native interval type, so most represent an interval with two attributes holding the start and end points of the interval, or one of the points and a duration.
The challenge at hand involves addressing the task of packing intervals using TSQL efficiently, with a standard, set-based solution, that is reusable in other platforms—namely, use only standard SQL constructs.
I’ll provide both small and large sets of rows for sample data. Use the small set to test the validity of your solution, and the large one to test its optimality.
Here’s the DDL and sample data for the small set to test the validity of your solution:
-- DDL and sample data, small
SET NOCOUNT ON;
USE tempdb;
IF OBJECT_ID('dbo.Sessions') IS NOT NULL DROP TABLE dbo.Sessions;
CREATE TABLE dbo.Sessions
(
id INT NOT NULL IDENTITY(1, 1),
username VARCHAR(14) NOT NULL,
starttime DATETIME2(3) NOT NULL,
endtime DATETIME2(3) NOT NULL,
CONSTRAINT PK_Sessions PRIMARY KEY(id),
CONSTRAINT CHK_endtime_gteq_starttime
CHECK (endtime >= starttime)
);
GO
INSERT INTO dbo.Sessions VALUES('User1', '20111201 08:00:00.000', '20111201 08:30:00.000');
INSERT INTO dbo.Sessions VALUES('User1', '20111201 08:30:00.000', '20111201 09:00:00.000');
INSERT INTO dbo.Sessions VALUES('User1', '20111201 09:00:00.000', '20111201 09:30:00.000');
INSERT INTO dbo.Sessions VALUES('User1', '20111201 10:00:00.000', '20111201 11:00:00.000');
INSERT INTO dbo.Sessions VALUES('User1', '20111201 10:30:00.000', '20111201 12:00:00.000');
INSERT INTO dbo.Sessions VALUES('User1', '20111201 11:30:00.000', '20111201 12:30:00.000');
INSERT INTO dbo.Sessions VALUES('User2', '20111201 08:00:00.000', '20111201 10:30:00.000');
INSERT INTO dbo.Sessions VALUES('User2', '20111201 08:30:00.000', '20111201 10:00:00.000');
INSERT INTO dbo.Sessions VALUES('User2', '20111201 09:00:00.000', '20111201 09:30:00.000');
INSERT INTO dbo.Sessions VALUES('User2', '20111201 11:00:00.000', '20111201 11:30:00.000');
INSERT INTO dbo.Sessions VALUES('User2', '20111201 11:32:00.000', '20111201 12:00:00.000');
INSERT INTO dbo.Sessions VALUES('User2', '20111201 12:04:00.000', '20111201 12:30:00.000');
INSERT INTO dbo.Sessions VALUES('User3', '20111201 08:00:00.000', '20111201 09:00:00.000');
INSERT INTO dbo.Sessions VALUES('User3', '20111201 08:00:00.000', '20111201 08:30:00.000');
INSERT INTO dbo.Sessions VALUES('User3', '20111201 08:30:00.000', '20111201 09:00:00.000');
INSERT INTO dbo.Sessions VALUES('User3', '20111201 09:30:00.000', '20111201 09:30:00.000');
GO
The table holds information about user sessions against some service or application. Suppose that for billing purposes you need to pack the intervals for each user since you’re not supposed to charge the user multiple times for multiple concurrent sessions, but rather only for connection time regardless of number of sessions. The desired result for the given sample data is:
username starttime endtime
--------- ----------------------- -----------------------
User1 2011-12-01 08:00:00.000 2011-12-01 09:30:00.000
User1 2011-12-01 10:00:00.000 2011-12-01 12:30:00.000
User2 2011-12-01 08:00:00.000 2011-12-01 10:30:00.000
User2 2011-12-01 11:00:00.000 2011-12-01 11:30:00.000
User2 2011-12-01 11:32:00.000 2011-12-01 12:00:00.000
User2 2011-12-01 12:04:00.000 2011-12-01 12:30:00.000
User3 2011-12-01 08:00:00.000 2011-12-01 09:00:00.000
User3 2011-12-01 09:30:00.000 2011-12-01 09:30:00.000
To test the performance of your solution, use the following code:
-- helper function GetNums
IF OBJECT_ID('dbo.GetNums', 'IF') IS NOT NULL
DROP FUNCTION dbo.GetNums;
GO
CREATE FUNCTION dbo.GetNums(@n AS BIGINT) RETURNS TABLE
AS
RETURN
WITH
L0 AS(SELECT 1 AS c UNION ALL SELECT 1),
L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L5)
SELECT TOP (@n) n FROM Nums ORDER BY n;
GO
-- sample data, 5,000,000 rows
DECLARE
@num_users AS INT = 1000,
@intervals_per_user AS INT = 5000,
@start_period AS DATETIME2(3) = '20110101',
@end_period AS DATETIME2(3) = '20110107',
@max_duration_in_ms AS INT = 3600000; -- 60 nimutes
TRUNCATE TABLE dbo.Sessions;
WITH C AS
(
SELECT 'User' + RIGHT('000000000' + CAST(U.n AS VARCHAR(10)), 10) AS username,
DATEADD(ms, ABS(CHECKSUM(NEWID())) % 86400000,
DATEADD(day, ABS(CHECKSUM(NEWID())) % DATEDIFF(day, @start_period, @end_period), @start_period)) AS starttime
FROM dbo.GetNums(@num_users) AS U
CROSS JOIN dbo.GetNums(@intervals_per_user) AS I
)
INSERT INTO dbo.Sessions WITH (TABLOCK) (username, starttime, endtime)
SELECT username, starttime,
DATEADD(ms, ABS(CHECKSUM(NEWID())) % (@max_duration_in_ms + 1), starttime) AS endtime
FROM C;
This code fills the table with 5,000,000 rows.
Feel free to create indexes to support your solution.
Good luck!
BG
About the Author
You May Also Like