Solutions to Packing Date and Time Intervals Puzzle

Itzik covers solutions to the Packing Date and Time Intervals puzzle, including a new solution that runs under 10 seconds with 5,000,000 rows.

ITPro Today

March 19, 2011

16 Min Read
ITPro Today logo in a gray background | ITPro Today

A couple of months ago I posted a puzzle involving packing date and time intervals. You can find the details of the puzzle here, and further clarifications here. I promised to post the solutions to the puzzle by the first day of spring, so here goes…

I provided both a small set of sample data to test the validity of the solution, and a large set to test its performance. Here’s the small set again:

SET NOCOUNT ON;USE tempdb;IF OBJECT_ID('dbo.Sessions') IS NOT NULL DROP TABLE dbo.Sessions;CREATE TABLE dbo.Sessions(  id        INT          NOT NULL IDENTITY(1, 1),  username  VARCHAR(14)  NOT NULL,  starttime DATETIME2(3) NOT NULL,  endtime   DATETIME2(3) NOT NULL,  CONSTRAINT PK_Sessions PRIMARY KEY(id),  CONSTRAINT CHK_endtime_gteq_starttime    CHECK (endtime >= starttime));-- indexesCREATE UNIQUE INDEX idx_user_start_id ON dbo.Sessions(username, starttime, id);CREATE UNIQUE INDEX idx_user_end_id ON dbo.Sessions(username, endtime, id);-- sample data (small)INSERT INTO dbo.Sessions VALUES  ('User1', '20111201 08:00:00.000', '20111201 08:30:00.000'),  ('User1', '20111201 08:30:00.000', '20111201 09:00:00.000'),  ('User1', '20111201 09:00:00.000', '20111201 09:30:00.000'),  ('User1', '20111201 10:00:00.000', '20111201 11:00:00.000'),  ('User1', '20111201 10:30:00.000', '20111201 12:00:00.000'),  ('User1', '20111201 11:30:00.000', '20111201 12:30:00.000'),  ('User2', '20111201 08:00:00.000', '20111201 10:30:00.000'),  ('User2', '20111201 08:30:00.000', '20111201 10:00:00.000'),  ('User2', '20111201 09:00:00.000', '20111201 09:30:00.000'),  ('User2', '20111201 11:00:00.000', '20111201 11:30:00.000'),  ('User2', '20111201 11:32:00.000', '20111201 12:00:00.000'),  ('User2', '20111201 12:04:00.000', '20111201 12:30:00.000'),  ('User3', '20111201 08:00:00.000', '20111201 09:00:00.000'),  ('User3', '20111201 08:00:00.000', '20111201 08:30:00.000'),  ('User3', '20111201 08:30:00.000', '20111201 09:00:00.000'),  ('User3', '20111201 09:30:00.000', '20111201 09:30:00.000');

You can use the following code to test the portability of your solution on Oracle:

CREATE TABLE dbo.Sessions(  id        INT          NOT NULL,  username  VARCHAR2(14) NOT NULL,  starttime TIMESTAMP    NOT NULL,  endtime   TIMESTAMP    NOT NULL,  CONSTRAINT PK_Sessions PRIMARY KEY(id),  CONSTRAINT CHK_endtime_gteq_starttime    CHECK (endtime >= starttime));-- indexesCREATE UNIQUE INDEX idx_user_start_id ON dbo.Sessions(username, starttime, id);CREATE UNIQUE INDEX idx_user_end_id ON dbo.Sessions(username, endtime, id);-- sample dataINSERT INTO dbo.Sessions VALUES(1, 'User1',   TO_DATE('20111201 08:00:00',  'YYYYMMDD HH24:MI:SS'),   TO_DATE('20111201 08:30:00',  'YYYYMMDD HH24:MI:SS'));INSERT INTO dbo.Sessions VALUES(2, 'User1',   TO_DATE('20111201 08:30:00',  'YYYYMMDD HH24:MI:SS'),   TO_DATE('20111201 09:00:00',  'YYYYMMDD HH24:MI:SS'));INSERT INTO dbo.Sessions VALUES(3, 'User1',   TO_DATE('20111201 09:00:00',  'YYYYMMDD HH24:MI:SS'),   TO_DATE('20111201 09:30:00',  'YYYYMMDD HH24:MI:SS'));INSERT INTO dbo.Sessions VALUES(4, 'User1',   TO_DATE('20111201 10:00:00',  'YYYYMMDD HH24:MI:SS'),   TO_DATE('20111201 11:00:00',  'YYYYMMDD HH24:MI:SS'));INSERT INTO dbo.Sessions VALUES(5, 'User1',   TO_DATE('20111201 10:30:00',  'YYYYMMDD HH24:MI:SS'),   TO_DATE('20111201 12:00:00',  'YYYYMMDD HH24:MI:SS'));INSERT INTO dbo.Sessions VALUES(6, 'User1',   TO_DATE('20111201 11:30:00',  'YYYYMMDD HH24:MI:SS'),   TO_DATE('20111201 12:30:00',  'YYYYMMDD HH24:MI:SS'));INSERT INTO dbo.Sessions VALUES(7, 'User2',   TO_DATE('20111201 08:00:00',  'YYYYMMDD HH24:MI:SS'),   TO_DATE('20111201 10:30:00',  'YYYYMMDD HH24:MI:SS'));INSERT INTO dbo.Sessions VALUES(8, 'User2',   TO_DATE('20111201 08:30:00',  'YYYYMMDD HH24:MI:SS'),   TO_DATE('20111201 10:00:00',  'YYYYMMDD HH24:MI:SS'));INSERT INTO dbo.Sessions VALUES(9, 'User2',   TO_DATE('20111201 09:00:00',  'YYYYMMDD HH24:MI:SS'),   TO_DATE('20111201 09:30:00',  'YYYYMMDD HH24:MI:SS'));INSERT INTO dbo.Sessions VALUES(10, 'User2',   TO_DATE('20111201 11:00:00',  'YYYYMMDD HH24:MI:SS'),   TO_DATE('20111201 11:30:00',  'YYYYMMDD HH24:MI:SS'));INSERT INTO dbo.Sessions VALUES(11, 'User2',   TO_DATE('20111201 11:32:00',  'YYYYMMDD HH24:MI:SS'),   TO_DATE('20111201 12:00:00',  'YYYYMMDD HH24:MI:SS'));INSERT INTO dbo.Sessions VALUES(12, 'User2',   TO_DATE('20111201 12:04:00',  'YYYYMMDD HH24:MI:SS'),   TO_DATE('20111201 12:30:00',  'YYYYMMDD HH24:MI:SS'));INSERT INTO dbo.Sessions VALUES(13, 'User3',   TO_DATE('20111201 08:00:00',  'YYYYMMDD HH24:MI:SS'),   TO_DATE('20111201 09:00:00',  'YYYYMMDD HH24:MI:SS'));INSERT INTO dbo.Sessions VALUES(14, 'User3',   TO_DATE('20111201 08:00:00',  'YYYYMMDD HH24:MI:SS'),   TO_DATE('20111201 08:30:00',  'YYYYMMDD HH24:MI:SS'));INSERT INTO dbo.Sessions VALUES(15, 'User3',   TO_DATE('20111201 08:30:00',  'YYYYMMDD HH24:MI:SS'),   TO_DATE('20111201 09:00:00',  'YYYYMMDD HH24:MI:SS'));INSERT INTO dbo.Sessions VALUES(16, 'User3',   TO_DATE('20111201 09:30:00',  'YYYYMMDD HH24:MI:SS'),   TO_DATE('20111201 09:30:00',  'YYYYMMDD HH24:MI:SS'));

And here’s the large set:

-- helper function GetNumsIF OBJECT_ID('dbo.GetNums', 'IF') IS NOT NULL  DROP FUNCTION dbo.GetNums;GOCREATE FUNCTION dbo.GetNums(@n AS BIGINT) RETURNS TABLEASRETURN  WITH    L0   AS(SELECT 1 AS c UNION ALL SELECT 1),    L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),    L2   AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),    L3   AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),    L4   AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),    L5   AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),    Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L5)  SELECT TOP (@n) n FROM Nums ORDER BY n;GO-- code to create and populate the table UsersIF OBJECT_ID('dbo.Users') IS NOT NULL DROP TABLE dbo.Users;CREATE TABLE dbo.Users(  username  VARCHAR(14)  NOT NULL,  CONSTRAINT PK_Users PRIMARY KEY(username));DECLARE @num_users AS INT = 2000;INSERT INTO dbo.Users(username)  SELECT 'User' + RIGHT('000000000' + CAST(U.n AS VARCHAR(10)), 10) AS username  FROM dbo.GetNums(@num_users) AS U;GO-- code to create and populate the table Sessions with 5,000,000 rowsDECLARE   @num_users          AS INT          = 1000,  @intervals_per_user AS INT          = 5000,  @start_period       AS DATETIME2(3) = '20110101',  @end_period         AS DATETIME2(3) = '20110107',  @max_duration_in_ms AS INT  = 3600000; -- 60 nimutes  TRUNCATE TABLE dbo.Sessions;WITH C AS(  SELECT 'User' + RIGHT('000000000' + CAST(U.n AS VARCHAR(10)), 10) AS username,      DATEADD(ms, ABS(CHECKSUM(NEWID())) % 86400000,        DATEADD(day, ABS(CHECKSUM(NEWID())) % DATEDIFF(day, @start_period, @end_period), @start_period)) AS starttime  FROM dbo.GetNums(@num_users) AS U    CROSS JOIN dbo.GetNums(@intervals_per_user) AS I)INSERT INTO dbo.Sessions WITH (TABLOCK) (username, starttime, endtime)  SELECT username, starttime,    DATEADD(ms, ABS(CHECKSUM(NEWID())) % (@max_duration_in_ms + 1), starttime) AS endtime  FROM C;

To remind you, your task is to pack each set of intervals that overlap or abut into one contiguous interval. The desired result for the small set of sample data is:

username  starttime               endtime--------- ----------------------- -----------------------User1     2011-12-01 08:00:00.000 2011-12-01 09:30:00.000User1     2011-12-01 10:00:00.000 2011-12-01 12:30:00.000User2     2011-12-01 08:00:00.000 2011-12-01 10:30:00.000User2     2011-12-01 11:00:00.000 2011-12-01 11:30:00.000User2     2011-12-01 11:32:00.000 2011-12-01 12:00:00.000User2     2011-12-01 12:04:00.000 2011-12-01 12:30:00.000User3     2011-12-01 08:00:00.000 2011-12-01 09:00:00.000User3     2011-12-01 09:30:00.000 2011-12-01 09:30:00.000

I’d like to thanks all those who sent solutions. Special thanks go to Alejandro Mesa for helping me verify the validity of my solutions and for their safekeeping. Here I’m going to cover only correct solutions (as far as I could tell). I will include also solutions that didn’t meet all of the puzzle’s requirements, like being standard, set-based solutions. Following are the solutions and their run times as measured on my Alienware M15x laptop with a Core i7 processor (8 logical CPUs), against hot cache.

Itzik Classic: 5,621 Seconds

The first one I want to present is the classic set-based solution that was available for some time:

-- indexes/*CREATE INDEX idx_user_start_end ON dbo.Sessions(username, starttime, endtime);CREATE INDEX idx_user_end_start ON dbo.Sessions(username, endtime, starttime);*/-- solutionWITH StartTimes AS(  SELECT DISTINCT username, starttime  FROM dbo.Sessions AS S1  WHERE NOT EXISTS    (SELECT * FROM dbo.Sessions AS S2     WHERE S2.username = S1.username       AND S2.starttime = S1.starttime)),EndTimes AS(  SELECT DISTINCT username, endtime  FROM dbo.Sessions AS S1  WHERE NOT EXISTS    (SELECT * FROM dbo.Sessions AS S2     WHERE S2.username = S1.username       AND S2.endtime > S1.endtime       AND S2.starttime = starttime) AS endtimeFROM StartTimes AS S;

The CTE StartTimes isolates packed interval start times using a query that returns all interval start times for which you cannot find any interval by the same user that started before the current interval start and ended on or after the current interval start. The EndTimes CTE isolates packed interval end times using a query that returns all interval end times for which you cannot find any interval by the same user that ended after the current interval end and started on or before the current interval end. The outer query than matches to each packed interval start the nearest packed interval end going forward by returning the minimum end that is greater than or equal to the current start. As you can see, the solution is very slow.

Geri Reshef: 1,690 Seconds

-- indexing/*CREATE INDEX idx_user_start_end ON dbo.Sessions(username, starttime, endtime);CREATE INDEX idx_user_end_start ON dbo.Sessions(username, endtime, starttime);*/-- solutionWith T1 As(  Select UserName, StartTime Time  From Sessions   Union All   Select UserName, EndTime   From Sessions), T2 As (  Select Row_Number() Over(Partition By UserName Order By Time) Nm, UserName, Time   FromT1), T3 As (  SelectA.Nm-Row_Number() Over(Partition By A.UserName Order By A.Time,B.Time) Nm1, A.UserName, A.Time StartTime, B.Time EndTime   FromT2 A     Inner join T2 B     On A.UserName = B.UserName   And A.Nm=B.Nm - 1   WhereExists     ( Select*   FromSessions S   WhereS.UserName = A.UserName   And (S.StartTime  A.Time) )  Or A.Time=B.Time)SelectUserName, Min(StartTime) StartTime, Max(EndTime) EndTime FromT3 Group By UserName, Nm1 Order By UserName, StartTime;

Ami Levin: 225 Seconds

-- Indexing/*CREATE INDEX idx_user_start_end ON dbo.Sessions(username, starttime, endtime);*/-- SolutionSELECT UserName, StartTime, EndTime,   ROW_NUMBER() OVER(PARTITION BY UserName ORDER BY StartTime, EndTime) AS SerialINTO #MySessionsFROM dbo.Sessions;CREATE INDEX idx_serial_user_start_end ON #MySessions(serial, username, starttime, endtime);WITH Sessions_OrderedAS(  SELECT UserName, StartTime, EndTime, Serial  FROM #MySessions),Sessions_GroupsAS(  SELECT *, 0 AS Grouper  FROM Sessions_Ordered   WHERE Serial = 1    UNION ALL   SELECT B.UserName, B.StartTime,     CASE WHEN B.EndTime > A.EndTime THEN B.EndTime ELSE A.EndTime END,    B.Serial,     A.Grouper + CASE                  WHEN A.EndTime         

Alejandro Mesa: 130 Seconds

-- indexing/*CREATE INDEX idx_user_start_end ON dbo.Sessions(username, starttime, endtime);*/-- solutionSELECT    username,    starttime,    endtime,    ROW_NUMBER() OVER(PARTITION BY username ORDER BY starttime, endtime, id) AS rnINTO    #T1FROMdbo.[Sessions]CREATE UNIQUE CLUSTERED INDEX idx_#T1_idON #T1(username ASC, rn ASC);WITH rs AS (SELECT    username,    starttime,    endtime,    rnFROM#T1WHERE    rn = 1    UNION ALLSELECT    PRV.username,    CASE    WHEN R.overlaps = 1 THEN PRV.starttime    ELSE NXT.starttime    END,    CASE    WHEN R.overlaps = 1 AND PRV.endtime >= NXT.endtime THEN PRV.endtime    ELSE NXT.endtime    END,    NXT.rnFROMrs AS PRVINNER JOIN#T1 AS NXTON NXT.username = PRV.username AND NXT.rn = PRV.rn + 1CROSS APPLY(SELECT    CASE     WHEN PRV.starttime = NXT.starttime THEN 1    ELSE 0    END AS overlaps) AS R)SELECT    username,    starttime AS new_starttime,    MAX(endtime) AS new_endtimeFROMrsGROUP BY    username,    starttime--ORDER BY--    username,--    starttimeOPTION (MAXRECURSION 0);DROP TABLE #T1;

Alejandro’s description of the solution:

The solution assigns a consecutive number using ROW_NUMBER partitioned by (username) and ordered by (starttime, endtime, id), and dumps it into a table that will be used by a recursive cte. The anchor part select all rows where “rn = 1”, first row for each username, and the recursive part join the anchor with the next row in line “prv.username = nxt.username and nxt.rn = prv.rn + 1”. I also used the apply operator to return a column to identify when previous and next rows overlap, by using (prv.starttime = nxt.starttime). Then in the column list of the recursive part, I push down prv.username, prv.starttime if both rows overlap, and prv.endtime if the rows overlap and prv.endtime is greater than or equal to next.endtime, otherwise nxt.endtime.

The end result will have groups by (username, starttime), so the next step is to group by those columns and pull the maximum endtime.

Stefan: 40 Seconds

-- indexing/*CREATE UNIQUE INDEX idx_user_start ON dbo.Sessions(username, starttime, id);*/-- solutionif object_id('tempdb..#intervals') is not null drop table #intervals;with cte1 as (  select *, datediff(hour, 0, starttime)+n-1 as hash  from sessions    cross apply dbo.GetNums(datediff(hour, 0, endtime)-datediff(hour, 0, starttime)+1) t), cte2 as (  select distinct username, endtime  from sessions a  where not exists(    select *     from cte1 b    where a.username=b.username       and b.starttimea.endtime      and b.hash=datediff(hour, 0, a.endtime) ))select *, row_number() over (partition by username order by endtime) as rninto #intervalsfrom cte2-- add information about the start of each interval by locating the first session after each interval;with cte1 as (  select a.username, a.endtime, isnull(b.endtime,'19000101') as pe  from #intervals a    left join #intervals b      on a.username=b.username and a.rn=b.rn+1)select username,   (select min(b.starttime)   from sessions b    where a.username=b.username and b.starttime>a.pe) as starttime,  endtimefrom cte1 a

Stefan’s description of the solution:

Create a table with one row per interval. endtime is the end of each interval. The end of an interval is identified by finding the end of a session that is not overlapped by any other session. To speed up the search for overlapping sessions we only search the sessions that overlap the hour we are searching for.

Peter Larsson (Peso) 1: 32 Seconds

-- indexes/*CREATE UNIQUE INDEX idx_user_start ON dbo.Sessions(username, starttime, id);CREATE UNIQUE INDEX idx_user_end ON dbo.Sessions(username, endtime, id);*/-- Create staging tableCREATE TABLE#Stage(UserName VARCHAR(14) NOT NULL,StartTime DATETIME2(3) NOT NULL,EndTime DATETIME2(3) NOT NULL)-- Populate staging table with Gap informationINSERT#Stage(UserName,StartTime,EndTime)SELECTs.UserName,s.StartTime,e.EndTimeFROM(SELECTUserName,StartTime,DENSE_RANK() OVER (PARTITION BY UserName ORDER BY StartTime) - 1 AS rnStartFROMdbo.[Sessions]) AS sINNER JOIN(SELECTUserName,EndTime,DENSE_RANK() OVER (PARTITION BY UserName ORDER BY EndTime) AS rnEndFROMdbo.[Sessions]) AS e ON e.UserName = s.UserNameAND e.rnEnd = s.rnStartWHEREe.EndTime         

Peter Larsson (Peso) 2: 23 Seconds

-- indexing/*CREATE INDEX idx_user_start_end ON dbo.Sessions(username, starttime, endtime);*/-- solution-- Make sure printed output in each iteration doesn't affect timingSET NOCOUNT ON-- Create a control table for holding the Islands found for each UserNameCREATE TABLE#Control(UserName VARCHAR(14) NOT NULL,-- Current UserNameIslandID INT NOT NULL,-- Current Island numberIslandStartTime DATETIME2(3) NOT NULL,-- Starting time for current IslandExtensionStartTime DATETIME2(3) NOT NULL,-- Latest extension StartTime of current IslandExtensionEndTime DATETIME2(3) NOT NULL-- Latest extension EndTime of current Island)-- Create a helper index to speed up island detectionCREATE UNIQUE NONCLUSTERED INDEX UX_Control ON #Control (UserName, IslandID) INCLUDE (IslandStartTime, ExtensionStartTime, ExtensionEndTime)-- Insert initial starting Island for each UserNameINSERT#Control(UserName,IslandID,IslandStartTime,ExtensionStartTime,ExtensionEndTime)SELECTusr.UserName,1 AS IslandID,curr.StartTime AS IslandStartTime,curr.StartTime AS ExtensionStartTime,curr.EndTime AS ExtensionEndTimeFROMdbo.[Users] AS usrCROSS APPLY(SELECT TOP(1)w.StartTime,w.EndTimeFROMdbo.[Sessions] AS wWHEREw.UserName = usr.UserNameORDER BYw.StartTime) AS curr(StartTime, EndTime)-- Repeat loop when an Island is found or extended, quit when no more Island is found.WHILE @@ROWCOUNT > 0MERGE#Control AS tgtUSING(SELECTctrl.UserName,CASEWHEN curr.EndTime IS NULL THEN ctrl.IslandID + 1ELSE ctrl.IslandIDEND AS IslandID,ctrl.ExtensionEndTime AS ExtensionStartTime,curr.EndTime AS ExtensionEndTime,nxt.StartTime AS IslandStartTime,nxt.EndTime AS IslandEndTimeFROM#Control AS ctrl-- For every interval starting in current Island extension, get the interval that ends last (beyond extension).OUTER APPLY(SELECTMAX(w.EndTime) AS EndTimeFROMdbo.[Sessions] AS w WITH (index = idx_user_start_end)WHEREw.UserName = ctrl.UserNameAND w.StartTime BETWEEN ctrl.ExtensionStartTime AND ctrl.ExtensionEndTimeAND w.EndTime > ctrl.ExtensionEndTime) AS curr(EndTime)-- Check to see if an interval exists that is not an extension to current IslandOUTER APPLY(SELECT TOP(1)w.StartTime,w.EndTimeFROMdbo.[Sessions] AS wWHEREw.UserName = ctrl.UserNameAND w.StartTime > ctrl.ExtensionEndTimeORDER BYw.StartTime) AS nxt(StartTime, EndTime)-- Only work with the latest Island for each UserNameCROSS APPLY(SELECTMAX(c.IslandID) AS IslandIDFROM#Control AS cWHEREc.UserName = ctrl.UserName) AS isl(IslandID)WHEREctrl.IslandID = isl.IslandIDAND (curr.EndTime IS NOT NULL OR nxt.EndTime IS NOT NULL)) AS src ON src.UserName = tgt.UserNameAND src.IslandID = tgt.IslandIDWHENMATCHED-- An Island is extendedTHENUPDATESETtgt.ExtensionStartTime = src.ExtensionStartTime,tgt.ExtensionEndTime =src.ExtensionEndTimeWHENNOT MATCHED BY TARGET-- An new Island is foundTHENINSERT(UserName,IslandID,IslandStartTime,ExtensionStartTime,ExtensionEndTime)VALUES(src.UserName,src.IslandID,src.IslandStartTime,src.IslandStartTime,src.IslandEndTime);-- Present the result in correct orderSELECTUserName,IslandStartTime AS StartTime,ExtensionEndTime AS EndTimeFROM#ControlORDER BYUserName,IslandID-- Clean upDROP TABLE#Control

Itzik New 1: 17 Seconds

-- indexes /*CREATE UNIQUE INDEX idx_user_start_id ON dbo.Sessions(username, starttime, id);CREATE UNIQUE INDEX idx_user_end_id ON dbo.Sessions(username, endtime, id);*/WITH C1 AS-- let e = end ordinals, let s = start ordinals(  SELECT id, username, starttime AS ts, +1 AS type, NULL AS e,    ROW_NUMBER() OVER(PARTITION BY username ORDER BY starttime, id) AS s  FROM dbo.Sessions  UNION ALL  SELECT id, username, endtime AS ts, -1 AS type,     ROW_NUMBER() OVER(PARTITION BY username ORDER BY endtime, id) AS e,    NULL AS s  FROM dbo.Sessions),C2 AS-- let se = start or end ordinal, namely, how many events (start or end) happened so far(  SELECT C1.*, ROW_NUMBER() OVER(PARTITION BY username ORDER BY ts, type DESC, id) AS se  FROM C1),C3 AS-- For start events, the expression s - (se - s) - 1 represents how many sessions were active-- just before the current (hence - 1)---- For end events, the expression (se - e) - e represents how many sessions are active-- right after this one---- The above two expressions are 0 exactly when a group of packed intervals -- either starts or ends, respectively---- After filtering only events when a group of packed intervals either starts or ends,-- group each pair of adjacent start/end events(  SELECT username, ts,     FLOOR((ROW_NUMBER() OVER(PARTITION BY username ORDER BY ts) - 1) / 2 + 1) AS grpnum  FROM C2  WHERE COALESCE(s - (se - s) - 1, (se - e) - e) = 0)SELECT username, MIN(ts) AS starttime, max(ts) AS endtimeFROM C3GROUP BY username, grpnum;

The technique I used in this solution to calculate the number of active sessions was inspired by a similar technique used by Ben Flanaghan to address a problem called Maximum Concurrent Sessions that I covered in the past in the magazine.

Muhammad Al Pasha: 14 Seconds

-- indexes/*CREATE INDEX idx_user_start_end ON dbo.Sessions(username, starttime, endtime);CREATE INDEX idx_user_end_start ON dbo.Sessions(username, endtime, starttime);*/IF OBJECT_ID(N'tempdb..#EndTimes') IS NOT NULL   DROP TABLE #EndTimes;/*   Calculate end time of intervals, and generate row numbers per user,   and save the result to temp table to enhance performance.*/CREATE TABLE #EndTimes(   rownum INT NOT NULL,   username VARCHAR(14) NOT NULL,   endtime DATETIME2(3) NOT NULL);INSERT INTO #EndTimes(rownum, username, endtime)   SELECT ROW_NUMBER() OVER(PARTITION BY S1.username                                ORDER BY S1.endtime) AS rownum,          S1.username, S1.endtime     FROM dbo.Sessions AS S1          OUTER APPLY          (SELECT 1 AS is_overlapped            WHERE EXISTS(SELECT *                           FROM dbo.Sessions AS S2 WITH (index = idx_user_end_start)                          WHERE S2.username = S1.username                            AND S2.starttime  S1.endtime)) AS O2    WHERE O2.is_overlapped IS NULL    GROUP BY S1.username, S1.endtime; -- Eliminate duplicates/*   Calculate start time of intervals based on the previous interval end time,   then join the result with end time of intervals based on rownum and username.*/WITH StartTimesCTE AS(   SELECT ET.rownum, ET.username, ST.starttime     FROM (-- Use min possible time as the start time of the first interval for each user.           SELECT 1 AS rownum, username, CAST('00010101 00:00:00.000' AS DATETIME2(3)) AS endtime             FROM dbo.Users            UNION ALL           -- Use the already calculated end time of intervals for the rest of intervals.           SELECT rownum + 1, username, endtime             FROM #EndTimes) AS ET          CROSS APPLY          (--start time of interval is the min start time greater than the previous interval end time           SELECT MIN(ST.starttime) AS starttime             FROM dbo.Sessions AS ST            WHERE ST.username = ET.username              AND ST.starttime > ET.endtime) AS ST)-- The end result is just matching starttime with endtime of each interval based on row numbers (so it is based on time order)SELECT ST.username, ST.starttime, ET.endtime  FROM StartTimesCTE AS ST       INNER JOIN       #EndTimes AS ET       ON ET.rownum = ST.rownum          AND ET.username = ST.username;

       

Peter Larsson (Peso) 3: 4 Seconds

-- indexes/*CREATE INDEX idx_user_start_end ON dbo.Sessions(username, starttime, endtime);CREATE INDEX idx_user_end_start ON dbo.Sessions(username, endtime, starttime);*/;WITH cteSource(UserName, theTime, theGrp)AS (SELECTu.UserName,u.theTime,(DENSE_RANK() OVER (PARTITION BY u.UserName ORDER BY u.theTime) - 1) / 2 AS theGrp-- Create artificial groups depending-- on the times. Omit duplicates.FROM(-- Get all usernames and first start time and last endtimeSELECTUserName,MIN(StartTime) AS minStartTime,MAX(EndTime) AS maxEndTImeFROMdbo.SessionsGROUP BYUserName) AS usr-- This only get the intermediate gapsOUTER APPLY(SELECTs.StartTime,e.EndTimeFROM(-- Get all starttimes sortedSELECTs.StartTime,ROW_NUMBER() OVER (ORDER BY s.StartTime) AS SeqIDFROMdbo.Sessions AS sWHEREs.UserName = usr.UserName) AS sINNER JOIN(-- Get all endtimes sortedSELECTs.EndTime,ROW_NUMBER() OVER (ORDER BY s.EndTime) + 1 AS SeqIDFROMdbo.Sessions AS sWHEREs.UserName = usr.UserName) AS e ON e.SeqID = s.SeqID-- Match previous end time time against this starttimeWHEREe.EndTime         

Itzik New 2: 3 Seconds

-- indexes /*CREATE UNIQUE INDEX idx_user_start_id ON dbo.Sessions(username, starttime, id);CREATE UNIQUE INDEX idx_user_end_id ON dbo.Sessions(username, endtime, id);*/-- Inline Table Function Encapsulating Logic from Solution Itzik New 1 for Single UserIF OBJECT_ID('dbo.UserIntervals', 'IF') IS NOT NULL DROP FUNCTION dbo.UserIntervals;GOCREATE FUNCTION dbo.UserIntervals(@user AS VARCHAR(14)) RETURNS TABLEASRETURN  WITH C1 AS  (    SELECT id, starttime AS ts, +1 AS type, NULL AS e,      ROW_NUMBER() OVER(ORDER BY starttime, id) AS s    FROM dbo.Sessions    WHERE username = @user    UNION ALL    SELECT id, endtime AS ts, -1 AS type,       ROW_NUMBER() OVER(ORDER BY endtime, id) AS e,      NULL AS s    FROM dbo.Sessions    WHERE username = @user  ),  C2 AS  (    SELECT C1.*, ROW_NUMBER() OVER(ORDER BY ts, type DESC, id) AS se    FROM C1  ),  C3 AS  (    SELECT ts,       FLOOR((ROW_NUMBER() OVER(ORDER BY ts) - 1) / 2 + 1) AS grpnum    FROM C2    WHERE COALESCE(s - (se - s) - 1, (se - e) - e) = 0  )  SELECT MIN(ts) AS starttime, max(ts) AS endtime  FROM C3  GROUP BY grpnum;GO-- SolutionSELECT U.username, A.starttime, A.endtimeFROM dbo.Users AS U  CROSS APPLY dbo.UserIntervals(U.username) AS A;

What makes this solution so fast is the efficient use of parallelism. Note though that when testing it in different environments and with different arguments for number of users and intervals, I didn’t always get a parallel plan. If you’re not getting a parallel plan, it could be because the machine you’re using has fewer logical CPUs than 8. Just for test purposes, you can use the SQL Server service startup option -P8, which will cause SQL Server to use 8 schedulers like in an environment with 8 logical CPUs. The -P startup parameter is not an officially documented one, so use it just for test purposes to mimic a machine with a desired number of CPUs, not for production purposes. Also, I noticed that in some machines where I tested this code and didn’t get parallel plans, when changing the sample data to 2,000 users each with 2,500 intervals, instead of 1,000 by 5,000, I got parallel plans in more cases. Either way, this solution is still very fast even when using a serial plan.

For more details, I’m going to cover my new solutions extensively in the March edition of SQJ.

Cheers,

BG

Sign up for the ITPro Today newsletter
Stay on top of the IT universe with commentary, news analysis, how-to's, and tips delivered to your inbox daily.

You May Also Like