T-SQL Puzzle – Grouping consecutive rows with a common element
Itzik provides a T-SQL puzzle. Ready for the challenge?
September 10, 2006
You will be working with a table called Attendance containing attendance
information of students in classes. Run the following code to create the
Attendance table and populate it with sample data:
SET NOCOUNT ON;USE tempdb;GOIF OBJECT_ID('dbo.Attendance', 'U') IS NOT NULL DROP TABLE dbo.Attendance;GOCREATE TABLE dbo.Attendance( student VARCHAR(10) NOT NULL, dt DATETIME NOT NULL, slot INT NOT NULL, attend BIT NOT NULL, PRIMARY KEY(student, dt, slot));INSERT INTO dbo.Attendance VALUES('A', '20060801', 1, 1);INSERT INTO dbo.Attendance VALUES('A', '20060801', 2, 1);INSERT INTO dbo.Attendance VALUES('A', '20060801', 3, 1);INSERT INTO dbo.Attendance VALUES('A', '20060802', 1, 1);INSERT INTO dbo.Attendance VALUES('A', '20060802', 2, 0);INSERT INTO dbo.Attendance VALUES('A', '20060802', 3, 1);INSERT INTO dbo.Attendance VALUES('A', '20060802', 4, 1);INSERT INTO dbo.Attendance VALUES('A', '20060803', 1, 1);INSERT INTO dbo.Attendance VALUES('A', '20060803', 2, 0);INSERT INTO dbo.Attendance VALUES('A', '20060803', 3, 0);INSERT INTO dbo.Attendance VALUES('B', '20060801', 1, 0);INSERT INTO dbo.Attendance VALUES('B', '20060801', 2, 1);INSERT INTO dbo.Attendance VALUES('B', '20060801', 3, 0);INSERT INTO dbo.Attendance VALUES('B', '20060802', 1, 0);INSERT INTO dbo.Attendance VALUES('B', '20060802', 2, 0);INSERT INTO dbo.Attendance VALUES('B', '20060802', 3, 0);INSERT INTO dbo.Attendance VALUES('B', '20060802', 4, 0);INSERT INTO dbo.Attendance VALUES('B', '20060803', 1, 1);INSERT INTO dbo.Attendance VALUES('B', '20060803', 2, 1);INSERT INTO dbo.Attendance VALUES('B', '20060803', 3, 0);
Contents of the Attendance table:
SELECT * FROM dbo.Attendance;student dt slot attend---------- ---------- ----------- ------A 2006-08-01 1 1A 2006-08-01 2 1A 2006-08-01 3 1A 2006-08-02 1 1A 2006-08-02 2 0A 2006-08-02 3 1A 2006-08-02 4 1A 2006-08-03 1 1A 2006-08-03 2 0A 2006-08-03 3 0B 2006-08-01 1 0B 2006-08-01 2 1B 2006-08-01 3 0B 2006-08-02 1 0B 2006-08-02 2 0B 2006-08-02 3 0B 2006-08-02 4 0B 2006-08-03 1 1B 2006-08-03 2 1B 2006-08-03 3 0
The attendance table contains a row for each student, date and time slot (a
session). In a given date there might be several time slots (sessions)
represented by their ordinals (1, 2, 3, ...). As you can see in the sample data,
different dates can have different numbers of sessions.
The attend column represents the attendance status (1 - student attended the
session, 0 - student did not attend the session).
Your task is to write a set-based query (no cursors) that groups consecutive
rows (order determined by dt, slot) with the same attendance status for each
student. For each group, you need to return a row with the following attributes:
student, from date, from slot, to date, to slot, attendance status, number of
sessions/slots in the group. The tricky part is that the time slots/sessions are
not represented by asingle attribute, rather by the combination of attributes
dt and slot.
For example, the following set of rows represents for student A a
consecutive period of time with the same attendance status:
student dt slot attend---------- ----------------------- ----------- ------A 2006-08-01 00:00:00.000 1 1A 2006-08-01 00:00:00.000 2 1A 2006-08-01 00:00:00.000 3 1A 2006-08-02 00:00:00.000 1 1
Here, student A had no attendance status change from date 2006-08-01, slot 1
through date 2006-08-02, slot 1.
So the output should show the following row for this group of sessions:
student from_dt from_slot to_dt to_slot attend cnt------- ---------- ---------- ---------- -------- ------ ----A 2006-08-01 1 2006-08-02 1 1 4
Description of output columns:
from_dt, from_slot: date and slot when period started
to_dt, to_slot: date and slot when period ended
attend: attendance status in the period (1 - student attended session,
0 - student did not attend session)
cnt: number of slots/sessions in the period
Here’s how the complete output should look like, sorted by student,
from_dt, from_slot:
student from_dt from_slot to_dt to_slot attend cnt------- ---------- ---------- ---------- -------- ------ ----A 2006-08-01 1 2006-08-02 1 1 4A 2006-08-02 2 2006-08-02 2 0 1A 2006-08-02 3 2006-08-03 1 1 3A 2006-08-03 2 2006-08-03 3 0 2B 2006-08-01 1 2006-08-01 1 0 1B 2006-08-01 2 2006-08-01 2 1 1B 2006-08-01 3 2006-08-02 4 0 5B 2006-08-03 1 2006-08-03 2 1 2B 2006-08-03 3 2006-08-03 3 0 1
Remember, no cursors. Besides the solution’s code, please provide some
narrative explaining the logic behind your solution.
You have a week to work on the solution. We will give away two prizes: one
winner will be chosen randomly out of those who provided correct solutions;
another winner will be chosen based on the fastest correct solution.
Performance will be measured against a table with around 100,000 rows
(100 students, 365 days, 3 slots a day, random attendance status). I will use
the following code to populate the table for performance measures:
TRUNCATE TABLE dbo.Attendance;INSERT INTO dbo.Attendance SELECT 'S' + RIGHT('00000000' + CAST(Students.n AS VARCHAR(9)), 9) AS student, DATEADD(day, Dates.n - 1, '20060101') AS dt, Slots.n AS slot, ABS(CHECKSUM(NEWID())) % 2 AS attend FROM dbo.Nums AS Students, dbo.Nums AS Dates, dbo.Nums AS Slots WHERE Students.n Nums is an auxiliary table of numbers containing a column called n with a
sequence of integers (1, 2, 3, ...).
You are not allowed to make design/schema changes to the Attendance table,
but you are allowed to create indexes as you see fit.
Enjoy, and good luck!
--
BG
About the Author
You May Also Like