Insight and analysis on the information technology space from industry thought leaders.
T-SQL Feature Request: Add RESET WHEN Clause to Reset Window Partition
In this article, I present a couple of classic, representative T-SQL querying tasks, provide the solutions that are used currently, and explain how you would handle them with improved solutions using the RESET WHEN clause. Hopefully, by the end of this article you will be convinced of the importance of this feature and will add your vote to the connect item requesting it.
July 11, 2017
Window functions allow you to solve a wide variety of T-SQL querying tasks elegantly and efficiently. Still, as of SQL Server 2017, there are some tasks that are difficult to solve with efficient set-based solutions that could be handled easily and efficiently if T-SQL added support for a windowing clause called RESET WHEN. This clause resets the window partition when a certain condition—possibly one that is based on a window function—is met. This feature isn’t standard, but is currently supported by Teradata (see documentation article on the feature here). Thanks to Microsoft Data Platform MVP Alejandro Mesa for introducing the feature to me. In this article, I present a couple of classic, representative T-SQL querying tasks, provide the solutions that are used currently, and explain how you would handle them with improved solutions using the RESET WHEN clause. Hopefully, by the end of this article you will be convinced of the importance of this feature and will add your vote to the connect item requesting it.
Depleting Quantities
I’ll start with a challenge involving the computation of running totals that need to be depleted when a certain condition is met. Microsoft Data Platform MVP Geri Reshef presented the original challenge; later, Sharon Rimer of Naya Technologies presented a variation of the challenge.
The sample data for this challenge involves a table called Transactions, which you create and populate using the following code:
SET NOCOUNT ON;USE tempdb;DROP TABLE IF EXISTS dbo.Transactions;GOCREATE TABLE dbo.Transactions( txid INT NOT NULL CONSTRAINT PK_Transactions PRIMARY KEY, qty INT NOT NULL);GOTRUNCATE TABLE dbo.Transactions;INSERT INTO dbo.Transactions(txid, qty) VALUES(1,2),(2,5),(3,4),(4,1),(5,10),(6,3), (7,1),(8,2),(9,1),(10,2),(11,1),(12,9);
The transactions keep adding quantities of some item to a container based on txid ordering. As soon as the cumulative quantity exceeds the container’s capacity (given as input), the container needs to be depleted. Your solution needs to show the state of the container (total quantity) after every transaction, with 0 as the total quantity after depletion, as well as the depletion quantity when relevant. Here’s the desired result for the given sample data and an input container capacity of 5:
txid qty totalqty depletionqty----------- ----------- ----------- ------------1 2 2 02 5 0 73 4 4 04 1 5 05 10 0 156 3 3 07 1 4 08 2 0 69 1 1 010 2 3 011 1 4 012 9 0 13
Solution Based on Cursor
It’s common for people to try and solve such tasks using recursive queries. Such solutions can be elegant, but aren’t very efficient. People also solve such tasks using a method known as “quirky update,” which is very efficient, but not guaranteed since it relies on physical processing order. So far, I haven’t found efficient, guaranteed, set-based solutions for this task, and instead have been reluctantly using iterative solutions (T-SQL- or CLR-based).
Here’s an example for a simple iterative T-SQL solution using a cursor:
SET NOCOUNT ON;DECLARE @maxallowedqty AS INT = 5;DECLARE @C AS CURSOR, @txid AS INT, @qty AS INT, @totalqty AS INT = 0, @depletionqty AS INT = 0;DECLARE @Result AS TABLE( txid INT NOT NULL PRIMARY KEY, qty INT NOT NULL, totalqty INT NOT NULL, depletionqty INT NOT NULL);SET @C = CURSOR FORWARD_ONLY STATIC READ_ONLY FOR SELECT txid, qty FROM dbo.Transactions ORDER BY txid;OPEN @C;FETCH NEXT FROM @C INTO @txid, @qty;WHILE @@FETCH_STATUS = 0BEGIN SELECT @totalqty += @qty, @depletionqty = 0; IF @totalqty > @maxallowedqty BEGIN SET @depletionqty = @totalqty; SET @totalqty = 0; END; INSERT INTO @Result(txid, qty, totalqty, depletionqty) VALUES(@txid, @qty, @totalqty, @depletionqty); FETCH NEXT FROM @C INTO @txid, @qty;END;SELECT txid, qty, totalqty, depletionqtyFROM @ResultORDER BY txid;
The code defines a cursor variable and uses it to fetch the transactions one at a time in chronological order. It accumulates the quantities in a variable called @totalqty. After every row fetched, the code checks if the cumulative quantity exceeds the container capacity, in which case it sets a variable called @depletionqty to the current @totalqty value, and then zeroes the current @totalqty value. The code then writes the current transaction info (txid and qty) along with the current @totalqty and @depletionqty values to a table variable. Once done iterating through all transactions, the code queries the table variable to produce the desired result.
Solution Using RESET WHEN (Not Supported as of SQL Server 2017)
The downsides of using iterative solutions are well known. The question is always whether a descent set-based alternative exists. So far, I haven’t found one myself for the task at hand with the existing tools that we have in T-SQL, and would love to see one created some day. As mentioned, Teradata supports a windowing clause called RESET WHEN that resets the window partition when a certain condition is met. The beauty in this clause is that the condition can use a window function, so you can ask about what accumulated until the previous row. With our task you would reset the window partition when the sum of the quantities from the beginning of the partition and until the previous row exceeds the input container limit, like so:
(Remember, this code currently isn’t supported in SQL Server.)
DECLARE @maxallowedqty AS INT = 5;SELECT txid, qty, SUM(qty) OVER(ORDER BY txid RESET WHEN -- reset window partition when -- running sum until previous row > @maxallowedqty SUM(qty) OVER(ORDER BY txid ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) > @maxallowedqty ROWS UNBOUNDED PRECEDING) AS runsumFROM dbo.Transactions;
Had this code been supported, it would have produced the following output:
txid qty runsum ----------- ----------- -----------1 2 2 2 5 7 -----------------------------------3 4 4 4 1 5 5 10 15 -----------------------------------6 3 3 7 1 4 8 2 6 -----------------------------------9 1 1 10 2 3 11 1 4 12 9 13
As you can see, the window partition is reset after transaction IDs 2, 5 and 8.
To produce the final desired result, you set the total container quantity (call it totalqty) to zero when the running sum exceeds the container limit, and to the running sum otherwise. You then compute the depletion quantity (call it depletionqty) as the running sum minus the total quantity. Here’s the complete solution’s code:
DECLARE @maxallowedqty AS INT = 5;WITH C AS( SELECT *, SUM(qty) OVER(ORDER BY txid RESET WHEN SUM(qty) OVER(ORDER BY txid ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) > @maxallowedqty ROWS UNBOUNDED PRECEDING) AS runsum FROM dbo.Transactions)SELECT txid, qty, totalqty, runsum - totalqty AS depletionqtyFROM C CROSS APPLY ( VALUES( CASE WHEN runsum > @maxallowedqty THEN 0 ELSE runsum END ) ) AS A(totalqty);
As you can see, the solution is simple, concise and elegant!
Recently, Sharon Rimer of Naya Technologies introduced a variation of this challenge based on one of the company's customer cases where the desired result was to compute how many times the container exceeded the input limit. To handle this need, you would keep the same definition for the CTE C, and use the following outer query to compute the count:
SELECT COUNT( CASE WHEN runsum > @maxallowedqty THEN 1 END ) AS timesexceededFROM C;
The complete solution would look like this:
WITH C AS( SELECT *, SUM(qty) OVER(ORDER BY txid RESET WHEN SUM(qty) OVER(ORDER BY txid ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) > @maxallowedqty ROWS UNBOUNDED PRECEDING) AS runsum FROM dbo.Transactions)SELECT COUNT( CASE WHEN runsum > @maxallowedqty THEN 1 END ) AS timesexceededFROM C;
Again, short and elegant.
Complex Islands
There are many other challenges that perhaps do currently have a reasonable T-SQL set-based solution, but that can be solved more easily and elegantly with the RESET WHEN clause. A good example is handling complex islands tasks where you need to define a new island whenever a condition that compares something from the current row and something from the previous row is met. To demonstrate such a challenge, I’ll use a table called Stocks, which you create and populate by running the following code:
SET NOCOUNT ON;USE tempdb;DROP TABLE IF EXISTS dbo.StockRates;GOCREATE TABLE dbo.StockRates( stockid INT NOT NULL, dt DATE NOT NULL, val INT NOT NULL, CONSTRAINT PK_StockRates PRIMARY KEY(stockid, dt));GOINSERT INTO dbo.StockRates VALUES (1, '2017-08-01', 13), (1, '2017-08-02', 14), (1, '2017-08-03', 17), (1, '2017-08-04', 40), (1, '2017-08-05', 45), (1, '2017-08-06', 52), (1, '2017-08-07', 56), (1, '2017-08-08', 60), (1, '2017-08-09', 70), (1, '2017-08-10', 30), (1, '2017-08-11', 29), (1, '2017-08-12', 35), (1, '2017-08-13', 40), (1, '2017-08-14', 45), (1, '2017-08-15', 60), (1, '2017-08-16', 60), (1, '2017-08-17', 55), (1, '2017-08-18', 60), (1, '2017-08-19', 20), (1, '2017-08-20', 15), (1, '2017-08-21', 20), (1, '2017-08-22', 30), (1, '2017-08-23', 40), (1, '2017-08-24', 20), (1, '2017-08-25', 60), (1, '2017-08-26', 80), (1, '2017-08-27', 70), (1, '2017-08-28', 70), (1, '2017-08-29', 40), (1, '2017-08-30', 30), (1, '2017-08-31', 10), (2, '2017-08-01', 3), (2, '2017-08-02', 4), (2, '2017-08-03', 7), (2, '2017-08-04', 30), (2, '2017-08-05', 35), (2, '2017-08-06', 42), (2, '2017-08-07', 46), (2, '2017-08-08', 50), (2, '2017-08-09', 60), (2, '2017-08-10', 20), (2, '2017-08-11', 19), (2, '2017-08-12', 25), (2, '2017-08-13', 30), (2, '2017-08-14', 35), (2, '2017-08-15', 50), (2, '2017-08-16', 50), (2, '2017-08-17', 45), (2, '2017-08-18', 50), (2, '2017-08-19', 10), (2, '2017-08-20', 5), (2, '2017-08-21', 10), (2, '2017-08-22', 20), (2, '2017-08-23', 30), (2, '2017-08-24', 10), (2, '2017-08-25', 50), (2, '2017-08-26', 70), (2, '2017-08-27', 60), (2, '2017-08-28', 60), (2, '2017-08-29', 30), (2, '2017-08-30', 20), (2, '2017-08-31', 1);
In this table, you keep track of daily closing stock rates. Say you need to identify periods (islands) where the stock value is greater than or equal to 50, and for each period you need to show when it started and ended, as well as what the maximum stock rate was during that period. The catch is that you need to tolerate (ignore) gaps of up to 6 days. Here’s the desired result for the given sample data:
stockid startdate enddate maxvalue----------- ---------- ---------- -----------1 2017-08-06 2017-08-18 701 2017-08-25 2017-08-28 802 2017-08-08 2017-08-18 602 2017-08-25 2017-08-28 70
Observe, for example, that for the first island for stock ID 1 we ignored the gap between Aug. 9, 2017, and Aug. 15, 2017, since it wasn’t longer than 6 days, but we didn’t ignore the gap between Aug. 18, 2017, and Aug. 25, 2017, since it was 7-days long.
Currently Supported T-SQL Solution
As mentioned, there are currently supported set-based solutions for islands challenges like this one, but they are longer and more complex than how you would handle such tasks with the RESET WHEN clause. With one of the currently supported solutions, as the first step you compute a flag (call it isstart) that you set to 0 when it’s not the beginning of an island by comparing something from the current row and with something from the previous row (obtained with the LAG function). Otherwise, you set the flag to 1. In our case, after filtering only rows where the stock value is greater than or equal to 50, you set the flag to 0 when the difference between the previous date and the current date is less than 7 days; otherwise, you set the flag to 1. Here’s the code that implements this step:
SELECT stockid, dt, val, CASE WHEN DATEDIFF(day, LAG(dt) OVER(PARTITION BY stockid ORDER BY dt), dt) < 7 THEN 0 ELSE 1 END AS isstartFROM dbo.StockRatesWHERE val >= 50;
This code generates the following output:
stockid dt val isstart----------- ---------- ----------- -----------1 2017-08-06 52 11 2017-08-07 56 01 2017-08-08 60 01 2017-08-09 70 01 2017-08-15 60 01 2017-08-16 60 01 2017-08-17 55 01 2017-08-18 60 01 2017-08-25 60 11 2017-08-26 80 01 2017-08-27 70 01 2017-08-28 70 02 2017-08-08 50 12 2017-08-09 60 02 2017-08-15 50 02 2017-08-16 50 02 2017-08-18 50 02 2017-08-25 50 12 2017-08-26 70 02 2017-08-27 60 02 2017-08-28 60 0
As a second step, you produce an island identifier by computing the running sum of the isstart flag. Finally, you group the data by stockid and isstart, and return the start and end dates of the island, plus the maximum stock rate during the period. Here’s the complete solution’s code:
WITH C1 AS( SELECT *, CASE WHEN DATEDIFF(day, LAG(dt) OVER(PARTITION BY stockid ORDER BY dt), dt) < 7 THEN 0 ELSE 1 END AS isstart FROM dbo.StockRates WHERE val >= 50),C2 AS( SELECT *, SUM(isstart) OVER(PARTITION BY stockid ORDER BY dt ROWS UNBOUNDED PRECEDING) AS grp FROM C1)SELECT stockid, MIN(dt) AS startdate, MAX(dt) AS enddate, MAX(val) as maxvalueFROM C2GROUP BY stockid, grpORDER BY stockid, startdate;
Solution Using RESET WHEN (Not Supported as of SQL Server 2017)
Using the RESET WHEN clause, it would have been easier to solve the task since you can simply reset the window partition when the condition for starting a new island is met. You could then use the minimum date in the partition is the island identifier.
Here’s an example of how you would compute the island identifier (call it grp):
SELECT stockid, dt, val, MIN(dt) OVER(PARTITION BY stockid ORDER BY dt RESET WHEN DATEDIFF(day, MIN(dt) OVER( PARTITION BY stockid ORDER BY dt ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING), dt) >= 7 ROWS UNBOUNDED PRECEDING) AS grpFROM dbo.StocksWHERE val >= 50;
Here the code uses the MIN window function to obtain the date from the previous row by using the window frame extent ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING. Alternatively, you could get the date from the previous row with the LAG function, like so (showing only the alternative RESET WHEN clause):
RESET WHEN DATEDIFF(day, LAG(dt) OVER(PARTITION BY stockid ORDER BY dt), dt) >= 7
Here’s what the output of this step would be, showing the computed group identifier for each island:
stockid dt val grp----------- ---------- ----------- -----------1 2017-08-06 52 2017-08-061 2017-08-07 56 2017-08-061 2017-08-08 60 2017-08-061 2017-08-09 70 2017-08-061 2017-08-15 60 2017-08-061 2017-08-16 60 2017-08-061 2017-08-17 55 2017-08-061 2017-08-18 60 2017-08-061 2017-08-25 60 2017-08-251 2017-08-26 80 2017-08-251 2017-08-27 70 2017-08-251 2017-08-28 70 2017-08-252 2017-08-08 50 2017-08-082 2017-08-09 60 2017-08-082 2017-08-15 50 2017-08-082 2017-08-16 50 2017-08-082 2017-08-18 50 2017-08-082 2017-08-25 50 2017-08-252 2017-08-26 70 2017-08-252 2017-08-27 60 2017-08-252 2017-08-28 60 2017-08-25
Then you would need just one more step to group this data by stockid and grp, and return the beginning and end of each period, plus the maximum stock rate within the period, like so:
WITH C AS( SELECT *, MIN(dt) OVER(PARTITION BY stockid ORDER BY dt RESET WHEN DATEDIFF(day, LAG(dt) OVER(PARTITION BY stockid ORDER BY dt), dt) >= 7 ROWS UNBOUNDED PRECEDING) AS grp FROM dbo.Stocks WHERE val >= 50)SELECT stockid, grp AS startdate, MAX(dt) AS enddate, MAX(val) as maxvalueFROM CGROUP BY stockid, grpORDER BY stockid, grp;
As you can see, this solution is shorter and simpler than the currently supported one.
Conclusion
Window functions are profound; there’s no question about it. Wouldn’t you like to see them becoming even more powerful? The RESET WHEN clause, if implemented in T-SQL, will doubtless increase its utility and enable us to replace existing iterative solutions with elegant and more efficient set-based ones. If you find that the addition of this feature would be beneficial to you, be sure to add your vote to the connect item requesting it.
About the Author
You May Also Like