The Challenge Is on: Maintain a Custom Letter-Based Sequence
If you need to implement a custom sequence solution, and you want it to perform as well as possible, you need to ask yourself which guarantees the solution needs to provide versus which it doesn’t.
April 1, 2018
Recently I delivered an advanced T-SQL class in Europe, and one of my students, Nicolas Barreau of Avanade France SAS, presented a nice little puzzle. The task wasn’t very complex, but it was interesting and fun to work on, and I hope you will enjoy working on it, too.
The Challenge
The challenge Nicola presented was to maintain a custom, letter-based, cycling, gap-allowing, sequence, with values in the form LLL, where L is an alphabet letter in the range A through Z (26 symbols). Nicola’s company uses the sequence value as a match flag for settlement, for instance, between payment and invoice. The first value produced should be AAA, and it should advance as if the letters were digits in base 26, like so:
AAA
AAB
…
AAY
AAZ
ABA
ABB
…
ZZY
ZZZ
In total, the range gives you 17,576 unique values. Once the sequence reaches the maximum value ZZZ, it should cycle back to AAA. Other than for cycling purposes, the values need to be sequential, but they don’t have to be consecutive--that is, gaps are allowed for the sake of improved performance.
Your task is to develop a solution that performs as well as possible. You need to create a stored procedure to request a new sequence value and return it as an output parameter. The stored procedure should have the following header:
CREATE OR ALTER PROC dbo.NextValForMySequence
@val CHAR(3) OUTPUT
AS
GO
Use the following code to test your procedure:
DECLARE @myval AS CHAR(3);
EXEC dbo.NextValForMySequence
@val = @myval OUTPUT;
SELECT @myval;
The output, when run repeatedly, should be as follows:
AAA
AAB
...
AAY
AAZ
ABA
ABB
...
ZZY
ZZZ
AAA
AAB
...
Good luck!
Storing the Value as a Character String, Blocking Sequence
One approach to addressing the task is to store the last used value in a table using a character string representation, as it’s needed. The stored procedure that is supposed to return a new sequence value will update the row to advance the current value, and return the new one.
Here’s the code to create the table and populate it with the value ZZZ, which is the immediate value before (with support for cycling) the first one that you need to produce—AAA:
SET NOCOUNT ON;
USE tempdb;
GO
DROP TABLE IF EXISTS dbo.MySequence;
GO
CREATE TABLE dbo.MySequence
(
val CHAR(3) NOT NULL
CONSTRAINT CHK_MySequence_val
CHECK (val COLLATE Latin1_General_BIN
LIKE '[A-Z][A-Z][A-Z]')
);
INSERT INTO dbo.MySequence(val) VALUES('ZZZ');
The next tip is provided courtesy of Paul White. Notice the use of binary collation for the column. The reason is that you only want to support uppercase, nonaccented letters A through Z. Using dictionary, case-insensitive order, the pattern [A-Z] represents the characters (in order):
AaªÁáàÀÂâäÄÃãåÅÆæbBCcÇçdDÐðEeÉéèÈÊêëËfFƒGghHIiÍíìÌÎîïÏjJKklLMmnNÑñOoºÓóòÒÔôöÖÕõøØœŒpPQqrRSsŠšßtTÞþ™UuÚúùÙÛûüÜvVWwxXYyÝýÿŸzZ.
If you use dictionary, case-sensitive order, the pattern represent the letters:
AªáÁàÀâÂäÄãÃåÅÆbBcCçÇdDðÐeEéÉèÈêÊëËfFƒgGhHiIíÍìÌîÎïÏjJkKlLmMnNñÑoOºóÓòÒôÔöÖõÕøØœŒpPqQrRsSšŠßtTþÞ™uUúÚ ùÙûÛüÜvVwWxXyYýÝÿŸzZ, just excluding a and æ. With binary collation, you get only the uppercase nonaccented letters: ABCDEFGHIJKLMNOPQRSTUVWXYZ.
Here are the queries I used to verify matches with each of the collations (code to create the GetNums function can be found here):
SELECT STRING_AGG(CHAR(n), '')
WITHIN GROUP (ORDER BY CHAR(n) COLLATE Latin1_General_CI_AS) AS dictionary_case_insensitive
FROM dbo.GetNums(0, 255)
WHERE CHAR(n) COLLATE Latin1_General_CI_AS LIKE '[A-Z]';
SELECT STRING_AGG(CHAR(n), '')
WITHIN GROUP (ORDER BY CHAR(n) COLLATE Latin1_General_CS_AS) AS dictionary_case_sensitive
FROM dbo.GetNums(0, 255)
WHERE CHAR(n) COLLATE Latin1_General_CS_AS LIKE '[A-Z]';
SELECT STRING_AGG(CHAR(n), '')
WITHIN GROUP (ORDER BY CHAR(n) COLLATE Latin1_General_BIN) AS binary_order
FROM dbo.GetNums(0, 255)
WHERE CHAR(n) COLLATE Latin1_General_BIN LIKE '[A-Z]';
In order to produce the next value, our stored procedure needs to:
· Open a transaction
· Query the current character string value, obtaining an update lock on the row to serialize access to the sequence without triggering deadlocks
· Convert the string value to a numeric form
· Advance the numeric value by 1 (cycling if needed)
· Convert the new numeric value to a character form
· Update both the current value in the table and the output parameter to the new character string value
· Commit the transaction
Here’s the complete procedure’s code:
CREATE OR ALTER PROC dbo.NextValForMySequence
@val CHAR(3) OUTPUT
AS
SET NOCOUNT, XACT_ABORT ON;
BEGIN TRAN;
SET @val = (SELECT val FROM dbo.MySequence WITH (UPDLOCK));
DECLARE @intval AS INT =
(
(ASCII(SUBSTRING(@val, 1, 1)) - 65) * 26 * 26 +
(ASCII(SUBSTRING(@val, 2, 1)) - 65) * 26 +
(ASCII(SUBSTRING(@val, 3, 1)) - 65) + 1
) % 17576;
UPDATE dbo.MySequence
SET @val = val =
CHAR(65 + @intval / 26 / 26 % 26) +
CHAR(65 + @intval / 26 % 26) +
CHAR(65 + @intval % 26);
COMMIT TRAN;
GO
In order to convert the character string form of the sequence value to a number, you apply the following for each of the characters:
· Convert the letter to a value in the range 0 through 25 using the expression ASCII() - 65 (65 is the ascii value of the letter A).
· Multiply the result by 26 raised to the power of the zero-based character position from right to left.
You sum the results for the different characters, add 1 to produce the next sequence value, and apply modulo 17576 to achieve the cycling requirement. Here’s the expression that achieves this conversion:
(
(ASCII(SUBSTRING(@val, 1, 1)) - 65) * 26 * 26 +
(ASCII(SUBSTRING(@val, 2, 1)) - 65) * 26 +
(ASCII(SUBSTRING(@val, 3, 1)) - 65) + 1
) % 17576;
You then covert the new integer value to a character string using inverse logic, like so:
CHAR(65 + @intval / 26 / 26 % 26) +
CHAR(65 + @intval / 26 % 26) +
CHAR(65 + @intval % 26);
The operator / applies integer division and the operator % applies modulo (remainder of integer division).
Execute the following code repeatedly to test this procedure:
DECLARE @myval AS CHAR(3);
EXEC dbo.NextValForMySequence
@val = @myval OUTPUT;
SELECT @myval;
You get the following output:
AAA
AAB
...
AAY
AAZ
ABA
...
To test the performance of this procedure, run it in a loop 1,000,000 times:
SET NOCOUNT ON;
DECLARE @i AS INT = 1, @myval AS CHAR(3);
WHILE @i <= 1000000
BEGIN
EXEC dbo.NextValForMySequence
@val = @myval OUTPUT;
SET @i += 1;
END;
It took this code 96 seconds to complete on my system, meaning that it took 96 microseconds per execution in average.
Storing the Value as an Integer, Blocking Sequence
The conversion of the sequence value from a character string to a number and then back to a character string obviously has cost. A simpler and more optimal approach is to maintain the current value in the table as a number, and have the procedure advance it by 1 and apply the conversion to a string before returning the value. If you want to allow users to easily query the current value in the table in its character string form, you can have a computed column handle this. Here’s the new definition of the MySequence table to support this approach:
DROP TABLE IF EXISTS dbo.MySequence;
GO
CREATE TABLE dbo.MySequence
(
intval INT NOT NULL
CONSTRAINT CHK_MySequence_intval CHECK (intval BETWEEN 0 AND 17575),
val AS
CHAR(65 + intval / 26 / 26 % 26) +
CHAR(65 + intval / 26 % 26) +
CHAR(65 + intval % 26)
);
GO
INSERT INTO dbo.MySequence(intval) VALUES(17575);
SELECT val FROM dbo.MySequence;
You populate the table with the maximum integer value in the supported range so that the first time you’ll execute the procedure it will advance (with cycling) to 1. The SELECT query retrieving the computed column value returns the following output:
val
----
ZZZ
This makes the storage choice transparent to the user.
You then alter the stored procedure to apply simple integer addition to advance the value, like so:
CREATE OR ALTER PROC dbo.NextValForMySequence
@val CHAR(3) OUTPUT
AS
SET NOCOUNT, XACT_ABORT ON;
DECLARE @intval AS INT;
UPDATE dbo.MySequence
SET @intval = intval = (intval + 1) % 17576;
SET @val =
CHAR(65 + @intval / 26 / 26 % 26) +
CHAR(65 + @intval / 26 % 26) +
CHAR(65 + @intval % 26);
GO
Notice that since both the updating of the current value to the next, and the assignment of the new value to a variable are handle by a single UPDATE statement, there’s no need for an explicit transaction to avoid lost updates.
Run the following code repeatedly to test the procedure:
DECLARE @myval AS CHAR(3);
EXEC dbo.NextValForMySequence
@val = @myval OUTPUT;
SELECT @myval;
You get the following output:
AAA
AAB
...
AAY
AAZ
ABA
...
Test it in a loop, 1,000,000 times:
SET NOCOUNT ON;
DECLARE @i AS INT = 1, @myval AS CHAR(3);
WHILE @i <= 1000000
BEGIN
EXEC dbo.NextValForMySequence
@val = @myval OUTPUT;
SET @i += 1;
END;
It took this code 35 seconds to complete on my system, 35 microseconds per execution in average. That’s a third of the run time of the previous solution!
Since an UPDATE statement uses an exclusive lock, which is kept until the transaction finishes, this solution implements a transaction-level blocking sequence, guarantying no gaps. However, remember that this wasn’t a requirement in our case. If you don’t mind gaps, the transaction-level blocking behavior could result in an unnecessary performance penalty when using long transactions and obtaining a sequence value early in the transaction. To demonstrate this, open two connections (we’ll call them connection 1 and connection 2). Run the following code in connection 1 to open an explicit transaction and request a new sequence value:
SET NOCOUNT ON;
BEGIN TRAN;
DECLARE @myval AS CHAR(3);
EXEC dbo.NextValForMySequence
@val = @myval OUTPUT;
SELECT @myval;
The session obtains an exclusive lock on the row to apply the update, and keeps it since the transaction is still open. When I ran this code, I got the sequence value ABB. Make a note of the value you got. Next, run the following code in connection 2 to request a new value:
SET NOCOUNT ON;
USE tempdb;
DECLARE @myval AS CHAR(3);
EXEC dbo.NextValForMySequence
@val = @myval OUTPUT;
SELECT @myval;
Your request for a new sequence value is blocked since your session requests an exclusive lock on the row, and is blocked by the exclusive lock held by the other session. Your request will now need to wait until either connection 1 completes the transaction, or a timeout expires, if one was set.
To demonstrate how the no-gaps behavior is guaranteed, back in connection 1, run the following code to emulate a failure of the transaction by rolling it back:
ROLLBACK TRAN;
The update applied by connection 1 is undone (value is back to ABA in my case) and the exclusive lock is released. Connection 2 now manages to acquire the exclusive lock it was waiting for and advance the sequence value to ABB, which connection 1 ended up not using.
I ran a performance test using the OStress tool (part of the RML Utilities for SQL Server) to emulate multiple concurrent requests for a new sequence value made within an explicit transaction. The following code (executed in a command prompt) tests 100 concurrent individual requests:
OStress -S.SQL2017 -dtempdb -Q"SET NOCOUNT ON; BEGIN TRAN; DECLARE @myval AS CHAR(3); EXEC dbo.NextValForMySequence @val = @myval OUTPUT; WAITFOR DELAY '00:00:00.100'; COMMIT TRAN" -n100 -r1
Each T-SQL batch (Q parameter) opens a transaction, requests a new sequence value, applies a delay of 100 milliseconds emulating some work that needs to be done after the sequence value is obtained, and commits the transaction. Following is the description of the parameters used in this example:
S: SQL Server instance you connect to, d: database, Q: T-SQL batch, n: number of threads, r: number of rounds per thread. It took this code 10.479 seconds to complete on my system.
The following code tests 1,000 concurrent requests:
OStress -S.SQL2017 -dtempdb -Q"SET NOCOUNT ON; BEGIN TRAN; DECLARE @myval AS CHAR(3); EXEC dbo.NextValForMySequence @val = @myval OUTPUT; WAITFOR DELAY '00:00:00.100'; COMMIT TRAN" -n1000 -r1
It took this code 104.311 seconds to complete on my system.
If you do need to implement a transaction-level blocking sequence that guarantees no gaps, an important thing to remember is to postpone the request for a new sequence value to as late as possible within the transaction.
Nonblocking Integer Sequence
To eliminate the unnecessary delays that are due to the transaction-level blocking behavior of the previous solution, you need a nonblocking sequencing solution. You can achieve this by using a sequence object instead of a table to store the last used value. The way SQL Server implements the sequence object is as a nonblocking, gap-allowing solution. When you make a request for a new sequence value using the NEXT VALUE FOR FUNCTION, SQL Server locks the object only momentarily until it completes advancing the value, but then immediately release the lock to allow others to request new values, even if the transaction is still open.
Use the following code to define the sequence based on our needs, i.e., range 0 through 17575, cycling:
DROP TABLE IF EXISTS dbo.MySequence;
GO
DROP SEQUENCE IF EXISTS dbo.MySequence;
GO
CREATE SEQUENCE dbo.MySequence AS INT
MINVALUE 0
MAXVALUE 17575
CYCLE;
Next, alter the procedure to request the value from the sequence object, like so:
CREATE OR ALTER PROC dbo.NextValForMySequence
@val CHAR(3) OUTPUT
AS
SET NOCOUNT, XACT_ABORT ON;
DECLARE @intval AS INT = NEXT VALUE FOR dbo.MySequence;
SET @val =
CHAR(65 + @intval / 26 / 26 % 26) +
CHAR(65 + @intval / 26 % 26) +
CHAR(65 + @intval % 26);
GO
Use the following code repeatedly to test the procedure:
DECLARE @myval AS CHAR(3);
EXEC dbo.NextValForMySequence
@val = @myval OUTPUT;
SELECT @myval;
You get the following output:
AAA
AAB
...
AAY
AAZ
ABA
...
Use the following code to test 1,000,000 iterations:
SET NOCOUNT ON;
DECLARE @i AS INT = 1, @myval AS CHAR(3);
WHILE @i <= 1000000
BEGIN
EXEC dbo.NextValForMySequence
@val = @myval OUTPUT;
SET @i += 1;
END;
This code finished on my system in 13 seconds total—13 microseconds per execution. That’s a third of the run time of the previous solution, and a ninth of the run time of the original.
To demonstrate that this solution is a nonblocking one, again, open two connections. Run the following code in connection 1 to open a transaction and ask for a new sequence value:
SET NOCOUNT ON;
BEGIN TRAN;
DECLARE @myval AS CHAR(3);
EXEC dbo.NextValForMySequence
@val = @myval OUTPUT;
SELECT @myval;
Make a note of the sequence value that you got. I got ABB.
While the transaction in connection 1 is still open, tun the following code in connection 2 to request a new sequence value:
SET NOCOUNT ON;
DECLARE @myval AS CHAR(3);
EXEC dbo.NextValForMySequence
@val = @myval OUTPUT;
SELECT @myval;
This time the request isn’t blocked, rather you immediately get a new value. I got ABC.
Back in connection 1, run the following code to roll the transaction back:
ROLLBACK TRAN;
The value ABB is lost, and you have a gap between your used sequence values. However, if that’s acceptable, you do benefit from a better performing solution.
Test 100 concurrent requests using explicit transactions with the OStress tool:
OStress -S.SQL2017 -dtempdb -Q"SET NOCOUNT ON; BEGIN TRAN; DECLARE @myval AS CHAR(3); EXEC dbo.NextValForMySequence @val = @myval OUTPUT; WAITFOR DELAY '00:00:00.100'; COMMIT TRAN" -n100 -r1
This code finished on my system in 1.125 seconds (1/10 of the previous solution).
Test 1,000 concurrent requests:
OStress -S.SQL2017 -dtempdb -Q"SET NOCOUNT ON; BEGIN TRAN; DECLARE @myval AS CHAR(3); EXEC dbo.NextValForMySequence @val = @myval OUTPUT; WAITFOR DELAY '00:00:00.100'; COMMIT TRAN" -n1000 -r1
This test finished in 13.997 seconds on my system, compared to over 100 seconds with the previous solution.
Summary
If you need to implement a custom sequence solution, and you want it to perform as well as possible, you need to ask yourself which guarantees the solution needs to provide versus which it doesn’t. As I demonstrated in this article, you can control whether the solution will be a transaction-level blocking one, which guarantees no gaps, or not; however, there is a tradeoff. Also, try to keep things simple. Storing the sequence value as an integer makes the solution both simple and good performing. You handle the character string representation part, if needed, right before returning the value to the user.
About the Author
You May Also Like