Parsing Denormalized Strings with the STRING_SPLIT Function in SQL Server 2016
There is a new string parsing function available to you if you're dealing with SQL Server 2016 databases.
This Was Not the Article I Intended to Write
This article developed as I was in the process of writing another one on string functions. I had not run across the existence of the new STRING_SPLIT() function that was part of SQL Server 2016 until my research on what will be a couple of future articles. So long as your target database is in COMPATIBILITY_MODE = 130 you can take advantage of this function.
“What advantage is that?” You say? STRING_SPLIT() is a table-valued function which allows you to parse a delimited string passed in as a parameter (along with a separation character as a second parameter) returning the output column [value] as a table valued result set that presents as a row for each delimited value between each separator character.
STRING_SPLIT Syntax
The syntax for returning results from a function call to STRING_SPLIT is reminiscent of any call to a table-valued function:
SELECT valueFROM STRING_SPLIT( input string varchar(max), separator character varchar(1));
The input string parameter can be one of any of the following:
Delimited string value
varchar() typed variable assignment
Table column when using a CROSS APPLY
Examples for Returning Results from STRING_SPLIT()
Let’s take a look at how each of the aforementioned options for passing an input string parameter into STRING_SPLIT yields results:
Example One: Passing in a String Value:
This is by far the most straight-forward approach requiring simply the input string and the separator character. Using the following example should provide some clarity. Let’s see what happens when we input a list of airports I’ve visited last year as a sample:
SELECT value FROM STRING_SPLIT ('AZO,YYZ,SEA,PDX,SFO,RKV,AMS', ',');
value---------------------------AZOYYZSEAPDXSFORKVAMS
Of course the results are able to be filtered with predicates in the WHERE clause subject to sorting via the ORDER BY clause:
SELECT value FROM STRING_SPLIT ('AZO,YYZ,SEA,PDX,SFO,RKV,AMS', ',')WHERE value LIKE 'A%'ORDER BY value;
value---------------------------AMSAZO
Example Two: Normalizing Data using a Variable for the Delimited String… and a Cursor:
We can also declare a variable as varchar(n) or varchar(max) and pass that in as the first parameter and see similar results. This example uses the practical case of cleaning denormalized data from one table through parsing and inserting into a normalized table. I’ll make use of three tables:
[events] – storing event data related to SQL Cruise for up-coming events through 2018
[event_ports_denormalized] – storing an id column referencing the id column in [events] and a ports_list_csv column which stores a comma-separated list of each port for that event.
[event_ports] which will be a normalized version of the [event_ports_denormalized] table.
CREATE TABLE [events]( id INT IDENTITY(1,1),event_name VARCHAR(100));CREATE TABLE [event_ports_denormalized](id INT NOT NULL, ports_list_csv VARCHAR(500) NOT NULL);CREATE TABLE [event_ports](id INT NOT NULL, port_name VARCHAR(50) NOT NULL);
We also need to populate these tables to make sense of the results:
INSERT INTO [events](event_name)VALUES ('SQL Cruise Alaska 2017'), ('SQL Cruise Caribbean 2018'), ('SQL Cruise Alaska 2018') INSERT INTO [event_ports_denormalized](id, ports_list_csv) VALUES (1,'Seattle,Ketchikan,Juneau,Skagway,Victoria'), (2,'Miami,St. Thomas,Tortola,Nassau'), (3,'Seattle,Ketchikan,Juneau,Skagway,Victoria');
The tables’ contents looks like this:
[events]:
id event_name----------- --------------------------1 SQL Cruise Alaska 20172 SQL Cruise Caribbean 20183 SQL Cruise Alaska 2018
[event_ports_denormalized]:
id ports_list_csv----------- --------------------------------------------1 Seattle,Ketchikan,Juneau,Skagway,Victoria2 Miami,St. Thomas,Tortola,Nassau3 Seattle,Ketchikan,Juneau,Skagway,Victoria
If we wanted to look at a result set of ports and their associated events the resulting data doesn’t look conducive to further processing and joins with other tables in this state:
SELECT E.event_name, EPD.ports_list_csvFROM [events] AS EINNER JOIN [event_ports_denormalized] AS EPDON E.id = EPD.idORDER BY E.id;
event_name ports_list_csv----------------------------- ---------------------------------------------SQL Cruise Alaska 2017 Seattle,Ketchikan,Juneau,Skagway,VictoriaSQL Cruise Caribbean 2018 Miami,St. Thomas,Totola,NassauSQL Cruise Alaska 2018 Seattle,Ketchikan,Juneau,Skagway,Victoria
So it’s fair time we clean this data up so we can use it with other tables in the database. Let’s get to normalizing!
The approach we will take here is to use a cursor to assign each csv value and its associated id to variables to use STRING_SPLIT() to parse the values and insert into the [event_ports] table:
DECLARE @id INTDECLARE @csv VARCHAR(500)DECLARE splitting_cursor CURSOR FOR SELECT id, ports_list_csvFROM [event_ports_denormalized];OPEN splitting_cursorFETCH NEXT FROM splitting_cursor INTO @id, @csvWHILE @@FETCH_STATUS = 0BEGIN INSERT INTO [event_ports] (id, port_name)SELECT @id, SS.valueFROM STRING_SPLIT(@csv,',') AS SSFETCH NEXT FROM splitting_cursor INTO @id, @csvEND CLOSE splitting_cursor;DEALLOCATE splitting_cursor;Using the following query we can now see a normalized listing of events and the associated ports:SELECT E.event_name, EP.port_nameFROM [events] AS EINNER JOIN [event_ports] AS EPON E.id = EP.idORDER BY E.id;
event_name port_name----------------------------- ------------SQL Cruise Alaska 2017 SeattleSQL Cruise Alaska 2017 KetchikanSQL Cruise Alaska 2017 JuneauSQL Cruise Alaska 2017 SkagwaySQL Cruise Alaska 2017 VictoriaSQL Cruise Caribbean 2018 MiamiSQL Cruise Caribbean 2018 St. ThomasSQL Cruise Caribbean 2018 TotolaSQL Cruise Caribbean 2018 NassauSQL Cruise Alaska 2018 SeattleSQL Cruise Alaska 2018 KetchikanSQL Cruise Alaska 2018 JuneauSQL Cruise Alaska 2018 SkagwaySQL Cruise Alaska 2018 Victoria
Example Three: Using table values through a CROSS APPLY (Cursor-less):
I showed you the last process in order to show you how much simpler and lightweight it could be by using a CROSS APPLY. Let’s take the final step to using STRING_SPLIT() to normalize delimited data by a straight CROSS APPLY bypassing the cursor all together. I’ve truncated the [event_ports] table (take my word for it) and am ready to use the following process to reload the table:
INSERT INTO [event_ports] (id, port_name)SELECT id, SS.valueFROM [event_ports_denormalized] AS EPDCROSS APPLY STRING_SPLIT(EPD.ports_list_csv,',') AS SS;
The results are the same and run lighter with only 4 lines of code:
SELECT E.event_name, EP.port_nameFROM [events] AS E INNER JOIN [event_ports] AS EP ON E.id = EP.idORDER BY E.id;
event_name port_name----------------------------- ------------SQL Cruise Alaska 2017 SeattleSQL Cruise Alaska 2017 KetchikanSQL Cruise Alaska 2017 JuneauSQL Cruise Alaska 2017 SkagwaySQL Cruise Alaska 2017 VictoriaSQL Cruise Caribbean 2018 MiamiSQL Cruise Caribbean 2018 St. ThomasSQL Cruise Caribbean 2018 TotolaSQL Cruise Caribbean 2018 NassauSQL Cruise Alaska 2018 SeattleSQL Cruise Alaska 2018 KetchikanSQL Cruise Alaska 2018 JuneauSQL Cruise Alaska 2018 SkagwaySQL Cruise Alaska 2018 Victoria
NULL or Empty String?
One last thing I want to cover involves what happens when you have two delimiter characters positioned sequentially in the input string for the function. Take the following parameter as an example (noting the double comma sequence between Ketchikan and Juneau):
SELECT valueFROM STRING_SPLIT('Ketchikan,,Juneau,Skagway,Victoria',',');
value--------------KetchikanJuneauSkagwayVictoria(5 row(s) affected)
The value for the row returned for the sequential comma delimiter characters in treated as an empty string and not NULL.
Conclusion
Whether due to age or poorly-coded, many applications and nosql solutions often store comma-separated values (CSVs) in a single column/field. In many cases Microsoft SQL Server becomes involved in this denormalized data through either migration, ETL efforts, or some form of coordinated merge of system data that could benefit from using the STRING_SPLIT() function. The function is, essentially, a string parsing function that normalizes that which isn’t. This opens up new possibilities for easily normalizing data – whether internal to your SQL Server database or through interaction with an outside data source.
About the Author
You May Also Like