Splitting Data from One Column into Three Columns
Itzik Ben-Gan uses string-manipulation functions to turn one column of data into three columns.
August 31, 2002
EDITOR'S NOTE: Send your T-SQL questions to SQL Server MVP Itzik Ben-Gan at [email protected].
Data in one of my tables is formatted in the ID and Name columns that Figure 1 shows. I need to write T-SQL code that separates the data in the Name column into firstname, surname, and middlenames columns by splitting the names separated by commas. Is scripting a cursor the correct solution, and if so, how do I write one?
You don't have to use cursors for the solution. In fact, cursors should never be your first choice when you try to solve problems in T-SQL. In this case, you can write a single, set-based query by using the SUBSTRING() and CHARINDEX() string-manipulation functions and the CASE expression. First, let's create a sample table called Names and populate it with sample data, as Listing 1, page 14, shows. Next, assume that the first word in Figure 1's Name column is the surname, the second word is the first name, and anything else is the middle name or initial. The main tool I use here to separate the name elements is the CHARINDEX(str1, str2[, start]) function, which returns the position of str1 in str2, optionally starting the search from a given position. If CHARINDEX() doesn't find the string, it returns 0. Let's extract the surname first. The following expression finds the position of the first comma in the fullname column, subtracts 1, then uses the result in the LEFT() function to extract the surname:
LEFT(fullname, CHARINDEX(',', fullname + ',') - 1)
Note that I concatenated a comma to fullname in the string to be searched to accommodate a row in which the fullname column contains only the surname, which might be A.
Extracting the first name is a bit trickier. First, you must determine whether a first name exists. If a first name doesn't exist, your code should return NULL. A first name exists if the fullname column contains at least one comma. In the following CASE expression, you can use the REPLACE() function to replace all the commas in fullname with empty strings, then check that the length of the original fullname column minus the length of the new string is greater than 0:
CASE WHEN LEN(fullname) - LEN(REPLACE(fullName, ',', '')) > 0 THEN -- Calculate the first name. ELSE NULLEND
You can then extract the first name by using the expression that Listing 2 shows. Here's how the expression works. The SUBSTRING(str, start, numchars) function extracts a substring from str, starting at the position of a given start location, in the length of numchars. The start location is the position of the first comma plus 1. The number of characters to be extracted is calculated by subtracting the position of the first comma from the position of the second comma minus 1. The LTRIM() function removes any leading spaces.
To extract middle names, you use techniques similar to the ones you used to extract first names. First, you need to determine whether middle names exist by verifying that fullname contains more than one comma, as the following code shows:
CASE WHEN LEN(fullname) - LEN(REPLACE(fullname, ',', '')) > 1 THEN -- Calculate middle names. ELSE NULLEND
The expression in Listing 3 extracts middle names also by using the SUBSTRING() function. The start location is the position of the second comma plus 1. Earlier, I explained how the SUBSTRING(str, start, numchars) function works and the meaning of each parameter. If you provide input values to SUBSTRING() in the start and numchars parameters where the end of the string in the str parameter is exceeded, the expression generates no error. Instead, the function returns all the characters up to the end of the string. For example, SUBSTRING('ABC', 2, 10) means: "extract 10 characters starting with the second from 'ABC'." Instead of generating an error, SUBSTRING() returns BC. Bearing this in mind, instead of performing complex calculations to find the number of characters to extract, you can simply supply LEN(fullname) as the third argument. The middlenames portion of the string starts after the second comma and continues until the end of the string. Listing 4 shows the full query that extracts parts of fullname into the firstname, surname, and middlenames columns. Figure 2 shows the data that Listing 4's query generates.
About the Author
You May Also Like