COALESCE, ISNULL, and Data Types
Dysfunction Function: COALESCE, ISNULL, and Data Types
Dysfunction Function: COALESCE, ISNULL, and Data Types
The topic for this article sprung from a discussion with a coworker about which code construct to use to “sub-out” NULL values in a column with an actual value when encountered: COALESCE or ISNULL? At first blush it appears both are interchangeable. They both evaluate whether the value of a column is NULL and if it is will return a replacement value as output. As a basis of comparison I want to explain each in a standalone fashion before we move onto the crux of this article which is: you need to be aware of a very important concerns in the use of either.
COALESCE Explained
First of all COALESCE is not a function – it’s an expression. If I was to tell you the implication of that I’d ruin the punchline in this article so for now just keep this statement in the back of your head – we will revisit the reason why this is important at the end of the article.
Think of COALESCE as a shortcut for accomplishing the same thing as a CASE statement evaluating a one or more expressions for a NULL state. The first non-NULL argument in the series of argument is returned. For example the following CASE statement:
SELECT CASE WHEN middle_name IS NOT NULL THEN middle_nameWHEN middle_init IS NOT NULL THEN middle_initELSE 'Some Value'END AS FOOFROM SQL_Cruise.dbo.People;
Is functionally identical to the following COALESCE statement:
SELECT COALESCE(middle_name, middle_init, 'Some Value')FROM SQL_Cruise.dbo.People;
The syntax for the COALESCE expression is a series of arguments; each of which is an expression:
COALESCE(expression _n, expression_n+1, … expression_n+x);
The value of the first non-NULL argument is returned. If all arguments evaluate to NULL then COALESCE will return NULL.
ISNULL Explained
ISNULL on the other hand is a scalar function that accepts a single parameter for evaluating for NULL. If the value for that expression supplied as a parameter evaluates to NULL then the replacement value supplied as the second parameter is returned from the ISNULL() function. The syntax for the ISNULL function looks like a simplified COALESCE statement but with a supplied replacement value in case of NULL:
ISNULL(expression_to_evaluate, replacement_value);
What to be aware of when using ISNULL
I’m going to create a table called People using my SQL_Cruise database. I’m going to insert two records afterwards:
CREATE TABLE SQL_Cruise.dbo.People(first_name varchar(25) NOT NULL,middle_init varchar(1) NULL,middle_name varchar(25) NULL,last_name varchar(25) NOT NULL)INSERT INTO SQL_Cruise.dbo.People(first_name,middle_init,middle_name,last_name)VALUES ('Trevor', NULL, NULL, 'Ford'),('Austen', NULL, NULL, 'Ford');
Granted you’d likely not have a table where you’re collecting both middle initial as well as middle name but for this example it helps to make the issue quite obvious. The end result of the code to this point are two records that contain rows where both first_name and last_name are populated but NULLs exist for both middle_init and middle_name columns.
If we were then to employ two different methods for populating the middle name based upon the COALESCE expression and ISNULL function we would have something like the following:
--UPDATE USING COALESCE()UPDATE SQL_Cruise.dbo.PeopleSET middle_name = COALESCE(SQL_Cruise.dbo.People.middle_init,'Moon')WHERE first_name = 'Trevor';--UPDATE USING ISNULL()UPDATE SQL_Cruise.dbo.PeopleSET middle_name = ISNULL(SQL_Cruise.dbo.People.middle_init,'Robert')WHERE first_name = 'Austen';SELECT * FROM SQL_Cruise.dbo.People;
In both cases I use the middle_init column as the basis of determining if COALESCE or ISNULL does anything. Since this article is focused on the functional difference between both statements I’m only using two arguments in the COALESCE expression to match the functionality we would see coming from an ISNULL function call.
As explained earlier the cornerstone of each is that if the value of the first column supplied has a NULL value then return the value in the logic of the n+1 argument. In this exercise I use COALESCE to UPDATE the value of middle_name for the record for Trevor Ford if his middle_init value is NULL (which it is.) This results in his middle_name being updated to the supplied value of ‘Moon’.
Before you ask, yes, that is his real middle name.
Likewise with the record for Austen Ford I perform the same criteria check but use ISNULL for the function to perform the replacement in the UPDATE statement. This results in the supplied parameter value of ‘Robert ’ updating the existing NULL value for middle_name in his record.
Or does it?
Here is the puzzle to be explained: why did it only update his middle name to ‘R’ and not ‘Robert’ as supplied in the ISNULL(SQL_Cruise.dbo.People.middle_init,'Robert') function call?
I’ll give you a second to consider why…
Have you figured it out yet? What if I prompt you by stating the data type for middle_init is VARCHAR(1)?
When using the ISNULL function SQL Server applies the data type for the first parameter of the function as the data type for the returned value. COALESCE, being an expression aligned closely with the CASE statement does not do that. It follows the rules for the CASE statement which returns the highest precedence data type from all the expressions supplied as arguments in the COALESCE expression. For questions around data type precedence see the official documentation for Microsoft SQL Server here.
Some Concerns About COALESCE
Note that I’m not advocating always using COALESCE over ISNULL. That is not true. There are some issues that arise when using COALESCE that you need to consider and these revolve around implicit or explicit conversion and data type precedence. In order to explain I’m going to run through three simple exercises using the following scenarios:
Scenario One: string value --> integer column
CREATE TABLE SQL_Cruise.dbo.ints(id INT IDENTITY(1,1) NOT NULL,int_column INT NULL);INSERT INTO SQL_Cruise.dbo.ints(int_column)VALUES(8675309), (1001001), (NULL);SELECT ID, int_column FROM SQL_Cruise.dbo.intsORDER BY ID;UPDATE SQL_Cruise.dbo.intsSET int_column = COALESCE(int_column, '12345', 67890);SELECT ID, int_column FROM SQL_Cruise.dbo.intsORDER BY ID;
For the NULL record (ID = 3) the NULL value was replaced with the first non-NULL argument in the COALESCE expression. Even though the column being updated was typed as an INT the query completed successfully for two reasons: INT has a higher data type precedence than varchar and the value of ‘12345’ was implicitly converted to the INT value of 12345.
What happens if the string was alpha-numeric?
CREATE TABLE SQL_Cruise.dbo.ints(id INT IDENTITY(1,1) NOT NULL,int_column INT NULL);INSERT INTO SQL_Cruise.dbo.ints(int_column)VALUES(8675309), (1001001), (NULL);SELECT ID, int_column FROM SQL_Cruise.dbo.intsORDER BY ID;UPDATE SQL_Cruise.dbo.intsSET int_column = COALESCE(int_column, 'ABCDE', 67890);SELECT ID, int_column FROM SQL_Cruise.dbo.intsORDER BY ID;
Msg 245, Level 16, State 1, Line 51Conversion failed when converting the varchar value 'ABCDE' to data type int.
The statement could not be executed because you can’t convert a string value of ABCDE to an integer.
Scenario Two: string value --> bit column
CREATE TABLE SQL_Cruise.dbo.bits(id INT IDENTITY(1,1) NOT NULL,bits BIT NULL);INSERT INTO SQL_Cruise.dbo.bits(bits)VALUES(1), (0), (NULL);SELECT ID, bits FROM SQL_Cruise.dbo.bitsORDER BY ID;UPDATE SQL_Cruise.dbo.bitsSET bits = COALESCE(bits, '12345', 67890);SELECT ID, bits FROM SQL_Cruise.dbo.bitsORDER BY ID;
In this case we see implicit conversion takes hold along with truncation. The replacement value of ‘12345’ converted to 1. Bit is still of a higher precedence than varchar and since the first character can be implicitly converted to a bit value that is what happens.
Scenario Three: integer value --> varchar() column
What happens when the replacement value is of a higher precedence but can be implicitly converted? The results are quite surprising:
CREATE TABLE SQL_Cruise.dbo.strings(id INT IDENTITY(1,1) NOT NULL,strings VARCHAR(3) NULL);INSERT INTO SQL_Cruise.dbo.strings(strings)VALUES('ABC'), ('DEF'), (NULL);SELECT ID, strings FROM SQL_Cruise.dbo.stringsORDER BY ID;UPDATE SQL_Cruise.dbo.stringsSET strings = COALESCE(strings, 12345, '67890');SELECT ID, strings FROM SQL_Cruise.dbo.stringsORDER BY ID;
Msg 245, Level 16, State 1, Line 15Conversion failed when converting the varchar value 'ABC' to data type int.
Even though 12345 could be converted to a string value we receive the error above. Is it because ‘12345’ is longer in length than the varchar(3) column for which we’re attempting to replace the NULL value? We can attempt this again by changing the length of the column from varchar(3) to varchar(5) and re-executing:
ALTER TABLE SQL_Cruise.dbo.strings ALTER COLUMN strings VARCHAR(5);SELECT ID, strings FROM SQL_Cruise.dbo.stringsORDER BY ID;UPDATE SQL_Cruise.dbo.stringsSET strings = COALESCE(strings, 12345, '67890');SELECT ID, strings FROM SQL_Cruise.dbo.stringsORDER BY ID;
Msg 245, Level 16, State 1, Line 31Conversion failed when converting the varchar value 'ABC' to data type int.
No, we see the same error. This is because INT is of higher precedence than VARCHAR() and the rule for the CASE statement is the result is the higher precedence data type takes control of the statement. If you look at the error generated it’s a conversion error when the query runs against the first row in the table. Since we didn’t include a WHERE clause in the query the statement runs against all rows – even those where the string column is not NULL. Since INT is of a higher precedence than VARCHAR it’s the lead for the comparative process that occurs in the statement. However, if I was to add a WHERE clause to limit the query to just rows where the column is NULL we have success:
UPDATE SQL_Cruise.dbo.stringsSET strings = COALESCE(strings, 12345, '67890')WHERE strings IS NULL;SELECT ID, strings FROM SQL_Cruise.dbo.stringsORDER BY ID;
Conclusion
At the start of this article I said there was a flaw with ISNULL. That’s not entirely true and matter of fact the behavior does serve a purpose. I find ISNULL the function of choice when applying the effects of the function against the column used in the first parameter. In simple terms: when you want to replace NULLs in a column and are using that column for the basis of the check for NULL then use ISNULL since it will ensure that any replacement values conform to the data type of the column being targeted.
For example: if we changed the SET clause for Austen’s UPDATE statement to be the following we would ensure that control is exerted over the replacement value to avoid a possible truncation or data type mis-match if we were working with data types that don’t subscribe to implicit conversion:
SET middle_name = ISNULL(SQL_Cruise.dbo.People.middle_name,'Robert')
Doing so ensures the value being used as the replacement “plays friendly” with the schema.
However, whichever option you choose: ISNULL or COALESCE it’s important to keep in mind the following rules:
ISNULL will use the data type for the first parameter as the data type for the output of the ISNULL function.
COALESCE is bound by the same rules of the CASE statement it mimics when it comes to data type output as well as processing. The higher data type in terms of data type precedence will control the processing and if the data types do not match and explicit or implicit conversion is not possible the query will fail.
About the Author
You May Also Like