Corrupted Characters
When automating the execution of T-SQL script files, Cindy notices that the extended characters are corrupted upon import to the data warehouse. Readers help Cindy identify the cause of the problem and provide her with an efficient solution.
December 18, 2002
Congratulations to Kristofer Andersson, senior software developer for RR Technologies in Fort Lauderdale, Florida, and Arnie Rowland, SQL Server trainer and consultant for WestWood Consulting, Inc. in Portland, Oregon. Kristofer won first prize of $100 for the best solution to the January Reader Challenge, "Corrupted Characters." Arnie won second prize of $50. Here’s a recap of the problem and the solution to the January Reader Challenge.
Problem:
Cindy is the systems architect for a company that specializes in data warehousing technology. She needs to automate the execution of several T-SQL script files that are in Unicode format. These scripts populate tables in the warehouse. For example, Cindy needs to populate a Country dimension table with data based on the ISO 3166 codes and localization information for reporting purposes. The following shows a sample script file:
CREATE TABLE #t ( c varchar(30) NOT NULL )INSERT INTO #t VALUES( 'Tëst' )SELECT c FROM #tDROP TABLE #t
Cindy is automating the execution of the scripts against several SQL Server 2000 databases by using the OSQL command-line utility. But when she uses OSQL she notices that the extended characters in the scripts are corrupted upon import. Extended characters, such as the German umlaut (ä), aren’t in the standard ASCII character set. Help Cindy identify the cause of the problem and provide her with an efficient solution for executing the scripts successfully, without any data corruption or loss.
Solution:
Cindy’s problem is that the ODBC driver she is using to connect to the data warehouse is performing translations on the character data in the T-SQL script. Extended characters, which aren’t in the standard ASCII character set, are translated by the driver based on the ANSI to OEM conversion code page. The character translation option is on by default when SQL Server executes scripts through the OSQL utility. By using Unicode script files and converting the column to Unicode, she can avoid the character translation. The script can be modified as follows and saved as a Unicode file:
CREATE TABLE #t ( c nvarchar(30) NOT NULL )INSERT INTO #t VALUES( N'Tëst' )SELECT c FROM #tDROP TABLE #t
Cindy can also disable the character-translation option by using OSQL’s -D parameter to specify an ODBC data source that contains the required settings to avoid the character translation. The ODBC data source with the default settings excluding the character translation option is specified as a parameter.
Cindy needs to do the following with the ODBC DSN to execute the scripts successfully, without any translation:
1. Create an ODBC system data source called ImportDSN on the machine where she is executing OSQL with the "Perform translation for character data" option cleared
2. Specify this data source name as a parameter to OSQL so that OSQL can read the DSN settings and use them upon connection to SQL Server.
To execute the sample script above, save it as a Unicode file, then call OSQL:
osql -S. -itest.sql –DImportDSN
FEBRUARY READER CHALLENGE:
Now, test your SQL Server savvy in the February Reader Challenge, "Reporting Sales" (below). Submit your solution in an email message to [email protected] by January 16. SQL Server MVP Umachandar Jayachandran, a SQL Server Magazine technical editor, will evaluate the responses. We’ll announce the winner in an upcoming SQL Server Magazine UPDATE. The first-place winner will receive $100, and the second-place winner will receive $50.
Karen is a SQL Server developer at a Web-hosting company that uses SQL Server 2000 and 7.0 to host several sales databases. The Purchases table in each sales database captures items that a company sells. The Purchases table schema, with the relevant columns, is
CREATE TABLE Purchases ( SaleID int NOT NULL IDENTITY PRIMARY KEY, SaleAmount money NOT NULL, SaleTime smalldatetime NOT NULL ) CREATE INDEX idx_saletime ON Purchases( SaleTime )
-- Sample data: INSERT INTO Purchases (SaleAmount, SaleTime) VALUES (6.24, 'Jan 1, 2002 8:22 am') INSERT INTO Purchases (SaleAmount, SaleTime) VALUES (22.29, 'Jan 1, 2002 10:42 pm') INSERT INTO Purchases (SaleAmount, SaleTime) VALUES (128.40, 'Jan 2, 2002 11:12 am') INSERT INTO Purchases (SaleAmount, SaleTime) VALUES (0.45, 'Jan 3, 2002 6:28 pm') INSERT INTO Purchases (SaleAmount, SaleTime) VALUES (16.32, 'Jan 4, 2002 11:41 am') INSERT INTO Purchases (SaleAmount, SaleTime) VALUES (9.11, 'Jan 6, 2002 6:30 pm') INSERT INTO Purchases (SaleAmount, SaleTime) VALUES (69.96, 'Jan 6, 2002 6:55 pm') INSERT INTO Purchases (SaleAmount, SaleTime) VALUES (6.99, 'Jan 6, 2002 8:23 pm') INSERT INTO Purchases (SaleAmount, SaleTime) VALUES (18.43, 'Jan 7, 2002 11:42 am')
Karen needs to generate a report from each database’s Purchases table that shows the following for a given month of the year:
1) Day of the month, in the format MM/DD, that an item was purchased
2) Number of items sold each day
3) Total amount sold each day
In addition, the report must include all the days between the first and last sales date of the month, and the output must be ordered by the day of the month. Help Karen produce this report efficiently.
About the Author
You May Also Like