A Bulk-Copy ProcedureA Bulk-Copy Procedure
Larry is a database analyst who must periodically synchronize a reporting database with a data warehouse and other external data sources. Readers help Larry import the data file with the least amount of coding and redesigning of the schema.
August 10, 2004
Congratulations to Alejandro Mesa, a database programmer for Simplex Medical in Fort Myers, Florida, and Marcos Kirchner, a student at the University of Blumenau (FURB) in Santa Catarina, Brazil. Alenjandro won first prize of $100 for the best solution to the August Reader Challenge, "A Bulk-Copy Procedure." Marcos won second prize of $50. Here's a recap of the problem and the solution to the August Reader Challenge.
Problem:
Larry is a database analyst who maintains a reporting database for the company's data warehouse. He must periodically synchronize the reporting database with the data warehouse and other external data sources. The reporting database contains a table that stores a list of companies that bulk copy program (bcp) loads from a data file. The data file always contains the complete list of companies. The following sample schema for the Companies table is created from the Suppliers table in the Northwind database:
USE northwindGOSELECT SupplierId AS CompanyId, CompanyName INTO Companies FROM SuppliersALTER TABLE Companies ADD CONSTRAINT pk_companies_id PRIMARY KEY CLUSTERED(CompanyId)GO
Other tables in the reporting database also reference the Companies table. When Larry tries using bcp to load the data file containing the list of companies into the Companies table, he gets a primary key violation because of the unique constraint on the CompanyId column. Larry needs to load only new companies into the table from the data file-he can ignore updates to existing companies. What should Larry do to import the data file with the least amount of coding and redesigning of the schema? The bcp commands must use the same data file for import and insert only the new rows into the Companies table.
Solution:
Larry can create an additional unique index on the Companies table that has the IGNORE_DUP_KEY option enabled. The following statement creates the new index:
CREATE UNIQUE INDEX uq_idx_companies_id ON Companies(CompanyId) WITH ignore_dup_key
The IGNORE_DUP_KEY option on the unique index lets a bulk insert or an INSERT statement insert rows that have the duplicate key values. SQL Server will ignore the rows with duplicate key values and issue a warning message. By adding the index with the IGNORE_DUP_KEY option enabled, Larry can use bcp to copy the data file containing the list of companies into the Companies table without any modification. The duplicate rows in the file will be ignored because of the new index, and the primary key violation won't occur.
Larry can also drop the primary key on the Companies table, drop the foreign key constraints that reference the primary key, create the index with the IGNORE_DUP_KEY option enabled, and recreate the foreign key constraints. This approach works because SQL Server lets you define foreign key constraints on columns that are part of a unique index. The following script demonstrates the technique:
USE tempdbGOCREATE TABLE pt ( i int NOT NULL )CREATE UNIQUE CLUSTERED INDEX _ ON pt(i)CREATE TABLE ft ( i int NOT NULL REFERENCES pt(i))GOINSERT INTO pt VALUES(1)INSERT INTO ft VALUES(1)GO-- Will raise FK violationINSERT INTO ft VALUES(2)GODROP TABLE ft, ptGO
SEPTEMBER READER CHALLENGE:
Now, test your SQL Server savvy in the September Reader Challenge, "Foreign Punctuation" (below). Submit your solution in an email message to [email protected] by August 19. Umachandar Jayachandran, a SQL Server Magazine technical editor, will evaluate the responses. We'll announce the winner in an upcoming SQL Server Magazine UPDATE. The first-place winner will receive $100, and the second-place winner will receive $50.
Problem:
David is a database programmer for an advertising company that deals with international data. The database that holds data for each customer contains a table that stores names in Unicode format. The following script contains the table and sample data:
CREATE TABLE #Temp ( nc nvarchar(30) COLLATE Latin1_General_CI_AS )CREATE CLUSTERED INDEX Idx_Temp_nc on #Temp( nc )INSERT INTO #Temp VALUES( N'MRKT' + nchar(0x3000) + N'Name')INSERT INTO #Temp VALUES( N'TEST String' )INSERT INTO #Temp VALUES( N'MRKT' + nchar(0x3000) + N'Name' + nchar(0x3000) )
In the sample data, the Unicode character 0x3000 represents a punctuation symbol in different languages, such as Chinese. When David writes a query using an equality search condition, he notices that the query doesn't work as expected. For example, the following query returns two rows instead of one:
SELECT * FROM #Temp WHERE nc = N'MRKT' + nchar(0x3000) + N'Name' + nchar(0x3000)
What is incorrect in the query? And how can David modify the query to return the correct data without ignoring punctuation or white-space characters in any language?
About the Author
You May Also Like