The ABCs of Bcp
Count on SQL Server's bulk copy program (bcp) for importing and exporting large datasets.
September 18, 2001
A step-by-step primer for bcp novices
Bulk copy program (bcp), the command-line bulk copy utility that ships with all SQL Server releases, is an oft-overlooked but essential component of a DBA's toolkit. Although the utility gets stuck in the SQL Server program folder without so much as a shortcut on the Start menu—and only a brief mention in SQL Server Books Online (BOL)—bcp has survived numerous SQL Server product releases and has even seen minor enhancements in its recent incarnations. Despite bcp's low visibility, it remains one of the best tools for quickly moving large quantities of data into and out of SQL Server databases. When you use bcp properly, it can move large sets of data faster than SQL Server graphical utilities such as Enterprise Manager and Data Transformation Services (DTS), which usually steal the spotlight.
If you're new to bcp, here's an overview of the way it works. Bcp moves data from tables in the database to files on the file system and vice versa. On the file-system side, bcp works mostly with flat files, such as tab-delimited or fixed-width text files, but bcp also works with a special SQL Server file type that allows the transfer of noncharacter-based data. Bcp doesn't create database objects, so a table must exist before you can use bcp to transfer data into it. Typically, when you insert records into a table, the server must first record in a transaction log any changes that you write to database tables, then write the records to the database itself. For large datasets, this operation consumes significant disk time and space because the server must write every modified record twice. However, bcp can import data into a table faster than a logged insert because, under certain circumstances, you don't have to record the modified records in the transaction log. Inserting records into a table without recording changes in a transaction log is called a fast bulk copy, and I describe it in greater detail later. But first, let's delve into bcp's relationship with the command line.
Bcp Basics
Executing utilities from the Windows NT command line is an unfamiliar activity for many DBAs. On Windows 2000 and NT workstations, you can get an NT command line by using the Command Prompt shortcut on the Start menu. DBAs who are accustomed to graphical utilities might find using the command line a little daunting at first, but the rewards are worth the effort. Typing BCP -h at the command prompt presents a cheat sheet for the command line's general syntax, which Figure 1 shows. The cheat sheet includes the program's two-dozen-plus command-line options, but you only need to know a few options to understand the tips I present. Keep in mind that the command-line options are case-sensitive, so, for example, -n doesn't mean the same thing as -N. Generally, the command line's syntax resembles the following example:
bcp {dbtable} {in | out | queryout | format} datafile options
The line begins with bcp, then continues with the fully qualified table name dbtable. A fully qualified table name contains the database name, the table-object owner, and the table-object name. For example, Northwind.dbo.Employees is a fully qualified table name. You then enter the word in or out. If you use bcp to copy data from the file to the database table, you use the in option. If you use bcp to copy data from the database table to the file, you use the out option. The syntax continues with the filename datafile. The filename can be absolute, such as C:TempNorthwind.dat, or it can be relative to the current folder, such as ..Northwind.dat. (The two periods indicate that the file Northwind.dat is written one folder higher in the folder hierarchy.) If you use bcp out, you create a new file with the specified name while automatically overwriting any file of the same name that already exists. You can then specify any number of bcp command-line options.
What kind of command-line options can follow the filename? For starters, you can use the -S option to specify the server name. (You don't need the server name when you connect to a SQL Server installed on a local machine.) If the requested SQL Server instance doesn't respond to the client machine's default protocol, you need to set up the server name as a server alias in the SQL Server Client Network Utility. Next, you can connect to the server by using the -U and -P options to specify a username and password. However, you only need to use these options when you connect under standard security because a trusted connection doesn't require a username or password. If you want to use a trusted connection to the database, you need to choose the -T option to make the connection instead of the -U and -P options. Then, to select the data file type, use the -n option if the data file you want to copy is a SQL Server native type or -c if the file is a tab-delimited text type.
By applying these options, let's write a command that uses the following information to bcp data from a database table to a file:
The source table is Employees, which dbo owns.
The table is located in the Northwind database.
The destination is a tab-delimited text file called Northwind.txt.
The file is in the current folder on a server called HOMER, which you access through the sa account (no password is required).
After you piece together all the options, your bcp command looks like the following:
bcp Northwind.dbo.Employees out Northwind.txt -c -S HOMER -U sa -P
Now, to reverse the data flow's direction, let's bcp data from a file to a database table by writing a command that uses the following information:
The source data file is a native SQL Server file type called Invoices.dat.
The file resides on a server called BART, which you access through a trusted connection.
The destination table is Invoices, which Lisa owns.
The table resides in the Northwind database.
Here's the command:
bcp Northwind.lisa.Invoices in Invoices.dat -n -S BART -T
The preceding examples demonstrate some of bcp's basic uses. You can move large tables—indeed whole databases—by using these command-line options. Bcp can also carry out much more elaborate transfers by using fixed-width text files or various delimiters. For more information about bcp command-line options, see BOL.
A Fast Bulk Copy
If the destination database meets two specific conditions, bcp can conduct a fast and efficient bulk copy. First, the database must have the bulk copy database setting enabled. Open Enterprise Manager, drill down to the destination database, right-click the database, and choose Properties from the menu. After the Properties dialog box appears, you can inspect the Options tab to determine whether the Select into/bulk copy option is activated, as Figure 2 shows. Second, the destination tables must be free of indexes. You can also check this condition in Enterprise Manager: Drill down to the destination table, right-click the table, and from the menu, choose All Tasks, Manage Indexes. You can confirm whether a table has indexes by inspecting the Manage Indexes dialog box, which Figure 3 shows. If the destination table is new, you can delay creating indexes until bcp has copied all the data into the database.
When you copy data into a new database, you can usually meet these two conditions without a problem. However, for production databases that applications are currently using, you might encounter a challenge. Because bcp doesn't write its records to the transaction log, you can't reverse any data damage that bcp inflicts through failure or interruption. Therefore, you need to be careful about using bcp to populate tables that applications are accessing, and you might have to consider other options for importing data. You also need to think carefully before changing a production database's database setting. Activating the Select into/bulk copy database option prevents you from restoring any transaction log backups that SQL Server made since the last full database backup. Although you can use bcp to transfer a small group of a database's tables to another database, if you want to make the transfer as fast as possible, the destination database or tables still need to meet the two conditions for fast bulk copy.
As I mentioned earlier, bcp doesn't create database objects. Therefore, to create the tables in the destination database, you have to generate SQL scripts for the source database's table objects. You can easily generate SQL scripts in Enterprise Manager: Drill down to the source database, right-click on the database, and choose Generate SQL Scripts from the All Tasks menu. Then, select the desired database objects from the General tab and check the Formatting tab to confirm that, for each object, Enterprise Manager scripts only the CREATE command. If you also choose to script the DROP command, Enterprise Manager writes a command to drop each object before the command to create it. And although in most cases this script sequence works well, dropping an object manually is safer. Figure 4 shows the Formatting tab in the Generate SQL Scripts window, in which you select the CREATE command scripting option. Next, in the same window's Options tab, verify that Script Indexes, Script Triggers, and Script PRIMARY Keys, FOREIGN Keys, Defaults, and Check Constraints options are activated, as Figure 5 shows. Then, save the SQL scripts to a file by clicking OK. If you want to preview the scripts before you save them, click Preview on the General tab. Finally, to create the required tables and other objects on the destination database, you can execute the SQL scripts that you generated from the source database. I have one caveat: Before proceeding, you need to open Query Analyzer, then execute the CREATE TABLE commands from the SQL scripts that you generated from the source database. Figure 6 shows a CREATE TABLE command in the Query Analyzer window. Group the commands together near the top of the script. At this point in your data transfer, you want to execute CREATE TABLE commands alone because if any of the executed commands create indexes, you won't be able to complete a fast bulk copy into the indexed table objects.
After you create the table objects on the destination database, you can safely use bcp out to bulk copy out the tables one at a time from the source database to files. If you copy data to and from SQL Server, you can—and probably should—use a native-type copy because SQL Server can read and write the native-type format more efficiently. Then, all the files you just created can be bulk-copied back into the destination database, one at a time. After you complete the bulk copying, you can execute on the destination database the remaining commands from the SQL scripts that you generated earlier. Executing the remaining commands after bulk copying is completed creates all the key constraints, triggers, and indexes that belong on the destination tables but weren't created earlier.
Why Not Use the DTS Import/Export Wizard?
Although transferring data by using bcp isn't complicated, you might wonder whether an easier method exists. In SQL Server 2000 and 7.0, DTS is one alternative for transferring data. DTS has many positive features. It's graphical and easy to use. A DTS shortcut in the Start menu opens the Import/Export Wizard. And the utility features handy wizards, integration with Enterprise Manager, and advanced capabilities for performing complex data transformations that aren't available in bcp. You can also use DTS to transfer data to and from heterogeneous data sources.
If DTS is so feature-laden, why should you ever consider using bcp to copy data? The best reason to use bcp rather than the DTS Import/Export Wizard is speed of data transfer. The Import/Export Wizard is great for transferring a small amount of data or for performing data transformation. Because the wizard often takes less time to set up than a bcp command, the wizard has a special advantage with smaller datasets. However, the wizard's time value diminishes as datasets become larger and bcp begins to outperform the wizard. Also, the wizard can't handle nonlogged data transfers, so the rate of transfer decreases in direct proportion to the amount of time SQL Server takes to write to the transaction log.
However, an equally important reason not to use DTS is that DTS doesn't always create a destination table that exactly duplicates the source table; so by default, you might lose constraints, indexes, and identity columns in the transformation. This problem isn't crucial if you transfer only small amounts of data on a day-to-day basis. However, when you move or copy a database, you have to ensure that all database objects transfer as they were originally defined. By using bcp and the SQL scripts that Enterprise Manager generates, you create a destination table just like the source table, along with its constraints, indexes, and identity columns. Because the data that moves between the two databases is identical, the two tables are functionally equivalent.
How About Database Restore?
SQL Server 7.0 introduced another alternative to using bcp to move data: the database restore feature. To transfer data by this method, you simply create a database backup file, then restore the database backup file into a new database. The database restore feature is completely integrated into Enterprise Manager and incorporates wizards that make backing up and restoring a database simple for even the newest DBA. The destination database is, in fact, an exact copy of the source database, and the transfer is fast.
So why not use database restore to copy data? The best reason not to use the database restore feature to copy data is loss of control. When you restore a database from a backup file, the new database is exactly the same as the original. In other words, all the mistakes that you made in the source database end up in the new database. If the transaction log is hundreds of megabytes too large in the source database, it will still be several hundred megabytes too large in the destination database. All the indexes that you haven't rebuilt lately won't be rebuilt in the destination database. Also, using the database restore feature doesn't give you the option of copying only selected objects; a database restore copies all or nothing. Using bcp and the SQL scripts that Enterprise Manager generates takes only the space that the objects themselves require and just enough space on the transaction logs for building the constraints and indexes. Executing the SQL scripts on the destination database also delivers a bonus: The procedure rebuilds all the indexes. By using bcp, you can pick and choose the objects that belong in the destination database.
Giving Bcp Its Due
SQL Server ships with several tools for copying data from one database to another. Since Microsoft introduced new graphical data-transfer tools in SQL Server 7.0, bcp has probably become the least often discussed utility and the hardest to learn. However, for copying large sets of data as quickly as possible, bcp is still one of the best tools for the job. Any DBA with an occasional need for moving large amounts of data should become acquainted with the basics of using bcp.
About the Author
You May Also Like