Making the Most of BCP-Seven Tips for Speeding Large Data Loads with Bulk Copy Program

Handle high-speed data loads efficiently in SQL Server.

Brian Moran

January 31, 1997

8 Min Read
ITPro Today logo in a gray background | ITPro Today

Speeding Large Data Loads with Bulk Copy Program

The Bulk Copy Program (BCP) isn't sexy or exciting, and it isn't the subjectof many articles. Still, it's the most practical way to handle high-speed dataloads in SQL Server. This month I'll explain what BCP is and give some tips tohelp squeeze every drop of performance from it.

BCP is Microsoft's version of an export/import utility. (For acustom file export utility that combines Visual Basic 4.0's TreeView controlwith SQL Server 6.5's DMO BulkCopy object, see Mike Otey, "Exporting Datafrom SQL Server," January 1997.) This command-line program moves databetween an existing database table or view and an operating-system file storedin ASCII or machine format. Machine format is useful only for transferring databetween SQL Servers running on the CPU architectures (i.e., Intel, Alpha,PowerPC); people can't read machine format. BCP has many uses, but it'sespecially helpful for moving large data sets between SQL Server and otherdatabase systems or dealing with extracts from external data feeds or legacysystems.

Sometimes people think BCP is part of the SQL Server engine and try to runcommands from ISQL/w or a similar interface. BCP isn't a Transact-SQL command, and it's not part of theserver. BCP is a C-language client that communicates with the database usingspecial BCP extensions in the DB-Library (DB-Lib) API (a SQL Server-specificprogramming interface). Importing or exporting data with an Open DatabaseConnectivity (ODBC)-enabled application such as Microsoft Access is often easierbecause Microsoft doesn't provide a GUI interface for BCP and the command-linesyntax can be arcane and difficult to master. ODBC-enabled applications are OKfor small data sets, but you need to avoid ODBC applications when you're bulkloading a lot of data. ODBC applications load data using standard Insertstatements, which are many times slower than BCP.

Increasing BCP's Performance
BCP performance can vary significantly with different uses. The followingtips can help make BCP fly.

Tip 1: Always use fast BCP.
You load data with BCP in either fast or slow BCP mode. Asyou can guess, fast BCP is quicker, but you pay a price. Fast BCP forces you toset a database option, Select into/bulk copy with sp_dboption. Thisoption lets nonlogged operations occur; don't leave it on in productiondatabases because it can prevent you from backing up your transaction log. (SeeSQL Server Books Online--BOL--Effect of Select into/bulk copy onTransaction Logs for more information.) Fast mode also requires you to dropany indexes on the target table. BCP will revert to slow mode if you forget toremove even one index. You can ignore the BOL and third-party references tellingyou that triggers on a target table also cause BCP to use slow mode: SQL Server6.0 fixed this problem.

Transaction log activity accounts for the big difference in speed betweenslow and fast BCP. Slow mode causes almost as much logging as adding data withan Insert statement, whereas fast BCP does not log individual rows but logs newspace allocations only when extents are linked into the table's pagechain. (Extents are 16KB buffers, blocks of eight pages, used to manage data.)The difference between fast and slow

BCP is so great that data loads are usually faster if you drop the indexes,run fast BCP, then re-create the indexes, instead of using slow mode and fullylogging all the data insertions. If your table has a clustered index, sort thedata before running BCP. Then create the index using the sorted_data option,which tells SQL Server it doesn't need to re-sort the data when it creates theindex.

Tip 2: Run BCP from the server.
Running BCP on the same machine as SQL Server is a greatway to boost performance because it eliminates tons of network overhead.Consider the following scenario: Server A runs SQL Server and stores thedata files to be loaded. You invoke BCP from Workstation X and load datainto Server A. BCP does not run as part of SQL Server, so the data is copiedfirst to Workstation X, which immediately sends it to back to Server A to beloaded. You move a large data file around the network twice, once to the BCPclient machine and once to the SQL Server, but you can eliminate the traffic byrunning BCP directly from the server. In most cases, copying the remote datafile to the server with NT's ordinary file copy commands and then running BCPlocally is faster than BCP'ing across the network because NT's caching and filetransfer mechanisms are more efficient. Even if you don't run BCP from theserver, avoid running BCP against a remote data file because this process movesthe file around the network twice.

Tip 3: Use local named pipes.BR>When BCP runs on the same machine as SQL Server, usinglocal named pipes greatly speeds the process. Local pipes are an interprocesscommunication (IPC) mechanism and completely bypass the network to optimizeprocesses running on the same machine. Local pipes act like a TCP/IP loopback sodata never goes to the NT Redirector; by comparison, network named pipes senddata through the Redirector, even if both processes run locally on the samemachine. (You can easily see the difference in Redirector activity inPerformance Monitor--Perfmon--by looking at the Bytes Total/sec counterin the Redirector object.) BCP automatically uses a local named pipe if you donot provide a server name when you run the command.

Compaq has a great white paper, Configuration and Tuning of MicrosoftSQL Server 6.5 for Windows NT on Compaq Servers (http://www.compaq.com/support/techpubs/whitepapers/415a0696.html), full of in-depth tuninginformation. The paper compares BCP load times using a variety of packet sizeand IPC configurations; the comparisons show that local pipes are 300 percentfaster than network pipes.

Tip 4: Place BCP and SQL Server data on separate disks.
Running BCP through a localpipe reduces network overhead but can introduce disk I/O-related bottlenecks.Open the throttle by writing your BCP data files to a fast RAID array on adifferent physical drive from your SQL Server devices. If this approach isimpractical in your environment, run benchmarks with your data and your serversto determine which is worse: the network overhead involved with a remote BCP orthe contention on the disk while BCP and SQL Server fight for the same I/Obandwidth.

Tip 5: Install Service Pack 1 for SQL Server 6.5.
Service Pack 1 (SP1) for SQLServer 6.5 includes two nifty enhancements that improve BCP load times up to 700percent and provide throughput rates of approximately 3.5MB per second (MBps),according to Microsoft benchmarks. First, an engine optimization lets BCP writean extent at a time using n private buffers rather than a page at a timeusing a single public buffer area. You can set n between 1 and10 using

sp_configure 'backup buffer size'

This action has a big positive impact on speed, if your disk subsystem canhandle the load. Experiment to find the best setting.

Enabling the new Table lock on bulk load option in sp_tableoptionactivates the second improvement, which tells SQL Server to hold one exclusivetable lock rather than grab individual locks as each new extent is linked intothe page chain. Previously, BCP locked at the extent level, which could depletethe number of configured locks when a program is loading large data sets.Locking at the table level lets SQL Server acquire a single lock for theduration of the BCP load, so you can load data using much larger batch sizeswithout exhausting the fixed supply of locks. So, set the batch size to thenumber of rows in the table you're loading. These two enhancements speed dataloading with BCP in the fast, nonlogged mode; unfortunately, you won't see muchchange when you use the slow, logged version of BCP.

The changes open BCP's floodgates to the I/O subsystem, so the boost isnegligible unless your controller and disks can keep up with theincreased I/O load. After playing with SP1 and talking with others who have usedit, I believe the 700 percent gain Microsoft has achieved is optimistic for mostsites. A few Microsoft engineers I know have reported 200 percent to 300 percentimprovements using fast RAID controllers and lots of disks at corporate sites.In comparison, load times improved by 30 percent on my lowly P120 40MB laptop.

Tip 6: Experiment with packet size.
Before SP1, packet size had little effect as longas the packet size was at least 4KB. A Microsoft engineer familiar with the SP1BCP enhancements suggests using a packet size of 16KB to take full advantage ofBCP's new ability to write data in 16KB extents rather than 2KB pages.

My tests show that 16KB packets improve times slightly, but I haven'ttested packet sizes exhaustively. Try playing with this option in yourenvironment.

Tip 7: Use native mode.
BCP can work with data in native SQL Server format or in anASCII representation, which is more fun to look at. Native format uses slightlyless disk space and runs a little faster because data isn't converted to ASCIIformat. Don't expect the same boost as using a local named pipe, but everylittle bit can help.

Put 'Em All Together
BCP tuning is one part science, one part art, and three partsexperimentation, so I can't guarantee success. But I'd be very surprised ifthese tips don't improve performance at your site.

BCP isn't the most user-friendly interface in the world, and we all hopethat one day Microsoft will provide a more attractive, easier-to-use utility.But for now, BCP is the best game in town, so you might as well make the most ofit.

Sign up for the ITPro Today newsletter
Stay on top of the IT universe with commentary, news analysis, how-to's, and tips delivered to your inbox daily.

You May Also Like