SQL Server import faster than bulk import

bulkinsert sqlbulkcopy sql-server

Question

I have a large (~50 GB, ~300 million rows) tab-separated file, which I want to import into a SQL Server table with columns: char(10), varchar(512), nvarchar(512), nvarchar(512).

It takes about a day to bulk import it using T-SQL, SSIS, or the C# SqlBulkCopy class.

Is there any faster way to load this data?

Or could there be some condition slowing it down, which I can remove or change?

Accepted Answer

If you are inserting into an existing table, drop all indexes before the import and re-create them after the import.
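For example, the drop/re-create pattern looks like this (a sketch, assuming a hypothetical destination table `dbo.Target` with one nonclustered index `IX_Target_Col2`):

```sql
-- Hypothetical table and index names for illustration.
DROP INDEX IX_Target_Col2 ON dbo.Target;

-- ... run the bulk import here ...

-- Re-create the index once the load completes; this is one sort
-- over the full table instead of per-row index maintenance.
CREATE NONCLUSTERED INDEX IX_Target_Col2
    ON dbo.Target (Col2);
```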

If you are using SSIS, you can tweak the batch and commit sizes.
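The same knob exists outside SSIS: `BULK INSERT` exposes a `BATCHSIZE` option controlling how many rows are committed per transaction. A sketch, assuming a hypothetical file path and table name:

```sql
-- Hypothetical path and table; tune BATCHSIZE for your hardware.
BULK INSERT dbo.Target
FROM 'D:\data\bigfile.tsv'
WITH (
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR   = '\n',
    BATCHSIZE       = 100000   -- rows committed per transaction
);
```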

Verify there is adequate memory on the server for such a large data load.

Perform the loading operation on the local server (copy file locally, don't load over the network).

Configure your destination database and transaction log auto-growth options to a reasonable value, such as a few hundred MB chunks at a time (default is typically growth by 1MB for the master data file .mdf). Growth operations are slow/expensive so you want to minimize these.
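A sketch of setting a larger fixed growth increment, assuming a hypothetical database `MyDb` with logical file names `MyDb_data` and `MyDb_log`:

```sql
-- Grow in large fixed chunks instead of the small defaults,
-- so the load triggers far fewer growth operations.
ALTER DATABASE MyDb
    MODIFY FILE (NAME = MyDb_data, FILEGROWTH = 512MB);
ALTER DATABASE MyDb
    MODIFY FILE (NAME = MyDb_log, FILEGROWTH = 512MB);
```

Better still, pre-size the files to roughly the expected final size before the load so no growth happens at all.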

Make sure your data and log files are on fast disks, preferably on separate LUNs. Ideally you want your log file on a mirrored LUN separate from your data file (you may need to talk to your storage admin or hosting provider for options).


Popular Answer

I have just spent the last few weeks optimizing a very large load myself. I found BULK INSERT via BCP to be the fastest way, as opposed to SSIS or a T-SQL INSERT, but there are things you can do to tune it.

  • Try raising/lowering the rows-per-batch setting to shift resource pressure between CPU and memory (a higher value reduces CPU overhead; a lower value reduces memory pressure).
  • If there is a clustered index or any non-clustered indexes, drop them and re-create them after the insert.
  • Split the TSV into smaller files using a .NET tool, then load them into the table in parallel. This requires that the table be a heap (clustered index dropped).
  • Make sure the load is minimally logged. For a heap this requires TABLOCK; for a clustered index it requires trace flag 610 and the data to be sorted in clustered-index-key order. Either approach requires the SIMPLE or BULK_LOGGED recovery model.
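Putting the last bullet together, a minimally logged heap load might look like the sketch below (hypothetical database, table, and path names; assumes you can temporarily switch the recovery model):

```sql
-- Switch to a recovery model that permits minimal logging.
ALTER DATABASE MyDb SET RECOVERY BULK_LOGGED;

-- TABLOCK is what enables minimal logging when the target is a heap.
BULK INSERT dbo.Target
FROM 'D:\data\bigfile.tsv'
WITH (
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR   = '\n',
    TABLOCK
);

-- Restore the original recovery model (and take a backup) afterwards.
ALTER DATABASE MyDb SET RECOVERY FULL;
```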



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow