BulkCopy with relational data; fast inserts

insert sql sqlbulkcopy sql-server

Question

I need to get a continuous stream of data into a database as quickly as possible (around 10,000 rows a minute, and rising). I currently use prepared INSERT statements, but I'm considering switching to the SqlBulkCopy class so I can import the data in larger batches.

The issue is that I'm not inserting into a single table; the components of each data item go into several tables, and rows inserted at the same time use each other's identity columns as foreign keys. I know bulk copies aren't designed for this kind of relational insert, but I'm considering switching my identity columns (bigints in this case) to uniqueidentifier columns. Because I could then generate the IDs before the insert, I wouldn't need anything like SCOPE_IDENTITY(), which is what currently stops me from using bulk copy, and I could run a handful of bulk copies, one per table.
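For illustration, here is a minimal sketch of that approach, assuming hypothetical Parent and Child tables keyed by uniqueidentifier: the GUIDs are generated in code, so child rows can reference their parent before anything touches the database, and each table then gets its own SqlBulkCopy call (parent first, so the foreign key is satisfied).

    using System;
    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;

    class BulkLoader
    {
        // Assumed (hypothetical) schema:
        //   Parent(Id uniqueidentifier PK, Payload nvarchar(max))
        //   Child(Id uniqueidentifier PK, ParentId uniqueidentifier FK, Value int)
        public static void LoadBatch(string connectionString,
                                     IEnumerable<(string Payload, int[] Values)> items)
        {
            var parents = new DataTable();
            parents.Columns.Add("Id", typeof(Guid));
            parents.Columns.Add("Payload", typeof(string));

            var children = new DataTable();
            children.Columns.Add("Id", typeof(Guid));
            children.Columns.Add("ParentId", typeof(Guid));
            children.Columns.Add("Value", typeof(int));

            foreach (var (payload, values) in items)
            {
                Guid parentId = Guid.NewGuid();          // key assigned client-side, no SCOPE_IDENTITY() needed
                parents.Rows.Add(parentId, payload);
                foreach (int v in values)
                    children.Rows.Add(Guid.NewGuid(), parentId, v);   // FK value known before the insert
            }

            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();
                using (var tx = conn.BeginTransaction())
                {
                    // One bulk copy per table, parent rows first. Columns map by ordinal here;
                    // add ColumnMappings if the table layout differs from the DataTable.
                    using (var bulk = new SqlBulkCopy(conn, SqlBulkCopyOptions.Default, tx)
                                      { DestinationTableName = "dbo.Parent" })
                        bulk.WriteToServer(parents);

                    using (var bulk = new SqlBulkCopy(conn, SqlBulkCopyOptions.Default, tx)
                                      { DestinationTableName = "dbo.Child" })
                        bulk.WriteToServer(children);

                    tx.Commit();
                }
            }
        }
    }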

Is this a workable approach, or are there other problems I might run into? Or is there another way to insert data quickly while still using bigint identity columns?

Thanks.


Accepted Answer

It sounds like you want to switch from "SQL Server assigns a [bigint identity() column] surrogate key" to "the data-prep process creates a GUID surrogate key"; in other words, the key will be assigned outside SQL Server rather than inside it. Given your volumes, if the data-generating process can assign the surrogate key, I'd definitely go with that.

The question then becomes whether you must use GUIDs, or whether your data-production process can generate auto-incrementing integers itself. Building such a process so that it works reliably and never fails is hard (one of the reasons you pay money for SQL Server), but the trade-off of smaller, more readable keys in the database may be worth it.
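As a rough illustration of the "let the data-production process assign incrementing integers" alternative (the class, table, and column names below are made up), the process can seed a counter from the table's current maximum once at startup and then hand out bigint keys entirely in memory, provided it is the only writer to that table:

    using System;
    using System.Data.SqlClient;
    using System.Threading;

    class KeyAllocator
    {
        private long _lastId;

        // Seed the counter once from the current maximum key in the target table.
        public KeyAllocator(string connectionString, string tableName, string idColumn)
        {
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(
                $"SELECT ISNULL(MAX([{idColumn}]), 0) FROM [{tableName}]", conn))
            {
                conn.Open();
                _lastId = Convert.ToInt64(cmd.ExecuteScalar());
            }
        }

        // Thread-safe, in-memory key assignment with no per-row round trip.
        // Only valid while this process is the sole writer to the table.
        public long NextId() => Interlocked.Increment(ref _lastId);
    }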


Popular Answer

Using uniqueidentifier will certainly make matters worse: wider keys and more page splits.

If your load can be, or already is, batched, one option is to:

  • load a staging table
  • flush the data into the real tables in one go, via a stored procedure
  • tag each batch in the staging table with its own batch identifier

We sometimes see peaks of around 50k rows per batch (and they keep growing). We actually use a separate staging database to avoid double transaction-log writes.
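For concreteness, here is a sketch of that batching pattern from the client side; the object names (dbo.Staging, its BatchId column, and the stored procedure dbo.usp_LoadFromStaging) are placeholders, not anything from the original answer:

    using System;
    using System.Data;
    using System.Data.SqlClient;

    class StagingLoader
    {
        // Assumes 'rows' matches the staging table's layout and already contains a BatchId column.
        public static void LoadViaStaging(string connectionString, DataTable rows)
        {
            Guid batchId = Guid.NewGuid();
            foreach (DataRow row in rows.Rows)
                row["BatchId"] = batchId;              // tag every row with this batch's identifier

            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();

                // Step 1: bulk copy the whole batch into the staging table.
                using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "dbo.Staging" })
                    bulk.WriteToServer(rows);

                // Step 2: move the batch into the real tables in one set-based call.
                using (var cmd = new SqlCommand("dbo.usp_LoadFromStaging", conn))
                {
                    cmd.CommandType = CommandType.StoredProcedure;
                    cmd.Parameters.AddWithValue("@BatchId", batchId);
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }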



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow