Getting rows inserted with SqlBulkCopy

linq-to-sql sqlbulkcopy sql-server transactions

Question

I am switching some of my Linq to Sql code to use SqlBulkCopy, and the problem is that I need to do two inserts of multiple thousands of rows into two tables.

The service takes your batch of 10,000 links (imported from sitemaps, backlink builders, etc.) and chops them into RSS feeds of X items per feed for aggregation. The problem is, I already have a table of 32 million rows. If I am doing Linq to Sql inserts, it takes, depending on site traffic, anywhere between 5 and 10 minutes to load 10,000 links.

The structure is very basic.

Feeds: Id bigint (PK), Title varchar(1000), Description varchar(1000), Published datetime, Aggregated datetime null, ShortCode varchar(8) [antiquated, not inserted anymore, but used for legacy data]

Items: Id bigint (PK), FeedId bigint (FK), Title varchar(1000), Description varchar(1000), Published datetime, ShortCode varchar(8) [antiquated, not inserted anymore, but used for legacy data], ShortId bigint null [updated after insert to equal Id (used in partitioning)]

FutureItems: Id bigint (PK), FeedId bigint (FK), Title varchar(1000), Description varchar(1000), Published datetime, ShortCode varchar(8) [antiquated, not inserted anymore, but used for legacy data], ShortId bigint null [updated after insert to equal Id (used in partitioning)]

OldItems: Id bigint (PK), FeedId bigint (FK), Title varchar(1000), Description varchar(1000), Published datetime, ShortCode varchar(8) [antiquated, not inserted anymore, but used for legacy data], ShortId bigint null [updated after insert to equal Id (used in partitioning)]

So if you have a feed size of 20, you get 500 inserts into the Feeds table, then 10,000 inserts into the Items table, and then an update runs to set the ShortId equal to the Id. Once a night, a job runs that separates the data into the other two tables and shifts future items into the Items table.

I read that SqlBulkCopy can do 20 million rows in a matter of minutes, but I can't find any good examples of using it against multiple tables with an FK constraint.

Our SQL Server is a "monster," especially for this application: SQL Server 2008 R2 Web on Windows Server 2008 R2 Enterprise, 12 GB of RAM, dual quad-core Xeons @ 2.8 GHz.

Our web server is a clone without the database service.

The CPU runs at about 85% when inserting links, and the database fills the RAM.

If SqlBulkCopy isn't the right tool, any suggestion is welcome; we have paying customers who are getting mad, and I am not a DBA, just a plain old programmer.

Accepted Answer

SqlBulkCopy is indeed faster than ordinary inserts. But it is faster in the sense that it can transform a job that runs 1,000 inserts per second into one that does 10,000 per second. If you can only load 10,000 links in 10 minutes, you must have a different problem, something that bulk copy is unlikely to solve.

You need to first investigate why it takes so incredibly long to insert 10,000 links. Only after you understand that can you decide whether moving to SqlBulkCopy is a solution. I understand that you are not a DBA, but I'm going to point you to a 'DBA-ish' white paper on troubleshooting SQL Server performance: Waits and Queues. This is not a cookie-cutter recipe; it is a methodology that will teach you how to identify performance bottlenecks in SQL Server.

And to address your question: how does one use SqlBulkCopy when there are constraints? The more general question is how does one do bulk insert operations when constraints are in place? For serious volumes, one actually disables the constraints, performs the bulk upload, and then re-enables the constraints. For more streamlined online operations with minimal downtime (the database is basically 'down' for the period when constraints are disabled), one uses a different strategy: pre-load the data into staging tables, validate it, and then switch it in with a partition switch operation; see Transferring Data Efficiently by Using Partition Switching.
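The disable/load/re-enable pattern can be sketched in T-SQL roughly as follows. This is a sketch only: the constraint name FK_Items_Feeds is hypothetical (look up the real name in sys.foreign_keys), and WITH CHECK is what makes SQL Server re-validate the loaded rows so the constraint is trusted again.

```sql
-- Hypothetical constraint name; query sys.foreign_keys for the real one.
ALTER TABLE dbo.Items NOCHECK CONSTRAINT FK_Items_Feeds;

-- ... perform the bulk load here (SqlBulkCopy, BULK INSERT, etc.) ...

-- Re-enable AND re-validate existing rows so the constraint is trusted.
ALTER TABLE dbo.Items WITH CHECK CHECK CONSTRAINT FK_Items_Feeds;
```

Note that re-enabling with a bare `CHECK CONSTRAINT` (without `WITH CHECK`) leaves the constraint untrusted, which can hurt the optimizer, so the second statement deliberately repeats `CHECK`.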


Popular Answer

I think your real problem with just using a plain bulk insert is that you need the feed ids from the initial insert for the other tables. Here's what I would do: use bulk insert to load a staging table, then use a stored procedure to do the inserts into the real tables in a set-based fashion. You can use the OUTPUT clause in the initial insert into the Feeds table to get back a table variable with the feed ids you need for the inserts into the other tables.
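A minimal T-SQL sketch of that approach, using the Feeds/Items columns from the question. The staging tables (dbo.FeedStaging, dbo.ItemStaging) and the FeedTitle join column are hypothetical names, and the join-back assumes Title uniquely identifies a feed within the batch:

```sql
-- Table variable to capture the identity values generated for the feeds.
DECLARE @NewFeeds TABLE (Id bigint, Title varchar(1000));

-- Insert the feeds and capture the new ids with the OUTPUT clause.
INSERT INTO dbo.Feeds (Title, Description, Published)
OUTPUT inserted.Id, inserted.Title INTO @NewFeeds (Id, Title)
SELECT Title, Description, Published
FROM dbo.FeedStaging;   -- hypothetical staging table

-- Set-based insert into Items, resolving FeedId by joining back on Title.
INSERT INTO dbo.Items (FeedId, Title, Description, Published)
SELECT nf.Id, s.Title, s.Description, s.Published
FROM dbo.ItemStaging s                       -- hypothetical staging table
JOIN @NewFeeds nf ON nf.Title = s.FeedTitle; -- assumes titles are unique per batch

-- Finally, set ShortId = Id for the new rows, as described in the question.
UPDATE dbo.Items SET ShortId = Id WHERE ShortId IS NULL;
```

If titles are not unique within a batch, a MERGE statement can OUTPUT source columns alongside inserted.Id, which gives an exact correlation without relying on a natural key.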



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow