SqlBulkCopy and Azure Parallel Bulk Insertion

azure azure-sql-database database parallel-processing sqlbulkcopy


I have an azure app on the cloud with a sql azure database. I have a worker role which needs to do parsing+processing on a file (up to ~30 million rows) so i can't directly use BCP or SSIS.

I'm currently using SqlBulkCopy, however this seems too slow as I've seen load times of up to 4-5 minutes for 400k rows.

I want to run my bulk inserts in parallel; however reading through the articles on importing data in parallel/controlling lock behaviour, it says that SqlBulkCopy requires that the table does not have clustered indexes and a tablelock (BU lock) needs to be specified. However azure tables must have a clustered index...

Is it even possible to use SqlBulkCopy in parallel on the same table in SQL Azure? If not is there another API (that I can use in code) to do this?

3/1/2012 3:55:05 PM

Accepted Answer

I don't see how you can run any faster than using SqlBulkCopy. On our project we can import 250K rows in about 3 mins, so your rate seems about right.

I don't think that doing it in parallel would help, even if it was technically possible. We only run 1 import at a time otherwise SQL Azure starts timing out our requests.

In fact sometimes, running a large group-by query at the same time as the import isn't possible. SQL Azure does a lot of work to ensure quality of service, this includes timing out requests that take too long, take too many resource, etc

So doing several large bulk inserts at the same time will probably cause one to time out.

3/2/2012 10:14:03 AM

Popular Answer

It is possible to run SQLBulkCopy in parallel against SQL Azure, even if you load the same table. You need to prepare your records in batches yourself before sending them to the SQLBulkCopy API. This will absolutely help with performance, and it allows you to control retry operations for a smaller batch of records when you get throttled for reasons outside of your own doing.

Take a look at my blog post comparing load times of various approaches. There is a sample code as well. In separate tests I was able to cut the load time of a table in half.

This is the technique I am using for a couple of tools (Enzo Backup; Enzo Data Copy); It's not a simple thing to do but when done properly you can optimize load times significantly.

Related Questions

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow