Based on the idea presented here, I've created a QueuedDataReader which wraps a ConcurrentQueue in an IDataReader, so that I can pass it to SqlBulkCopy and 'stream' data into the database.
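A minimal sketch of what such a wrapper can look like. This is not the original QueuedDataReader; it uses a BlockingCollection (which wraps a ConcurrentQueue by default) so that Read can block until a row arrives or the producer signals completion, and it implements only the members SqlBulkCopy actually calls with ordinal column mappings. The constructor shape and member choices are assumptions:

```csharp
using System;
using System.Collections.Concurrent;
using System.Data;
using System.Threading;

// Sketch: wraps a producer/consumer queue of rows (object[] per row) in just
// enough of IDataReader for SqlBulkCopy.WriteToServer. Members SqlBulkCopy
// does not need here throw NotSupportedException.
public class QueuedDataReader : IDataReader
{
    private readonly BlockingCollection<object[]> _rows;
    private object[] _current;

    public QueuedDataReader(BlockingCollection<object[]> rows, int fieldCount)
    {
        _rows = rows;
        FieldCount = fieldCount;
    }

    public int FieldCount { get; }

    // Blocks until a row is available; returns false once the producer has
    // called CompleteAdding and the queue is drained.
    public bool Read() => _rows.TryTake(out _current, Timeout.Infinite);

    public object GetValue(int i) => _current[i];
    public bool IsDBNull(int i) => _current[i] == null || _current[i] is DBNull;

    public void Close() { }
    public void Dispose() { }
    public bool IsClosed => false;
    public int Depth => 0;
    public int RecordsAffected => -1;
    public bool NextResult() => false;
    public DataTable GetSchemaTable() => throw new NotSupportedException();

    // Remaining IDataRecord members, unused with ordinal mappings.
    public object this[int i] => GetValue(i);
    public object this[string name] => throw new NotSupportedException();
    public string GetName(int i) => throw new NotSupportedException();
    public string GetDataTypeName(int i) => throw new NotSupportedException();
    public Type GetFieldType(int i) => throw new NotSupportedException();
    public int GetValues(object[] values) => throw new NotSupportedException();
    public int GetOrdinal(string name) => throw new NotSupportedException();
    public bool GetBoolean(int i) => (bool)_current[i];
    public byte GetByte(int i) => (byte)_current[i];
    public long GetBytes(int i, long o, byte[] b, int bo, int l) => throw new NotSupportedException();
    public char GetChar(int i) => (char)_current[i];
    public long GetChars(int i, long o, char[] b, int bo, int l) => throw new NotSupportedException();
    public Guid GetGuid(int i) => (Guid)_current[i];
    public short GetInt16(int i) => (short)_current[i];
    public int GetInt32(int i) => (int)_current[i];
    public long GetInt64(int i) => (long)_current[i];
    public float GetFloat(int i) => (float)_current[i];
    public double GetDouble(int i) => (double)_current[i];
    public string GetString(int i) => (string)_current[i];
    public decimal GetDecimal(int i) => (decimal)_current[i];
    public DateTime GetDateTime(int i) => (DateTime)_current[i];
    public IDataReader GetData(int i) => throw new NotSupportedException();
}
```

A consumer thread can then pass an instance straight to bulkCopy.WriteToServer, which keeps calling Read/GetValue until the producer calls CompleteAdding on the queue.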
Two possibly important details are:
My destination tables have IDENTITY columns (which SqlBulkCopy is not inserting into), so I don't think the 'SORT' provisos are relevant.
In addition, whilst writing my question I discovered another excellent resource here, which pointed to BatchSize as a potential culprit:
The results are quite surprising: if we use the BatchSize parameter, performance gets worse and worse as we set it to lower values.
Using a very small batch size, network bandwidth stays well under 20% usage for the whole process (note that the 500 batch size graph's upper limit is 25%, unlike the other graphs). Any value below 10,000 decreases performance very heavily, leading to very poor load times and heavy log file usage.
Once BatchSize reaches 10,000, the difference in time among the various tests becomes very small. But since we have 8 threads writing 750,000 rows each, only 75 chunks of data are sent from each thread. Needless to say, the best performance is obtained with a BatchSize of 0, sending all 750,000 rows in a single batch.
It might be interesting to discover whether there is any benefit, apart from performance, to using the BatchSize parameter. We found nothing mentioned in Books Online, nor have we ever seen anything interesting in our own experience, which leads us to say that the best thing to do is leave BatchSize at zero, its default value, since any other value will decrease the performance of the load process.
I had previously abandoned BatchSize = 0 because of timeout errors, but by setting my BulkCopyTimeout to zero I get some log file growth, but far less than previously.
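A sketch of those two settings together (the connection string, destination table name, and reader variable are placeholders, not from the original post):

```csharp
// Sketch: BatchSize = 0 sends the whole stream as a single batch, and
// BulkCopyTimeout = 0 disables the operation timeout that previously
// forced a nonzero BatchSize.
using (var bulk = new SqlBulkCopy(connectionString))   // placeholder
{
    bulk.DestinationTableName = "dbo.MyTable";         // placeholder
    bulk.BatchSize = 0;        // the default: one batch for all rows
    bulk.BulkCopyTimeout = 0;  // 0 = wait indefinitely, no timeout errors
    bulk.WriteToServer(reader); // any IDataReader, e.g. the queued one above
}
```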
UPDATE: after messing around for too long doing manual tests, I finally knuckled down to writing an automated test suite, trying variations of BatchSize (0-5000), enqueuing rates, and clustered/nonclustered indexes. (In my case, I'm writing data to a dozen tables in parallel.)
This is what I've found:
- With a CLUSTERED index, or with a BatchSize of zero, I get no log file growth.
- A CLUSTERED index still prevents log growth, whatever the BatchSize.
Note: my tests write data at a given rate into a queue wrapped in an IDataReader, which is dequeued by BulkInsert; one queue/BulkInsert per table. The next step is to try with bigger data sets.
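The one-queue-per-table setup could be sketched as below. The table names and the MakeReaderFor factory are hypothetical, and this assumes a separate producer is feeding each table's queue:

```csharp
// Sketch: one bulk-copy consumer task per destination table, each draining
// its own queue-backed IDataReader. MakeReaderFor is a hypothetical factory
// returning the IDataReader that wraps that table's queue.
var tables = new[] { "dbo.TableA", "dbo.TableB" };        // placeholders
var consumers = tables.Select(table => Task.Run(() =>
{
    using (var bulk = new SqlBulkCopy(connectionString))  // placeholder
    {
        bulk.DestinationTableName = table;
        bulk.BatchSize = 0;
        bulk.BulkCopyTimeout = 0;
        bulk.WriteToServer(MakeReaderFor(table));
    }
})).ToArray();
Task.WaitAll(consumers); // returns once every producer completes its queue
```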