Inserting 200 million entries into a database using C# optimization

c# datatable sqlbulkcopy

Question

I have the following (simplified) code which I'd like to optimise for speed:

long inputLen = 50000000; // 50 million 
DataTable dataTable = new DataTable();
DataRow dataRow;
object[] objectRow;
while (inputLen--)
{
    objectRow[0] = ...
    objectRow[1] = ...
    objectRow[2] = ...

    // Generate output for this input
    output = ...

    for (int i = 0; i < outputLen; i++) // outputLen can range from 1 to 20,000
    {
         objectRow[3] = output[i];
         dataRow = dataTable.NewRow();
         dataRow.ItemArray = objectRow;
         dataTable.Rows.Add(dataRow);
    }
}

// Bulk copy
SqlBulkCopy bulkTask = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null);
bulkTask.DestinationTableName = "newTable";
bulkTask.BatchSize = dataTable.Rows.Count;
bulkTask.WriteToServer(dataTable);
bulkTask.Close();

I'm already using SQLBulkCopy in an attempt to speed things up, but it appears assigning values to the DataTable itself proves to be slow.

I don't know how DataTables work so I'm wondering if I'm creating unnecessary overhead by first creating a reusable array, then assigning it to a DataRow, then adding the DataRow to the DataTable? Or is using DataTable not optimal in the first place? The input comes from a database.

I don't care much about LOC, just speed. Can anyone give some advice on this?

1
5
2/1/2011 8:09:06 AM

Accepted Answer

For such a big table, you should instead use the

public void WriteToServer(IDataReader reader)

method.

It may mean you'll have to implement yourself a "fake" IDataReader interface with your code (if you' don't get the data from an existing IDataReader), but this way, you'll get "streaming" from end to end, and will avoid a 200 million loop.

13
10/4/2011 9:09:06 PM

Popular Answer

Instead of holding a huge data table in memory, I would suggest implementing a IDataReader which serves up the data as the bulk copy goes. This will reduce the need to keep everything in memory upfront, and should thus serve to improve performance.



Related Questions





Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow