IndexOutOfRangeException in DataTable with Parallel.ForEach

c# datatable parallel.foreach parallel-processing sqlbulkcopy

Question

I am trying to augment a DataTable that has IP addresses in one column with their reverse DNS mapping. I am getting this DataTable from somewhere else. I am then exporting the table to SQL Server using SqlBulkCopy.

I am adding two columns, one for the DNS name and one for the top-level domain part only. Since I have a lot of IPs and reverse DNS lookups take some time, I am using Parallel.ForEach. Strangely, I get unpredictable IndexOutOfRangeExceptions, either as nested exceptions inside the parallel loop or sometimes outside of it, when I call Clear on the DataTable. Why?

Here is what I am doing:

        // mapping maps column indexes in the DataTable to column indexes in the database

        SqlBulkCopy copy = new SqlBulkCopy(cn);
        foreach (KeyValuePair<int, int> pair in mapping)
        {
            copy.ColumnMappings.Add(pair.Key, pair.Value);
        }

        dt.Columns.Add("URL");
        dt.Columns.Add("Domain");
        solver.AddDNSInfo(dt, 1); // the second column (index 1) has the IP
        copy.WriteToServer(dt);
        dt.Clear();     // exceptions are thrown here




    // ipIndex is the index of the column within the DataTable that holds the IP of interest.
    // In my scenario ipIndex = 1
    public void AddDNSInfo(DataTable table, int ipIndex)
    {
        Parallel.ForEach(table.AsEnumerable(), row =>
        {
            string URL = GetDNSName((string)row[ipIndex]);
            row["URL"] = URL; // exceptions are thrown here
            row["Domain"] = GetTopLevelDomain(URL);
            Console.Write("*");
        });
    }

Accepted Answer

Because DataTable is not thread-safe for multithreaded write operations.

MSDN says:

This type is safe for multithreaded read operations. You must synchronize any write operations.

You are writing to the new columns from multiple threads at the same time when, in your Parallel.ForEach, you do:

  row["URL"] = URL;
  row["Domain"] = GetTopLevelDomain(URL);

You need to synchronize every write to your DataTable (using a lock, or some other form of monitor).
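As a rough sketch of what that synchronization could look like (not necessarily the only or best fix): the slow DNS lookups still run in parallel, but every write to a row happens inside a lock. The class name DnsAugmenter and the two delegate parameters are hypothetical stand-ins for the question's solver, GetDNSName and GetTopLevelDomain, which are not shown. The sketch also assumes the IP values should be copied out on a single thread before the loop starts, so that no thread reads the table while another one is writing to it.

```csharp
using System;
using System.Data;
using System.Linq;
using System.Threading.Tasks;

public static class DnsAugmenter
{
    // getDnsName and getTopLevelDomain are hypothetical stand-ins for the
    // question's GetDNSName and GetTopLevelDomain helpers.
    public static void AddDNSInfo(
        DataTable table,
        int ipIndex,
        Func<string, string> getDnsName,
        Func<string, string> getTopLevelDomain)
    {
        // Copy each row reference and its IP on the calling thread, before any
        // parallel work starts, so the loop never reads the table concurrently
        // with a write.
        var work = table.AsEnumerable()
            .Select(row => new { Row = row, Ip = (string)row[ipIndex] })
            .ToList();

        // One private lock object guards every write to the table.
        object tableLock = new object();

        Parallel.ForEach(work, item =>
        {
            // The slow part runs outside the lock, so lookups stay parallel.
            string url = getDnsName(item.Ip);
            string domain = getTopLevelDomain(url);

            // Only one thread at a time may modify the DataTable.
            lock (tableLock)
            {
                item.Row["URL"] = url;
                item.Row["Domain"] = domain;
            }
        });
    }
}
```

Because the lock is held only for the two assignments, the lookups themselves are not serialized. An alternative along the same lines would be to collect the results in a thread-safe collection and write them back to the table in an ordinary foreach after the parallel loop finishes.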



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow