How to create custom class w/ IDataReader from LumenWorks (Fast CSV Reader) to use SqlBulkCopy while having several different file versions?

c# csv idatareader sqlbulkcopy

Question

I have decided to replace/convert DataTable to IDataReader due to memory issues.

After quite a while of Googling and MSDN searches, I came across this at http://www.michaelbowersox.com/2011/12/22/using-a-custom-idatareader-to-stream-data-into-a-database/ and Bulk Insert Sql Server millions of record .

Since I'm using the LumenWorks Fast CSV Reader, I haven't yet figured out how to tell CsvReader to handle two different field versions through IDataReader. :-( csvReader.FieldCount is the key here, but I don't see how to tell CsvReader to use either of the two new classes implementing the IDataReader interface. See the original script and the modified script below... Thanks...

//Original scripts...

var dbConnection = new SqlConnection(_dbConnectionString);

using (var dbBulkCopy = new SqlBulkCopy(dbConnection))
{
   using (CsvReader csvReader = new CsvReader(new StreamReader(filePath), false, '|', '"', '\\', '#', ValueTrimmingOptions.UnquoteOnly))
   {
       while(csvReader.ReadNextRecord())
       {
           if (csvReader.FieldCount == 48)
           {
               //Version 1...
               dataRow["DealerId"] = csvReader[0];
               dataRow["DealerName"] = csvReader[1];
               //Etc...
           }
           else if (csvReader.FieldCount == 51)
           {
               //Version 2...
               dataRow["DealerId"] = csvReader[0];
               dataRow["DealerName"] = csvReader[1];
               //Etc...
           }
           else { throw new Exception("Field Total Count Mismatched"); }

           dataTable.Rows.Add(dataRow);
       }

       dbConnection.Open();

       dbBulkCopy.WriteToServer(dataTable);
   }
}

//New script...

var dbConnection = new SqlConnection(_dbConnectionString);

using (var dbBulkCopy = new SqlBulkCopy(dbConnection))
{
   dbConnection.Open();

   using (CsvReader csvReader = new CsvReader(new StreamReader(filePath), false, '|', '"', '\\', '#', ValueTrimmingOptions.UnquoteOnly))
   {
       csvReader.ReadNextRecord();

       dbBulkCopy.WriteToServer(
           if (csvReader.FieldCount == 48)
           {
               //Version 1...

               csvReader....???  //Assign a custom class implementing IDataReader...
           }
           else if (csvReader.FieldCount == 51)
           {
               //Version 2...
               csvReader....???  //Assign a custom class implementing IDataReader...
           }
           else { throw new Exception("Field Total Count Mismatched"); }
        );
   }
}

//Sample Script...

using (var file = new StreamReader(path))
using (var csv = new CsvReader(file, true)) // true = has header row
using (var bcp = new SqlBulkCopy(connection)) {
    bcp.DestinationTableName = "TableName";
    bcp.WriteToServer(csv);
}

Popular Answer

Since it'll be a bit long, I am writing it as an answer.

*I am assuming that, despite the fact that you have two kinds of CSV file coming with different field orders, the target table is the same.* [EDIT] No need to assume; you confirmed it in your comment.

To understand your context, I got some sample data from here.

Let's say that the first type of file is something like the following:

Rk,Year,Age,Tm,Lg,Id,G,W,L,W-L,Finish
1,1978,37,Atlanta Braves,NL,,162,69,93,.426,6

And the second type is like this (some columns are swapped, Age <-> Finish, and there are additional fields):

Rk,Year,Finish,Tm,Lg,Id,G,W,L,W-L,Age,Test1,Test2,Test3
1,1978,Very good year,Atlanta Braves,NL,,162,69,93,.426,96,,,

So the target table would look something like this (columns only):

Rk,Year,Finish,Tm,Lg,Id,G,W,L,W-L,Age,Test1,Test2,Test3

I see two main options here (plus two more at the end):

OPTION 1

  1. Add a Step 0 to make all the input files uniform at the field level by defining a field format. This can be done by creating the same fields that you have in the database.

[Let's imagine that Test4 and Test5 are columns that exist on the target table but are missing in both CSV files]

Rk,Year,Finish,Tm,Lg,Id,G,W,L,W-L,Age,Test1,Test2,Test3,Test4,Test5

  2. Parse all the files you have (both types) and rewrite them into one file (or multiple, that's up to you), respecting the format you defined. This way, you have only one file (or multiple) in a unique format.

  3. You can now parse this file and insert it into the database using the CSV reader, since the field-incompatibility problem is handled by the file you obtained in the unique format.
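
The steps above can be sketched as follows. This is a hedged sketch, not a definitive implementation: the folder path, output file, and the `MapToTargetOrder` helper are hypothetical, and the canonical column list is the target-table layout shown earlier:

```csharp
using System;
using System.IO;
using LumenWorks.Framework.IO.Csv;

static class CsvNormalizer
{
    // Canonical field order matching the target table (from the sample above).
    static readonly string[] TargetColumns =
    {
        "Rk", "Year", "Finish", "Tm", "Lg", "Id", "G", "W", "L", "W-L",
        "Age", "Test1", "Test2", "Test3", "Test4", "Test5"
    };

    // Step 0: rewrite every input file (both versions) into one canonical file.
    public static void Normalize(string inputFolder, string outputFile)
    {
        using (var writer = new StreamWriter(outputFile))
        {
            writer.WriteLine(string.Join(",", TargetColumns));

            foreach (var path in Directory.GetFiles(inputFolder, "*.csv"))
            using (var csv = new CsvReader(new StreamReader(path), true)) // true = has header row
            {
                while (csv.ReadNextRecord())
                {
                    var fields = new string[csv.FieldCount];
                    csv.CopyCurrentRecordTo(fields); // LumenWorks copies the current record
                    writer.WriteLine(string.Join(",", MapToTargetOrder(fields)));
                }
            }
        }
    }

    // Hypothetical helper: reorders/pads one source record into the canonical
    // layout, keyed on fields.Length (48 vs. 51 in the question).
    static string[] MapToTargetOrder(string[] fields)
    {
        var result = new string[TargetColumns.Length];
        // ... per-version index mapping goes here
        return result;
    }
}
```

Afterwards, the single normalized file can be bulk-copied exactly like the sample script in the question.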

OPTION 2

You will do the SqlBulkCopy operation twice. In the first round you'll read the files having 48 fields, and in the next round you'll read the files having 51 fields.

var FilesWith48Fields = Directory.GetFiles(@"D:\Data\48Fields", "*.csv");

foreach (var fileName in FilesWith48Fields)
{
    using (var file = new StreamReader(fileName))
    using (var csv = new CsvReader(file, true)) // true = has header row
    using (var bcp = new SqlBulkCopy(connection, SqlBulkCopyOptions.KeepNulls))
    {
        bcp.DestinationTableName = "fletchsodTable";
        // map the first field of your CSV to the column in the database
        var mapping1 = new SqlBulkCopyColumnMapping(0, "FirstColumnName");
        bcp.ColumnMappings.Add(mapping1);

        var mapping2 = new SqlBulkCopyColumnMapping(1, "SecondColumnName");
        bcp.ColumnMappings.Add(mapping2);
        // ... map the remaining columns the same way

        bcp.WriteToServer(csv);
    }
}

And repeat the same with the files having 51 fields:

var FilesWith51Fields = Directory.GetFiles(@"D:\Data\51Fields", "*.csv");
......
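
Expanded as a hedged sketch, the second round follows the same pattern with a different set of mappings; the column names and source ordinals below are placeholders, not the real schema:

```csharp
var FilesWith51Fields = Directory.GetFiles(@"D:\Data\51Fields", "*.csv");

foreach (var fileName in FilesWith51Fields)
{
    using (var file = new StreamReader(fileName))
    using (var csv = new CsvReader(file, true)) // true = has header row
    using (var bcp = new SqlBulkCopy(connection, SqlBulkCopyOptions.KeepNulls))
    {
        bcp.DestinationTableName = "fletchsodTable";
        // This file version has a different field order, so the source
        // ordinals differ (placeholder names and indices):
        bcp.ColumnMappings.Add(new SqlBulkCopyColumnMapping(0, "FirstColumnName"));
        bcp.ColumnMappings.Add(new SqlBulkCopyColumnMapping(10, "AgeColumnName"));
        // ... one mapping per field you want copied

        bcp.WriteToServer(csv);
    }
}
```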

More information on SqlBulkCopyColumnMapping can be found here.

OPTION 3

In case you want to adventure in creating your Data Reader, here are some links :

Daniel Wertheim's blog

A sample implementation on CodeProject

Another one

And finally MSDN
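
For orientation, here is a minimal sketch of such a version-aware reader wrapping the LumenWorks CsvReader. It assumes (not guaranteed for every scenario) that SqlBulkCopy only calls FieldCount, Read() and GetValue() when using ordinal column mappings, so the remaining interface members just throw. The 16-column target width and the mapping arrays are placeholders for illustration:

```csharp
using System;
using System.Data;
using LumenWorks.Framework.IO.Csv;

class VersionedCsvDataReader : IDataReader
{
    readonly CsvReader _csv;
    int[] _map; // target ordinal -> source field index; -1 = absent in this version

    public VersionedCsvDataReader(CsvReader csv) { _csv = csv; }

    public int FieldCount => 16; // number of columns in the target table (example)

    public bool Read()
    {
        if (!_csv.ReadNextRecord()) return false;
        if (_map == null) // decide the mapping once, from the first record
        {
            if (_csv.FieldCount == 48) _map = BuildMapForVersion1();
            else if (_csv.FieldCount == 51) _map = BuildMapForVersion2();
            else throw new InvalidOperationException("Field Total Count Mismatched");
        }
        return true;
    }

    public object GetValue(int i)
    {
        int src = _map[i];
        return src < 0 ? (object)DBNull.Value : _csv[src];
    }

    // Hypothetical per-version mappings; fill in the real source ordinals.
    static int[] BuildMapForVersion1() => new int[16]; // 48-field layout
    static int[] BuildMapForVersion2() => new int[16]; // 51-field layout

    public bool IsDBNull(int i) => GetValue(i) == DBNull.Value;
    public void Dispose() => _csv.Dispose();
    public void Close() => Dispose();

    // Members not needed for ordinal-mapped bulk copy:
    public int Depth => 0;
    public bool IsClosed => false;
    public int RecordsAffected => -1;
    public bool NextResult() => false;
    public DataTable GetSchemaTable() => throw new NotSupportedException();
    public object this[int i] => GetValue(i);
    public object this[string name] => throw new NotSupportedException();
    public string GetName(int i) => throw new NotSupportedException();
    public int GetOrdinal(string name) => throw new NotSupportedException();
    public string GetDataTypeName(int i) => throw new NotSupportedException();
    public Type GetFieldType(int i) => typeof(string);
    public int GetValues(object[] values) => throw new NotSupportedException();
    public bool GetBoolean(int i) => throw new NotSupportedException();
    public byte GetByte(int i) => throw new NotSupportedException();
    public long GetBytes(int i, long o, byte[] b, int bo, int l) => throw new NotSupportedException();
    public char GetChar(int i) => throw new NotSupportedException();
    public long GetChars(int i, long o, char[] b, int bo, int l) => throw new NotSupportedException();
    public IDataReader GetData(int i) => throw new NotSupportedException();
    public DateTime GetDateTime(int i) => throw new NotSupportedException();
    public decimal GetDecimal(int i) => throw new NotSupportedException();
    public double GetDouble(int i) => throw new NotSupportedException();
    public float GetFloat(int i) => throw new NotSupportedException();
    public Guid GetGuid(int i) => throw new NotSupportedException();
    public short GetInt16(int i) => throw new NotSupportedException();
    public int GetInt32(int i) => throw new NotSupportedException();
    public long GetInt64(int i) => throw new NotSupportedException();
    public string GetString(int i) => (string)GetValue(i);
}
```

Usage would then be `dbBulkCopy.WriteToServer(new VersionedCsvDataReader(csvReader));`, which streams rows instead of building a DataTable in memory.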

PERSONAL NOTE: Due to lack of time, for a similar problem of mine I gave up on the 3rd option, because with all the unit tests, optimization and other tweaks that you'll have to do, it can take time (at least for me, it was the case).

OPTION 4 Perhaps, with the column mapping I pointed out in OPTION 2, you would want to implement your own way by testing field counts. But instinctively, I would suggest not detecting the file version by hard-coded integer field counts.
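
For instance, a sketch of detecting the version from the header row rather than the field count; the header names here are taken from the sample data above and are assumptions about your real files:

```csharp
using (var csv = new CsvReader(new StreamReader(filePath), true)) // true = has header row
{
    // LumenWorks exposes the header row; key the decision on its content,
    // not on a hard-coded count, so adding a column later doesn't break it.
    string[] headers = csv.GetFieldHeaders();
    bool isSecondVersion = Array.IndexOf(headers, "Test1") >= 0; // extra columns mark version 2
    // ... pick the matching set of SqlBulkCopyColumnMapping entries here
}
```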



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow