I have an MSSQL 2008 table with a few million records. I need to iterate over each row, modify some of the data, and copy the updated record to a new table using a C# application that gets executed on a daily basis.
I have tried doing this using ADO.NET entities, but there are memory issues involved with this method, not to mention it is very slow. I have read up on bulk-copy libraries and SQL-only ways for copying one table to another, but none of them involve modifying records before copying them. I need to find a better way for performing this operation.
As you mention memory issues I'm guessing you're trying to load the million rows into memory, process them and then write them back to the database.
You can avoid this by 'streaming' the data instead of loading it entirely. The SqlDataReader
will handle buffering for you so on the reading side you can do a simple WHILE
loop that fetches rows one by one. The actual conversion you already have working it seems so all you need to do is take care of writing the results back into the database. IMHO the fastest way to do so is by storing a buffer of multiple results (start with 100, work up and see where the sweet spot is) in a data-table and then push that data-table into the database using the SqlBulkCopy
class.
Rinse & repeat.
PS: Sounds like a 'fun' problem. Do you have any sample data sitting somewhere to test this out ? 5 hours sounds like a LONG time for something that looks trivial at first, then again 20 million times virtually nothing still adds up. More specifically I wonder how 'large' the data is on the RTF side : are the values ca 2k on average or rather 200k? And what kind of hardware do you run this on ?
The fastest performing option would be to re-write your C# application logic into a CLR stored procedure so that all processing takes place on the server.