Every night I need to fill a SQL Server 2005 table from an ODBC source with over 8 million records. Currently I am using an INSERT statement that selects from a linked server through OPENQUERY, similar to this:
INSERT INTO SQLStagingTable SELECT * FROM OPENQUERY(ODBCSource, 'SELECT * FROM SourceTable')
This is very inefficient and takes hours to run. I'm in the middle of coding a solution using SqlBulkCopy, similar to the code found in this question.
The code in that question first populates a DataTable in memory and then passes that DataTable to SqlBulkCopy's WriteToServer method.
What should I do if the populated DataTable needs more memory than is available on the machine it runs on (a server with 16 GB of memory in my case)?
I've thought about using the overloaded OdbcDataAdapter.Fill method, which fills only n records starting at index x. However, that could turn out to be even slower than what I have now, since it would mean re-running the SELECT statement against the source several times.
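To put a rough number on "even slower": if every page re-executes the query and the source has to read and discard all rows before the page's start index, the total rows read grow roughly quadratically with the number of pages. The sketch below assumes that worst case (it is not measured ODBC driver behaviour) and uses a hypothetical page size of 500,000 rows.

```python
import math


def rows_read_with_paging(total_rows: int, page_size: int) -> int:
    """Rows the source must produce when each page re-runs the query and
    skips (reads and discards) every row before the page's start index.
    Assumed worst case, not measured driver behaviour."""
    pages = math.ceil(total_rows / page_size)
    total = 0
    for k in range(pages):
        start = k * page_size
        # rows skipped to reach `start` plus the rows actually returned
        total += min(start + page_size, total_rows)
    return total


single_pass = 8_000_000                       # one SELECT reads each row once
paged = rows_read_with_paging(8_000_000, 500_000)
print(paged, paged / single_pass)             # prints 68000000 8.5
```

Under those assumptions, 16 pages read 68 million rows instead of 8 million, which is why paging over a single query tends to lose to one streamed pass.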
What should I do? Just populate the whole thing at once and let the OS manage the memory? Should I populate it in chunks? Is there another solution I haven't thought of?
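Skip the DataTable and use a data reader: SqlBulkCopy.WriteToServer has an overload that accepts an IDataReader, so you can open an OdbcDataReader on the source query and pass it straight to the bulk copy. Most data reader implementations keep only a very small portion of the total results in memory at any time, so the rows are streamed from the source to SQL Server in a single pass instead of being loaded all at once.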
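The same streaming idea, sketched in Python with the standard-library sqlite3 module as a stand-in for the ODBC source and SQL Server destination. This is an analogue, not the .NET SqlBulkCopy API: the cursor is read with fetchmany, so at most one batch of rows is in memory at a time, and the source query runs only once. The column names (id, name), the batch size, and the helper names stream_rows and copy_table are hypothetical.

```python
import sqlite3
from typing import Iterator, List


def stream_rows(cursor: sqlite3.Cursor, batch_size: int) -> Iterator[List[tuple]]:
    """Yield result rows in batches so only one batch is held in memory."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield batch


def copy_table(src: sqlite3.Connection, dst: sqlite3.Connection,
               batch_size: int = 10_000) -> int:
    """Copy SourceTable into SQLStagingTable one batch at a time."""
    cur = src.execute("SELECT id, name FROM SourceTable")  # single pass over the source
    copied = 0
    for batch in stream_rows(cur, batch_size):
        dst.executemany("INSERT INTO SQLStagingTable (id, name) VALUES (?, ?)", batch)
        copied += len(batch)
    dst.commit()
    return copied


# Usage: two in-memory databases stand in for the real source and destination.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE SourceTable (id INTEGER, name TEXT)")
src.executemany("INSERT INTO SourceTable VALUES (?, ?)",
                ((i, f"row{i}") for i in range(25_000)))
dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE SQLStagingTable (id INTEGER, name TEXT)")
copied = copy_table(src, dst, batch_size=1_000)
print(copied)  # prints 25000
```

Memory use is bounded by batch_size rather than by the size of the table, which is the property that matters when the full result set would not fit in RAM.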
SSIS (SQL Server Integration Services) performs well and is highly tunable. In my experience, 8 million rows is well within its capabilities: one of my larger ETL jobs pulls in 24 million rows a day and performs major conversions and dimensional data warehouse manipulations.