What are my options if I need to insert and update 30 million rows in SQL Server every day?
If I use SqlBulkCopy, will it skip data that already exists?
I need to be able to rerun this process with the same data without creating duplicate rows.
I currently have a stored procedure containing an insert statement, which reads data from a DataTable, followed by an update statement.
What should I look into to improve on this?
The typical way to handle this is to have one or more permanent work tables that have no constraints. These often live in a separate work database on the same server.
First clear the work tables, then bulk-load the data into them with BCP or bulk copy. Once the data is loaded, prepare it by doing any required cleaning and/or transformations. The last step is to migrate the data into the real tables, either by truncating the real tables and reloading them, or by running the update/delete/insert operations needed to apply the delta between the old data and the new.
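The work-table flow above can be sketched as follows. This is a minimal illustration, not a SQL Server implementation: Python's built-in `sqlite3` stands in for SQL Server, `executemany` stands in for BCP/SqlBulkCopy, and the table names `items` and `work_items` and the cleaning rules (trim names, drop blank rows, keep the last row per id) are hypothetical. It also assumes each load is a complete snapshot, which is why rows missing from the new data are deleted.

```python
import sqlite3


def load_and_apply_delta(conn, rows):
    """Sketch of the work-table pattern (hypothetical tables, sqlite3 stand-in)."""
    with conn:  # commit on success, roll back everything on error
        # 1. Clear the unconstrained work table (SQLite has no TRUNCATE).
        conn.execute("DELETE FROM work_items")
        # 2. Bulk-load the raw rows (stand-in for BCP / SqlBulkCopy).
        conn.executemany("INSERT INTO work_items (id, name) VALUES (?, ?)", rows)
        # 3. Clean: trim names, drop invalid rows, keep the last row per id.
        conn.execute("UPDATE work_items SET name = TRIM(name)")
        conn.execute("DELETE FROM work_items WHERE id IS NULL OR name IS NULL OR name = ''")
        conn.execute(
            "DELETE FROM work_items WHERE rowid NOT IN "
            "(SELECT MAX(rowid) FROM work_items GROUP BY id)"
        )
        # 4a. Update rows whose values changed.
        conn.execute("""
            UPDATE items
            SET name = (SELECT w.name FROM work_items w WHERE w.id = items.id)
            WHERE EXISTS (SELECT 1 FROM work_items w
                          WHERE w.id = items.id AND w.name <> items.name)
        """)
        # 4b. Delete rows absent from the new snapshot (assumes full snapshots).
        conn.execute("DELETE FROM items WHERE id NOT IN (SELECT id FROM work_items)")
        # 4c. Insert rows that do not exist yet.
        conn.execute("""
            INSERT INTO items (id, name)
            SELECT w.id, w.name FROM work_items w
            WHERE NOT EXISTS (SELECT 1 FROM items i WHERE i.id = w.id)
        """)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE work_items (id INTEGER, name TEXT);  -- no constraints
    INSERT INTO items VALUES (1, 'old'), (2, 'stale');
""")
incoming = [(1, " new "), (3, "c"), (3, "c2"), (4, "")]
load_and_apply_delta(conn, incoming)
after_first_run = conn.execute("SELECT id, name FROM items ORDER BY id").fetchall()
load_and_apply_delta(conn, incoming)  # rerun with the same data
after_rerun = conn.execute("SELECT id, name FROM items ORDER BY id").fetchall()
```

Because every step compares the work table with the real table, rerunning the load with the same input leaves the real table unchanged, which is the repeatability the question asks for.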
If your data arrives as something like a continuous stream, another option is to set up a daemon that watches for it and performs the inserts. For example, if your data consists of flat files dropped into a directory via FTP or a similar method, the daemon can monitor that directory and carry out the steps described above whenever new files arrive.
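A directory-watching daemon can be as simple as a polling loop. The sketch below is one assumption-laden way to do it: `poll_once`, `watch`, and the `handle_file` callback are hypothetical names, and `handle_file` is where the load steps described above would go. It tracks processed files in memory only, so a restarted daemon would reprocess everything; moving handled files to an archive directory is a common way around that.

```python
import os
import tempfile
import time


def poll_once(directory, seen, handle_file):
    """Process files in `directory` not seen before; return their names."""
    new_files = sorted(
        name
        for name in os.listdir(directory)
        if name not in seen and os.path.isfile(os.path.join(directory, name))
    )
    for name in new_files:
        handle_file(os.path.join(directory, name))  # e.g. bulk-load this file
        seen.add(name)
    return new_files


def watch(directory, handle_file, interval_seconds=30.0):
    """Hypothetical daemon loop: poll forever until the process is stopped."""
    seen = set()
    while True:
        poll_once(directory, seen, handle_file)
        time.sleep(interval_seconds)


# Usage: two files appear; the second poll finds nothing new.
processed = []
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.csv", "b.csv"):
        with open(os.path.join(tmp, name), "w") as f:
            f.write("id,name\n")
    seen = set()
    first_poll = poll_once(tmp, seen, processed.append)
    second_poll = poll_once(tmp, seen, processed.append)
```

One caveat with FTP drops: a file may still be partially written when the watcher sees it. Having the uploader write under a temporary name and rename the file when complete is one common convention.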
If this is a production system, keep in mind that running large insert, delete, and update statements can cause blocking while the transaction is open. A massive transaction that fails and rolls back has other drawbacks as well: all of its work is lost, and the rollback itself can take a long time.
Depending on your situation, it may be preferable to do your inserts, updates, and deletes in smaller batches so that you keep making progress. For scale: spread over a day, 30 million rows is about 350 rows per second (30,000,000 / 86,400 ≈ 347).
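Batching can be sketched by committing after each fixed-size chunk, so locks are held only briefly and a failure rolls back just the current batch. This is a minimal sketch using `sqlite3` as a stand-in for SQL Server; the `items` table, `insert_in_batches`, and the batch size of 10,000 are illustrative assumptions, and the right batch size for a real system has to be found by measurement. The last lines reproduce the rows-per-second arithmetic above.

```python
import sqlite3
from itertools import islice


def insert_in_batches(conn, rows, batch_size=10_000):
    """Insert rows in chunks, committing each chunk; returns the batch count."""
    it = iter(rows)
    batches = 0
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        with conn:  # commits this batch, or rolls back only this batch on error
            conn.executemany("INSERT INTO items (id, name) VALUES (?, ?)", batch)
        batches += 1
    return batches


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
n_batches = insert_in_batches(conn, ((i, f"row{i}") for i in range(25_000)))
total_rows = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]

# 30 million rows spread evenly over one day (86,400 seconds).
rows_per_second = 30_000_000 / (24 * 60 * 60)
```

The same chunking idea applies to updates and deletes, for example by working through key ranges one chunk at a time.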
Bulk insert into a holding table, then apply the changes with either a single MERGE statement or an UPDATE followed by an INSERT. Either way, you decide which action each row needs by comparing the holding table with your destination table.
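The single-statement variant can be sketched with SQLite, which has no MERGE; its `INSERT ... ON CONFLICT ... DO UPDATE` upsert (SQLite 3.24 or later) is swapped in here as a stand-in. The table names `items` and `holding` are hypothetical, and the sketch assumes the holding table has at most one row per key. On SQL Server the equivalent would be a MERGE with `WHEN MATCHED` and `WHEN NOT MATCHED` clauses.

```python
import sqlite3


def upsert_from_holding(conn):
    """Apply holding-table rows to `items` in one statement (MERGE stand-in)."""
    with conn:
        conn.execute("""
            INSERT INTO items (id, name)
            SELECT id, name FROM holding WHERE 1  -- WHERE avoids a parse ambiguity
            ON CONFLICT(id) DO UPDATE SET name = excluded.name
            WHERE items.name <> excluded.name     -- skip unchanged rows
        """)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE holding (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    INSERT INTO items VALUES (1, 'old');
    INSERT INTO holding VALUES (1, 'new'), (2, 'added');
""")
upsert_from_holding(conn)
upsert_from_holding(conn)  # rerun with the same data: no duplicates
merged = conn.execute("SELECT id, name FROM items ORDER BY id").fetchall()
```

Matching on the key is what makes the rerun safe: existing rows are updated (or skipped if unchanged) instead of being inserted a second time.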