I have roughly 30M rows to Insert Update in SQL Server per day what are my options?
If I use SqlBulkCopy, does it handle not inserting data that already exists?
In my scenario I need to be able to run this over and over with the same data without duplicating data.
At the moment I have a stored procedure with an update statement and an insert statement which read data from a DataTable.
What should I be looking for to get better performance?
The usual way to do something like this is to maintain a permanent work table (or tables) that have no constraints on them. Often these might live in a separate work database on the same server.
To load the data, you empty the work tables, blast the data in via BCP/bulk copy. Once the data is loaded, you do whatever cleanup and/or transforms are necessary to prep the newly loaded data. Once that's done, as a final step, you migrate the data to the real tables by performing the update/delete/insert operations necessary to implement the delta between the old data and the new, or by simply truncating the real tables and reloading them.
Another option, if you've got something resembling a steady stream of data flowing in, might be to set up a daemon to monitor for the arrival of data and then do the inserts. For instance, if your data is flat files get dropped into a directory via FTP or the like, the daemon can monitor the directory for changes and do the necessary work (as above) when stuff arrives.
One thing to consider, if this is a production system, is that doing massive insert/delete/update statements is likely to cause blocking while the transaction is in-flight. Also, a gigantic transaction failing and rolling back has its own disadvantages:
So, depending on your circumstances, you might be better off doing your insert/update/deletes in smaller batches so as to guarantee that you achieve forward progress. 30 million rows over 24 hours works out to be c. 350 per second.
Bulk insert into a holding table then perform either a single Merge statement or an Update and an Insert statement. Either way you want to compare your source table to your holding table to see which action to perform