I am currently working on a phone directory application. For this application I get a flat file (csv) from corporate SAP that is updated daily that I use to update an sql database twice a day using a windows service. Additionally, users can add themselves to the database if they do not exist (ie: is not included in the SAP file). Thus, a contact can be of 2 different types: 'SAP' or 'ECOM'.
So, the Windows service downloads the file from a SAP ftp, deletes all existing contacts in the database of type 'SAP' and then adds all the contacts on the file to the database. To insert the contacts into the database (some 30k), I load them into a DataTable
and then make use of SqlBulkCopy. This works particularly, running only a few seconds.
The only problem is the fact that the primary key for this table is an auto-incremented identity. This means that my contact id's grows at a rate of 60k per day. I'm still in development and my id's are in the area of 20mil:
http://localhost/CityPhone/Contact/Details/21026374
I started looking into reseeding the id column, but if I were to reseed the identity to the current highest number in the database, the following scenario would pose issues:
Also, I frequently query for users based on this this id, so, I'm concerned that making use of something like a GUID instead of an auto-incremented integer will have too high a price in performance. I also tried looking into SqlBulkCopyOptions.KeepIdentity
, but this won't work. I don't get any id's from SAP in the file and if I did they could easily conflict with the values of manually entered contact fields. Is there any other solution to reseeding the column that would not cause the id column values to grow at such an exponential rate?
I suggest following workflow.
Add to your table only changed rows.
Insert Into ContactDetails
(Select *
from tempSAPImport
EXCEPT
SELECT Detail1, Detail2
FROM ContactDetails)
I think your SAP table have a primary key, you can make use of the control if a row updated only.
Update ContactDetails ( XXX your update criteria)
This way you will import your data fast, also you will keep your existing identity values. According to your speed requirements, adding indexes after import will speed up your process.
If SQL Server version >= 2012 then I think the best solution for the scenario above would be using a sequence for the PK values. This way you have control over the seeding process (you can cycle values).
More details here: http://msdn.microsoft.com/en-us/library/ff878091(v=sql.110).aspx