SQL Server Alternative to reseeding identity column

database sqlbulkcopy sql-server sql-server-2012

Question

I am currently working on a phone directory application. For this application I get a flat file (csv) from corporate SAP that is updated daily that I use to update an sql database twice a day using a windows service. Additionally, users can add themselves to the database if they do not exist (ie: is not included in the SAP file). Thus, a contact can be of 2 different types: 'SAP' or 'ECOM'.

So, the Windows service downloads the file from a SAP ftp, deletes all existing contacts in the database of type 'SAP' and then adds all the contacts on the file to the database. To insert the contacts into the database (some 30k), I load them into a DataTable and then make use of SqlBulkCopy. This works particularly, running only a few seconds.

The only problem is the fact that the primary key for this table is an auto-incremented identity. This means that my contact id's grows at a rate of 60k per day. I'm still in development and my id's are in the area of 20mil:

http://localhost/CityPhone/Contact/Details/21026374

I started looking into reseeding the id column, but if I were to reseed the identity to the current highest number in the database, the following scenario would pose issues:

  1. Windows Service Loads 30 000 contacts
  2. User creates entry for himself (id = 30 001)
  3. Windows Service deletes all SAP contacts, reseeds column to after current highest id: 30 002

Also, I frequently query for users based on this this id, so, I'm concerned that making use of something like a GUID instead of an auto-incremented integer will have too high a price in performance. I also tried looking into SqlBulkCopyOptions.KeepIdentity, but this won't work. I don't get any id's from SAP in the file and if I did they could easily conflict with the values of manually entered contact fields. Is there any other solution to reseeding the column that would not cause the id column values to grow at such an exponential rate?

Accepted Answer

I suggest following workflow.

  1. import to brand new table, like tempSAPImport, with your current workflow.
  2. Add to your table only changed rows.

    Insert Into ContactDetails 
       (Select * 
        from tempSAPImport 
        EXCEPT 
        SELECT Detail1, Detail2 
        FROM ContactDetails)
    
  3. I think your SAP table have a primary key, you can make use of the control if a row updated only.

    Update ContactDetails ( XXX your update criteria)
    

This way you will import your data fast, also you will keep your existing identity values. According to your speed requirements, adding indexes after import will speed up your process.


Popular Answer

If SQL Server version >= 2012 then I think the best solution for the scenario above would be using a sequence for the PK values. This way you have control over the seeding process (you can cycle values).

More details here: http://msdn.microsoft.com/en-us/library/ff878091(v=sql.110).aspx



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why