Inserting many rows in EF too slow, how to get primary key from SqlBulkCopy?

entity-framework sql sqlbulkcopy

Question

We have a use case in our application where a user triggers a request that inserts 100 to 1000 rows.
After that insert, processing continues and creates more objects that hold foreign keys to the originally inserted rows; in other words, we need the primary key IDs of the inserted objects.

So far we have used EF to do that in a foreach loop, but this was too slow, taking about 15-20 seconds for about 600 rows (while blocking the user, bad :( ).

Original code (also handles updates, but we do not care about performance there, it is not blocking users):

foreach (Location updatedLoc in locationsLoaded)
{
    // find it in the collection from the database
    Location fromDb = existingLocations.SingleOrDefault(loc => loc.ExtId.Equals(updatedLoc.ExtId));

    // update or insert
    if (fromDb != null)
    {
        // link ids for update
        updatedLoc.Id = fromDb.Id;

        // set values for update 
        db.Entry(fromDb).CurrentValues.SetValues(updatedLoc);
    }
    else
    {
        System.Diagnostics.Trace.WriteLine("Adding new location: " + updatedLoc.Name, "loadSimple");

        // insert a new location <============ This is the bottleneck, takes about 20-40ms per row
        db.Locations.Add(updatedLoc);
    }
}

// This actually takes about 3 seconds for 600 rows, was actually acceptable
db.SaveChanges();

So after researching on SO and the internet, I found out I was using EF the wrong way and needed to use SqlBulkCopy.

And thus the code was rewritten; what used to take ~20 seconds now takes ~100 ms (!)

foreach (Location updatedLoc in locationsLoaded)
{
    // find it in the collection from the database
    Location fromDb = existingLocations.SingleOrDefault(loc => loc.ExtId.Equals(updatedLoc.ExtId));

    // update or insert
    if (fromDb != null)
    {
        // link ids for update
        updatedLoc.Id = fromDb.Id;

        // set values for update
        db.Entry(fromDb).CurrentValues.SetValues(updatedLoc);
    }
    else
    {
        System.Diagnostics.Trace.WriteLine("Adding new location: " + updatedLoc.Name, "loadSimple");

        // insert a new location
        dataTable.Rows.Add(new object[] { /* the 14 fields of the location.. */ });
    }
}

System.Diagnostics.Trace.WriteLine("preparing to bulk insert", "loadSimple");

// perform the bulk insert
using (var bulkCopy = new System.Data.SqlClient.SqlBulkCopy(System.Configuration.ConfigurationManager.ConnectionStrings["bulk-inserter"].ConnectionString))
{
    bulkCopy.DestinationTableName = "Locations";

    for (int i = 0; i < dataTable.Columns.Count; i++)
    {
        bulkCopy.ColumnMappings.Add(i, i + 1);
    }

    bulkCopy.WriteToServer(dataTable);
}

// for update
db.SaveChanges();

The problem is that after the bulk copy, the objects in the Locations collection, which EF tracks, are unchanged (that is okay and expected), but I need the inserted IDs to continue working with these objects.

A simple solution would be to immediately select the data again from the database; I have the data at hand and can simply re-select it into a different collection.

But that solution feels incorrect. Is there no way to get the IDs as part of the insert?

EDIT: the simple solution works; see the accepted answer below on how to easily sync it back to EF.

Maybe I should not use SqlBulkCopy (I expect up to about 1000 rows, no more) and use something else?

Please note a few related SO questions and solutions, all of which seem to move away from EF:

  1. Possible to get PrimayKey IDs back after a SQL BulkCopy?
  2. Improving bulk insert performance in Entity framework
  3. Fastest Way of Inserting in Entity Framework (this is about SaveChanges() performance with many pending inserts; it should be called every X inserts rather than once at the end with thousands pending)
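For reference, the batching approach from link 3 can be sketched like this. This is a minimal sketch, not code from the question; `MyDbContext`, `locationsToInsert`, and the batch size of 100 are illustrative assumptions:

```csharp
// Sketch: commit pending inserts in batches and recreate the context
// so EF's change tracker does not grow unbounded (assumed context
// type MyDbContext with a Locations set).
MyDbContext db = new MyDbContext();
int count = 0;
foreach (Location loc in locationsToInsert)
{
    db.Locations.Add(loc);
    if (++count % 100 == 0)
    {
        db.SaveChanges();       // flush this batch
        db.Dispose();
        db = new MyDbContext(); // fresh, empty change tracker
    }
}
db.SaveChanges();               // flush the remainder
```

This keeps each SaveChanges() call cheap, but it is still row-by-row inserts under the hood, so it will not match SqlBulkCopy.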

Accepted Answer

Nothing you do through EF will ever be as fast as SqlBulkCopy; indeed, even raw SQL INSERTs aren't as fast. So you just need to re-read the Locations: refresh the query by re-reading with MergeOption.OverwriteChanges.
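A minimal sketch of that re-read, assuming a DbContext named `db` and that the inserted rows can be identified by their ExtId values (names taken from the question; the drop down to ObjectContext is an assumption about the EF version in use):

```csharp
// Re-read the freshly inserted rows so EF's cached Location entities
// pick up their database-generated IDs.
var objectContext = ((IObjectContextAdapter)db).ObjectContext;
var set = objectContext.CreateObjectSet<Location>();

// OverwriteChanges makes EF replace cached values with the ones
// coming back from the database instead of keeping stale copies.
set.MergeOption = MergeOption.OverwriteChanges;

List<Location> refreshed = set
    .Where(loc => insertedExtIds.Contains(loc.ExtId))
    .ToList(); // these entities now carry their identity values
```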


Popular Answer

If you are using SQL Server 2008 or later, you can use a stored procedure with a table-valued parameter to do what you are after. You would need to define a TYPE in SQL that matches your DataTable:

CREATE TYPE dbo.YourType AS TABLE (ID INT, Column1 INT, Column2 VARCHAR(5)...)

Then pass this type to a stored procedure.

CREATE PROCEDURE dbo.InsertYourType (@YourType dbo.YourType READONLY)
AS
    BEGIN
        DECLARE @ID TABLE (ID INT NOT NULL PRIMARY KEY)
        INSERT INTO YourTable (Column1, Column2...)
        OUTPUT inserted.ID INTO @ID
        SELECT  Column1, Column2...
        FROM    @YourType

        SELECT  *
        FROM    YourTable
        WHERE   ID IN (SELECT ID FROM @ID)

    END

This will capture the ID for the inserted rows, and return all new rows. As long as your C# DataTable conforms to the format of dbo.YourType, you can pass it much the same way you would normally pass a parameter to a SqlCommand, except that the parameter must be marked as a structured type with the matching type name:

cmd.Parameters.Add(new SqlParameter("@YourType", SqlDbType.Structured) { TypeName = "dbo.YourType", Value = yourDataTable });

I realise this is similar to your proposal of reselecting the data, but the select should be fast since it only uses the identity column. While you still have the problem of using a SQL insert rather than the bulk copy, you are reverting to a set-based solution rather than EF's row-by-row approach. This is quite similar to the accepted answer in one of the links you posted, but using a table variable removes a couple of stages.
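Putting the pieces together on the C# side might look like this; a sketch under the assumption that the procedure and type above exist, that `dataTable` matches dbo.YourType column-for-column, and that `connectionString` points at your database:

```csharp
// Call the stored procedure with the DataTable as a table-valued
// parameter and read back the inserted rows, IDs included.
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.InsertYourType", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;

    var tvp = cmd.Parameters.AddWithValue("@YourType", dataTable);
    tvp.SqlDbType = SqlDbType.Structured;
    tvp.TypeName = "dbo.YourType";

    conn.Open();
    using (SqlDataReader reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            int newId = reader.GetInt32(reader.GetOrdinal("ID"));
            // map newId back onto the matching in-memory object here
        }
    }
}
```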



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow