SQL Bulk insert with parent/child relationships, is order preserved?

c# sqlbulkcopy sql-server-2008

Question

Similar to these other questions noted below, I have two tables with the structure:

create table parent (
   recno int identity(1,1) primary key not null,
   groupCode int,
   parentdata varchar(80)
);

create table child (
   parentrecno int not null,
   childdata varchar(80)
)

I need to insert a few hundred thousand records quickly into these tables -- and the tables hold millions of other records unrelated to this insert and are never quiet. Because of the parent/child nature, it's not a good candidate (it seems) for SqlBulkCopy.

In C# using SqlCommand with INSERT I'm getting about 400-500 records/second inserted, and this is a bit too slow. Pseudocode:

 foreach(Record r in parentRecords)
 {
      Insert Fields from r into SqlCommand Parameters but not "recno"
      Call ExecuteScalar to insert and fetch the inserted identity value (recno)
      foreach(ChildRecord cr in parentRecords.Children)
      {
          Insert Fields from cr into SqlCommand Parameters
          Insert the identity value (recno) from above into Parameters 
                                                       (as parentrecno)
          Call ExecuteNonQuery to insert the record
      }   
 }

After reading those other posts, a though occurred to me. The groupCode attached to the parent records is unique to that set of parent records I'm inserting. Would it work to:

  1. Bulk insert the parent records with SqlBulkCopy, letting the insert auto-generate the recno identity field as usual.
  2. Perform a SELECT on just the inserted records:

    select recno from parent where groupCode = @thisgroup order by recno;
    
  3. Use the retrieved values to fill in the parentrecno fields for the child records in memory

  4. Bulk insert the child records with SqlBulkCopy

This would rely on the parent records going into the SQL table in the same order as they are in the original DataTable (and the identity values being assigned in that same order). Is this something I can rely on?

Related questions:

How to update Dataset Parent & Child tables with Autogenerated Identity Key?

SqlBulkCopy and DataTables with Parent/Child Relation on Identity Column

Accepted Answer

Create two staging tables with the same structure as your target tables but don't use identity on recno column.

create table parentTmp (
   recno int,
   groupCode int,
   parentdata varchar(80)
);

create table childTmp (
   parentrecno int not null,
   childdata varchar(80)
)

Bulk load your data to the staging tables, keeping the recno/parentrecno values as is.

Then you can use merge and output to move the data from the staging tables.

-- Table variable to hold mapping between 
-- SourceRecno and TargetRecno
declare @recno table(SourceRecno int, TargetRecno int);

-- Merge data from parentTmp to parent
-- Output old and new recno to @recno
merge parent T
using parentTmp S
on 0=1
when not matched then
  insert (groupCode, parentdata)
    values (S.groupCode, S.parentData)
output S.recno, inserted.recno into @recno;

-- Copy data from childTmp to child
-- Use @recno to get the new recno
insert into child(parentrecno, childdata)
select R.TargetRecno, C.childdata
from childTmp as C
  inner join @recno as R
    on C.parentrecno = R.SourceRecno;

This will only work in SQL Server 2008 (and later I presume).


Popular Answer

This is not an absolut bulk insert, but instead it inserts all child data at the same time with the parent data, making only 1 round trip to the DB.

insert into parent(groupcode, parentdata) values(1, 'parent data');
insert into child(parentrecno, childdata) select parentrecno, childdata from (
    select SCOPE_IDENTITY() as parentrecno, 'child data 1' as childdata
    union
    select SCOPE_IDENTITY() as parentrecno, 'child data 2' as childdata
    union
    select SCOPE_IDENTITY() as parentrecno, 'child data 3' as childdata
) childrendata;

You can build scripts like this in your C# code, and then perform one request per parent.

Be aware that this may not be a good approach if the amount of child data is known to be large. Don't know the details, but I'm sure that the size of the sql script can't grow indefinitely.



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why