SQL Server index behaviour when doing bulk insert

bulkinsert indexing sqlbulkcopy sql-server

Question

I have an application that inserts multiple rows at once into SQL Server.

I use either the SqlBulkCopy class or self-written code that generates a gigantic insert into table_name(...) values (...) statement.

My table has several indexes, including a clustered one.

The question is: how are those indexes updated? For each row I insert? For each transaction?

A somewhat odd follow-up question: is there a general term for this scenario, like 'bulk-insert indexing behaviour'? I tried googling several keyword combinations but haven't found anything. I ask because I sometimes work with Postgres and would like to know its behaviour as well.

I've tried to find an article on this topic several times, without any luck.

If you can point me to any docs, articles, or a book with a relevant chapter, that would be great.

Accepted Answer

You can see how indexes are updated by examining the query plan. Consider this heap table with only non-clustered indexes.

CREATE TABLE dbo.BulkInsertTest(
      Column1 int NOT NULL
    , Column2 int NOT NULL
    , Column3 int NOT NULL
    , Column4 int NOT NULL
    , Column5 int NOT NULL
    );
CREATE INDEX BulkInsertTest_Column1 ON dbo.BulkInsertTest(Column1);
CREATE INDEX BulkInsertTest_Column2 ON dbo.BulkInsertTest(Column2);
CREATE INDEX BulkInsertTest_Column3 ON dbo.BulkInsertTest(Column3);
CREATE INDEX BulkInsertTest_Column4 ON dbo.BulkInsertTest(Column4);
CREATE INDEX BulkInsertTest_Column5 ON dbo.BulkInsertTest(Column5);
GO

Below is the execution plan for a singleton INSERT.

INSERT INTO dbo.BulkInsertTest(Column1, Column2, Column3, Column4, Column5) VALUES
     (1, 2, 3, 4, 5);

INSERT execution plan

The execution plan shows only a Table Insert operator, so the new non-clustered index rows were inserted during the table insert operation itself. A large batch of singleton INSERT statements will yield this same plan for each statement.

I get a similar plan with a single INSERT statement that specifies a large number of rows via a row constructor, the only difference being the addition of a Constant Scan operator to emit the rows.

INSERT INTO dbo.BulkInsertTest(Column1, Column2, Column3, Column4, Column5) VALUES
     (1, 2, 3, 4, 5)
    ,(1, 2, 3, 4, 5)
    ,(1, 2, 3, 4, 5)
    ,...
    ,(1, 2, 3, 4, 5);

Multi-row INSERT execution plan

Here's the execution plan for a T-SQL BULK INSERT statement (using a dummy empty file as the source). With BULK INSERT, SQL Server added extra query plan operators to optimize the index inserts. The rows were first spooled during the table insert; rows from the spool were then sorted and inserted into each index separately as a mass insert operation. This reduces the overhead of large insert operations. You may also see similar plans for INSERT...SELECT queries.

BULK INSERT dbo.BulkInsertTest
    FROM 'c:\Temp\BulkInsertTest.txt';

BULK INSERT execution plan
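As a rough illustration of the INSERT...SELECT case mentioned above, a query like the following can get the same wide plan, with per-index Sort and Index Insert operators, when the optimizer estimates enough rows. The staging table name here is purely an assumption for the sketch:

```sql
-- Hypothetical example: a large INSERT...SELECT can also produce the
-- "wide" plan, where rows are spooled, sorted, and inserted into each
-- non-clustered index as a separate mass operation.
INSERT INTO dbo.BulkInsertTest (Column1, Column2, Column3, Column4, Column5)
SELECT s.Column1, s.Column2, s.Column3, s.Column4, s.Column5
FROM dbo.SomeStagingTable AS s;  -- staging table name is an assumption
```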

I verified that SqlBulkCopy generates the same execution plan as a T-SQL BULK INSERT by capturing the actual plans with an Extended Events trace. Below are the trace DDL and the PowerShell script I used.

Trace DDL:

CREATE EVENT SESSION [SqlBulkCopyTest] ON SERVER 
ADD EVENT sqlserver.query_post_execution_showplan(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[equal_i_sql_unicode_string]([sqlserver].[client_app_name],N'SqlBulkCopyTest') 
        AND [sqlserver].[like_i_sql_unicode_string]([sqlserver].[sql_text],N'insert bulk%') 
        ))
ADD TARGET package0.event_file(SET filename=N'SqlBulkCopyTest');
GO

PowerShell script:

$connectionString = "Data Source=.;Initial Catalog=YourUserDatabase;Integrated Security=SSPI;Application Name=SqlBulkCopyTest"

$dt = New-Object System.Data.DataTable;
$null = $dt.Columns.Add("Column1", [System.Type]::GetType("System.Int32"))
$null = $dt.Columns.Add("Column2", [System.Type]::GetType("System.Int32"))
$null = $dt.Columns.Add("Column3", [System.Type]::GetType("System.Int32"))
$null = $dt.Columns.Add("Column4", [System.Type]::GetType("System.Int32"))
$null = $dt.Columns.Add("Column5", [System.Type]::GetType("System.Int32"))

$row = $dt.NewRow()
$row["Column1"] = 1
$row["Column2"] = 2
$row["Column3"] = 3
$row["Column4"] = 4
$row["Column5"] = 5
[void]$dt.Rows.Add($row)

$bcp = New-Object System.Data.SqlClient.SqlBulkCopy($connectionString)
$bcp.DestinationTableName = "dbo.BulkInsertTest"
$bcp.WriteToServer($dt)
$bcp.Close()

EDIT

Credit to Vladimir Baranov for providing this blog article by Microsoft Data Platform MVP Paul White, which details SQL Server's cost-based index maintenance strategy.

EDIT 2

I see from your revised question that your actual situation is a table with a clustered index rather than a heap. The plans will be similar to the heap examples above, except, of course, that data will be inserted using a Clustered Index Insert operator instead of a Table Insert.
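For reference, a clustered-index version of the test table might look like this. The choice of Column1 as the clustering key is just an assumption for illustration:

```sql
CREATE TABLE dbo.BulkInsertTestClustered(
      Column1 int NOT NULL
    , Column2 int NOT NULL
    , Column3 int NOT NULL
    , Column4 int NOT NULL
    , Column5 int NOT NULL
    );
-- Inserts into this table show a Clustered Index Insert operator
-- in the plan instead of a Table Insert.
CREATE CLUSTERED INDEX BulkInsertTestClustered_Column1
    ON dbo.BulkInsertTestClustered(Column1);
```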

An ORDER hint may be specified during bulk insert operations into a table with a clustered index. When the specified order matches that of the clustered index, SQL Server can eliminate the sort operator before the Clustered Index Insert since it assumes data are already sorted per the hint. Unfortunately, SqlBulkCopy does not support the ORDER hint.
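A sketch of the ORDER hint syntax, assuming the target table has a clustered index keyed on Column1 ascending (the file path and key column are placeholders):

```sql
-- Assumes the target table has a clustered index on Column1 ASC.
-- When the hint matches the clustered index key order, the Sort
-- operator before the Clustered Index Insert can be eliminated.
BULK INSERT dbo.BulkInsertTest
    FROM 'c:\Temp\BulkInsertTest.txt'
    WITH (ORDER (Column1 ASC));
```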


Popular Answer

The question is: how are those indexes updated? For each row I insert? For each transaction?

From a low-level point of view, indexes are always updated row by row; this is a consequence of the index's internal data structure. SQL Server indexes are B+ trees, and there is no algorithm to update several rows in a B+ tree all at once. You must update them one by one, because you cannot know where a row will go before the previous rows have been updated or inserted.

However, from a transactional point of view, indexes are updated all at once, because SQL Server implements transactional semantics. Under the default READ COMMITTED isolation level, another transaction cannot see the rows (index or table rows) you inserted during the bulk insert until the transaction is committed, so it appears as if the rows were inserted all at once.
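A minimal sketch of that visibility behaviour under READ COMMITTED, reusing the test table from the accepted answer (purely illustrative, run in two separate sessions):

```sql
-- Session 1: insert inside an open transaction.
BEGIN TRANSACTION;
INSERT INTO dbo.BulkInsertTest (Column1, Column2, Column3, Column4, Column5)
VALUES (1, 2, 3, 4, 5);

-- Session 2 (READ COMMITTED): this SELECT blocks on the locked row
-- (or, with READ_COMMITTED_SNAPSHOT enabled, simply does not see it)
-- until session 1 commits.
SELECT COUNT(*) FROM dbo.BulkInsertTest;

-- Session 1: only now do the table row and all of its index rows
-- become visible to other transactions, together.
COMMIT;
```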




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow