SQL Server index behaviour when doing bulk insert

bulkinsert indexing sqlbulkcopy sql-server

Question

I have a program that inserts many rows into SQL Server in bulk.

I either use the SqlBulkCopy class or my own code that generates a huge INSERT INTO table_name (...) VALUES (...) statement.

My table has a clustered index as well as numerous non-clustered indexes.

How are those indexes updated? Once for each row that I insert? Once per transaction?

A side question: is there a general term for this topic, something like "bulk-insert indexing behavior"? I tried many keyword combinations but found nothing. I ask because I occasionally work with Postgres as well and would like to understand its behavior too.

I've searched for an article on this subject many times without success.

It would be great if you could point me to any documentation, articles, or books with a relevant chapter.


Accepted Answer

You can see how index updates are performed by examining the query plan. Consider the heap table below, which has only non-clustered indexes.

CREATE TABLE dbo.BulkInsertTest(
      Column1 int NOT NULL
    , Column2 int NOT NULL
    , Column3 int NOT NULL
    , Column4 int NOT NULL
    , Column5 int NOT NULL
    );
CREATE INDEX BulkInsertTest_Column1 ON dbo.BulkInsertTest(Column1);
CREATE INDEX BulkInsertTest_Column2 ON dbo.BulkInsertTest(Column2);
CREATE INDEX BulkInsertTest_Column3 ON dbo.BulkInsertTest(Column3);
CREATE INDEX BulkInsertTest_Column4 ON dbo.BulkInsertTest(Column4);
CREATE INDEX BulkInsertTest_Column5 ON dbo.BulkInsertTest(Column5);
GO

The execution plan for a singleton INSERT is shown below.

INSERT INTO dbo.BulkInsertTest(Column1, Column2, Column3, Column4, Column5) VALUES
     (1, 2, 3, 4, 5);

INSERT execution plan

The execution plan shows only a Table Insert operator, meaning the new non-clustered index rows were maintained intrinsically as part of the table insert operation. A large batch of singleton INSERT statements yields this same plan for each statement.
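If you want to reproduce this yourself, one simple way (an assumption on my part, not necessarily how the screenshots were captured) to get the actual plan for a statement is SET STATISTICS XML:

SET STATISTICS XML ON;
-- The actual plan for this INSERT is returned as an extra XML result set.
INSERT INTO dbo.BulkInsertTest(Column1, Column2, Column3, Column4, Column5) VALUES
     (1, 2, 3, 4, 5);
SET STATISTICS XML OFF;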

A single INSERT statement with many rows supplied by a row constructor produces the same plan; the only change is a Constant Scan operator added to emit the rows.

INSERT INTO dbo.BulkInsertTest(Column1, Column2, Column3, Column4, Column5) VALUES
     (1, 2, 3, 4, 5)
    ,(1, 2, 3, 4, 5)
    ,(1, 2, 3, 4, 5)
    ,...
    ,(1, 2, 3, 4, 5);

Multi-row INSERT execution plan

Below is the execution plan for a T-SQL BULK INSERT statement (using a dummy empty file as the source). With BULK INSERT, SQL Server added extra query plan operators to optimize the index inserts. The rows were first inserted into the table and spooled; the spooled rows were then sorted and inserted into each index separately as a bulk insert operation. This technique reduces the overhead of large insert operations. You may come across similar plans for INSERT...SELECT queries.

BULK INSERT dbo.BulkInsertTest
    FROM 'c:\Temp\BulkInsertTest.txt';

BULK INSERT execution plan

I confirmed that SqlBulkCopy produces an execution plan similar to that of a T-SQL BULK INSERT by capturing the actual plans with an Extended Events trace. Below are the trace DDL and the PowerShell script I used.

Trace DDL:

CREATE EVENT SESSION [SqlBulkCopyTest] ON SERVER 
ADD EVENT sqlserver.query_post_execution_showplan(
    ACTION(sqlserver.client_app_name,sqlserver.sql_text)
    WHERE ([sqlserver].[equal_i_sql_unicode_string]([sqlserver].[client_app_name],N'SqlBulkCopyTest') 
        AND [sqlserver].[like_i_sql_unicode_string]([sqlserver].[sql_text],N'insert bulk%') 
        ))
ADD TARGET package0.event_file(SET filename=N'SqlBulkCopyTest');
GO

PowerShell script:

$connectionString = "Data Source=.;Initial Catalog=YourUserDatabase;Integrated Security=SSPI;Application Name=SqlBulkCopyTest"

$dt = New-Object System.Data.DataTable;
$null = $dt.Columns.Add("Column1", [System.Type]::GetType("System.Int32"))
$null = $dt.Columns.Add("Column2", [System.Type]::GetType("System.Int32"))
$null = $dt.Columns.Add("Column3", [System.Type]::GetType("System.Int32"))
$null = $dt.Columns.Add("Column4", [System.Type]::GetType("System.Int32"))
$null = $dt.Columns.Add("Column5", [System.Type]::GetType("System.Int32"))

$row = $dt.NewRow()
[void]$dt.Rows.Add($row)
$row["Column1"] = 1
$row["Column2"] = 2
$row["Column3"] = 3
$row["Column4"] = 4
$row["Column5"] = 5

$bcp = New-Object System.Data.SqlClient.SqlBulkCopy($connectionString)
$bcp.DestinationTableName = "dbo.BulkInsertTest"
$bcp.WriteToServer($dt)
# Release the connection held by SqlBulkCopy.
$bcp.Close()
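To capture plans with the session above, start it before running the script and stop it afterwards; the captured events can then be read back from the event file. A sketch (the .xel file name pattern is an assumption based on the session's file target):

ALTER EVENT SESSION [SqlBulkCopyTest] ON SERVER STATE = START;
-- ... run the PowerShell script ...
ALTER EVENT SESSION [SqlBulkCopyTest] ON SERVER STATE = STOP;

-- Read the captured plans back from the event file target.
SELECT CAST(event_data AS xml) AS event_data
FROM sys.fn_xe_file_target_read_file(N'SqlBulkCopyTest*.xel', NULL, NULL, NULL);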

EDIT

Credit to Vladimir Baranov for providing this blog post by Paul White, a Microsoft Data Platform MVP, which describes SQL Server's cost-based index maintenance strategy.

EDIT 2

Your amended question indicates that your actual scenario is a table with a clustered index rather than a heap. The plans will be similar to the earlier heap examples, except that the data will be inserted with a Clustered Index Insert operator instead of a Table Insert.
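For illustration, a minimal clustered-index variant of the test table might look like this (the table and index names here are hypothetical, not from the original question):

CREATE TABLE dbo.BulkInsertTestClustered(
      Column1 int NOT NULL
    , Column2 int NOT NULL
    , Column3 int NOT NULL
    , Column4 int NOT NULL
    , Column5 int NOT NULL
    );
-- Clustered index on Column1; inserts into this table show a
-- Clustered Index Insert operator instead of a Table Insert.
CREATE CLUSTERED INDEX BulkInsertTestClustered_Column1
    ON dbo.BulkInsertTestClustered(Column1);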

An ORDER hint can be specified during bulk insert operations into a table with a clustered index, as sketched below. When the specified order matches that of the clustered index, SQL Server can eliminate the sort operator before the Clustered Index Insert, because it trusts that the data is already sorted according to the hint. Unfortunately, SqlBulkCopy does not support the ORDER hint.
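As a sketch, a BULK INSERT with an ORDER hint against the hypothetical clustered table above could look like this (it assumes the source file really is sorted by Column1):

BULK INSERT dbo.BulkInsertTestClustered
    FROM 'c:\Temp\BulkInsertTest.txt'
    -- Matches the clustered index order, so the sort before the
    -- Clustered Index Insert can be skipped.
    WITH (ORDER (Column1 ASC));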


Popular Answer

The question is: how are those indexes updated? For each row I insert? For each transaction?

From a low-level perspective, indexes are always updated row by row, due to the underlying data structure of the index. SQL Server indexes are B+ trees. There is no way to modify several rows of a B+ tree index at once, because you cannot know where a row will land until the preceding rows have been inserted or updated; each row must be applied individually.

From a transactional perspective, however, indexes are updated all at once, because SQL Server has transactional semantics. Under the default READ COMMITTED isolation level, the rows (table or index rows) you insert during a bulk insert operation are invisible to other transactions until the transaction commits, so they appear to have been added all at once.
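A quick sketch of how to observe this from two separate connections (assuming the dbo.BulkInsertTest table from the accepted answer):

-- Session 1: keep the transaction open after the insert.
BEGIN TRANSACTION;
INSERT INTO dbo.BulkInsertTest(Column1, Column2, Column3, Column4, Column5) VALUES
     (1, 2, 3, 4, 5);

-- Session 2 (separate connection, default READ COMMITTED): the query below
-- blocks (or, with READ_COMMITTED_SNAPSHOT enabled, simply does not see the
-- new row) until session 1 commits.
-- SELECT COUNT(*) FROM dbo.BulkInsertTest;

-- Session 1: commit, making the rows visible to other transactions at once.
COMMIT;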



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow