Why do I get high fragmentation when using SqlBulkCopy to move large amounts data between databases?

indexing sql sqlbulkcopy sql-server

Question

Using the code

Using bcp As New SqlBulkCopy(destConnection)
     bcp.DestinationTableName = "myOutputTable"
     bcp.BatchSize = 10000

     bcp.WriteToServer(reader)
End Using

Where reader is an essentially an IDataReader that reads in a table, containing 200k rows or so.

The input table looks like this

CREATE TABLE [dbo].[MyTable](
    [TagIndex] [SMALLINT] NOT NULL,
    [TimeStamp] [DATETIME] NOT NULL,
    [RawQuality] [SMALLINT] NOT NULL,
    [ValQuality] [SMALLINT] NOT NULL,
    [Sigma] [REAL] NULL,
    [Corrected] [REAL] NULL,
    [Raw] [REAL] NULL,
    [Delta] [REAL] NULL,
    [Mean] [REAL] NULL,
    [ScadaTimestamp] [DATETIME] NOT NULL
) ON [PRIMARY

And is ordered by TimeStamp.

The output table is of the same structure and has the following Index (and is empty at the beginning of the process).

CREATE CLUSTERED INDEX [MyOutputTable_Index] ON [dbo].[MyOutputTable]
(
    [TimeStamp] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO

Artificially throttling the process to run small(ish) amounts of data into the output table at a time (around < 35k) will result in fragmentation of < 5% and everything is fine.

But if I run in a larger chunk, say 45k, (or let the process run the whole 200k) the fragmentation becomes 99%+.

To be precise if I run in 39,773 I get < 5% fragmentation, but I run in 39,774 I get 99% fragmentation. And if I investigate the pages allocating in the Index I see the following using DBCC PAGE.

FileId  PageId  Row Level   ChildFileId ChildPageId TimeStamp (key)
1       18937   0   1       1           18906       2015-10-22 01:37:32.497
1       18937   1   1       1           18686       2015-10-22 01:38:12.497
1       18937   2   1       1           18907       2015-10-22 01:38:47.497
1       18937   3   1       1           18687       2015-10-22 01:39:27.497
1       18937   4   1       1           18908       2015-10-22 01:40:02.497
1       18937   5   1       1           18688       2015-10-22 01:40:42.497
1       18937   6   1       1           18909       2015-10-22 01:41:17.497
1       18937   7   1       1           18689       2015-10-22 01:41:57.497
1       18937   8   1       1           18910       2015-10-22 01:42:32.497

Looking at the ChildPageId column we can see that the numbers do not run consecutively.

For example, 18906 is followed by 18686 which is followed by 18907 with the series beginning 18686 interleaved with the series beginning with 18906 causing fragmentation of over 99%.

So, the question is what is causing the index to be built like this when running in larger chunks of data?

1
4
5/17/2016 3:48:27 PM

Popular Answer

Its hard to tell without more data but I bet that your timestamp being a clustered index is causing this. Try sorting the data by this field before sending to the output table.

2
5/17/2016 5:15:14 PM


Related Questions





Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow