Importing Data in Parallel in SQL Server

bulkinsert bulk-load parallel-processing sqlbulkcopy sql-server

Question

I have more than 100 files to import into the sql server and most of them are of 500 MB. I want to leverage the the parallel import utility of SQL server and have read a number of webpages, like the following ones:

How to load 1 TB data in 30 minutes

https://technet.microsoft.com/en-us/library/dd537533(v=sql.100).aspx

Importing Data in Parallel with Table Level Locking

https://technet.microsoft.com/en-us/library/ms186341(v=sql.105).aspx

Controlling Locking Behavior for Bulk Import

https://technet.microsoft.com/en-us/library/ms180876(v=sql.105).aspx

and the answers in stackoverflow

Fastest way to insert in parallel to a single table

However, none of them have given a simple example with code. I know how to use bulk insert/bcp, but I don't know where to start with parallel import? Can anyone help me with it?

My system is Windows and I'm using SQL server 2016. The source data file is in txt format.

Thanks in advance for your help!

Jason

Accepted Answer

  1. Load the file path details into a tracking table

    Create table FileListCollection TABLE (Id int identity(1,1), filepath VARCHAR(500), ThreadNo tinyint, isLoaded int)

    DECLARE @FileListCollection TABLE (filepath VARCHAR(500)) DECLARE @folderpath NVARCHAR(500) DECLARE @cmd NVARCHAR(100) SET @folderpath = '' SET @cmd = 'dir ' + @folderpath + ' /b /s'

    INSERT INTO @FileListCollection EXECUTE xp_cmdshell @cmd

    DELETE FROM @FileListCollection WHERE filepath IS NULL

    insert into FileListCollection(filepath, isLoaded) select filepath, 0 from @FileListCollection

  2. Schedule for each thread

    declare @ThreadNo int = 3 update f set ThreadNo=(id%@ThreadNo) from FileListCollection f

  3. Open three session and assign thread number to each

  4. Run the below script to load the data

    DECLARE @filepath NVARCHAR(500) DECLARE @filepath NVARCHAR(500) DECLARE @bcpquery NVARCHAR(MAX); DECLARE @ThreadNo int = 1 WHILE EXISTS ( SELECT TOP 1 * FROM FileListCollection where ThreadNo = @ThreadNo and isLoaded = 0 ) BEGIN SELECT TOP 1 @filepath = filepath FROM FileListCollection where ThreadNo = @ThreadNo and isLoaded = 0

    SET @bcpquery = 'bulk insert <Database>.dbo.Table from '''+ @filepath+''' with (fieldterminator = ''|'', rowterminator = ''\n'')';
    
    print @bcpquery
    --Load the Content in table
    execute sp_executesql @bcpquery;
    
    Update FileListCollection set isLoaded = 1
    WHERE filepath = @filepath
    

    END




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why