What is the best approach for importing a very large number of XML files (e.g. 30,000), each with a different schema, into SQL Server 2008?
I am currently looping through each file, loading the data into a `DataTable`, and using `SqlBulkCopy` to insert it, but this is taking a lot of time (about 1.5 hours).
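Simplified, the loop looks something like this (the table name and column mapping are placeholders; the real code picks a destination per schema):

```csharp
using System.Data;
using System.Data.SqlClient;
using System.IO;

class Importer
{
    static void Import(string folder, string connectionString)
    {
        foreach (string file in Directory.GetFiles(folder, "*.xml"))
        {
            var table = new DataTable();
            table.ReadXml(file); // infers columns from the file's inline schema

            using (var bulk = new SqlBulkCopy(connectionString))
            {
                bulk.DestinationTableName = "dbo.SomeTable"; // varies per schema
                bulk.WriteToServer(table); // one bulk copy per file
            }
        }
    }
}
```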
This shouldn't take so long. By my estimate you've got around 600 MB of data; you should be able to approach 10 MB/s, or at least 1 MB/s, without much difficulty - which means 1-10 minutes should be easily achievable.
What to do:
Without more details, it's hard to be precise, but I can speculate:
- `SqlBulkCopy` is usually fast, so your insert is likely not the bottleneck. You could go a little faster than a `DataTable`, but it's probably not an issue.
- `DataTable`s can have "indexes", i.e. primary keys and constraints. These are implemented very inefficiently - they could definitely cause problems.
- `SqlBulkCopy` is fast, but it's at its best with many rows. If you're copying just one file per `SqlBulkCopy`, that means 30,000 calls, and probably at least 30,000 fsyncs on the database side. You should be using only one `SqlBulkCopy`.
- If your XML parsing is slow (e.g. you load each file into an `XmlDocument` and query it with lots of inefficient loops and/or XPath), you might be running into CPU load issues.

With that in mind, I'd look at the following, in this order:
1. Check disk load - unlikely to be the problem, but trivial to rule out.
2. Check CPU load from the XML parsing.
3. Avoid unnecessary indexes, primary keys, and constraints (especially on the `DataTable`s).
4. Avoid multiple `SqlBulkCopy` instances - use only one (per thread).
5. Check the database schema for indexes and constraints that slow down inserts.

The order is inspired by how hard each problem is to check for. Disk load is unlikely to be problematic, but it's trivial to check, so you might as well start by eliminating that possibility. Database schema issues aren't that unlikely, but they're much more work to identify (which index is it, and am I impacting another workflow by removing it?), so I'd check those last.
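To illustrate the single-`SqlBulkCopy` point, here's a rough sketch that streams rows from many files through `XmlReader` (forward-only, far cheaper than `XmlDocument` + XPath) into one bulk copy. The element name `row`, the attributes, and `dbo.SomeTable` are all made-up placeholders - with 30,000 different schemas you'd do this once per group of files sharing a destination table:

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Xml;

class BatchedImporter
{
    // Stream one file with XmlReader instead of building an XmlDocument.
    static IEnumerable<object[]> ReadRows(string file)
    {
        using (var reader = XmlReader.Create(file))
        {
            while (reader.ReadToFollowing("row")) // placeholder element name
            {
                yield return new object[]
                {
                    reader.GetAttribute("id"),    // placeholder attributes
                    reader.GetAttribute("value")
                };
            }
        }
    }

    static void ImportAll(IEnumerable<string> files, string connectionString)
    {
        var table = new DataTable();
        table.Columns.Add("Id");    // no primary key or constraints here -
        table.Columns.Add("Value"); // keep the DataTable as dumb as possible

        foreach (var file in files)
            foreach (var row in ReadRows(file))
                table.Rows.Add(row);

        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "dbo.SomeTable";
            bulk.BatchSize = 10000;    // commit in large batches
            bulk.WriteToServer(table); // one call, not one per file
        }
    }
}
```

In practice you'd flush the `DataTable` every N rows (or implement `IDataReader` and pass that to `WriteToServer`) rather than holding all 600 MB in memory, but the shape is the same: many files in, one `SqlBulkCopy` out.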