SqlBulkCopy XML with a GUID foreign key

c# guid sqlbulkcopy uniqueidentifier xml

Question

I am trying to insert data from XML files using SqlBulkCopy. The destination is a time series table set up as below:

create table TimeSeries (
    Id          uniqueidentifier constraint DF_TimeSeries_Id default (newid()) not null,
    ObjectId    uniqueidentifier not null,
    [Date]      datetime not null,
    Value       float(53) not null,
    [Type]      varchar (4) not null,
    [Source]    varchar (4) not null,
    LastUpdate  datetime constraint DF_TimeSeries_LastUpdate default (getdate()) not null,
    TypeIndex   smallint constraint DF_TimeSeries_TypeIndex default (0) not null,
    constraint PK_TimeSeries primary key clustered ([Date] asc, ObjectId asc, [Type] asc, [Source] asc, TypeIndex asc) with (fillfactor = 80)
);

go
create nonclustered index [IX_TimeSeries_ObjectId_Type_Date_Source]
    on TimeSeries(ObjectId asc, [Type] asc, [Date] asc, [Source] asc)
    include(Value) with (fillfactor = 80);


go
create nonclustered index [IX_TimeSeries_ObjectId_Date]
    on TimeSeries(ObjectId asc, [Date] asc)
    include(Value) with (fillfactor = 80);

go
create table Beacons
(
    BeaconId uniqueidentifier not null default newid(), 
    [Description] varchar(50) not null, 
    LocationX float not null,
    LocationY float not null,
    Altitude float not null
    constraint PK_Beacons primary key clustered (BeaconId)
)
go
create index IX_Beacons on Beacons (BeaconId)

go
create table SnowGauges
(
    SnowGaugeId uniqueidentifier not null default newid(), 
    [Description] varchar(50) not null
    constraint PK_SnowGauges primary key clustered (SnowGaugeId)
)
go
create index IX_SnowGauges on SnowGauges (SnowGaugeId)


go
insert into Beacons ([Description], LocationX, LocationY, Altitude)
values ('Dunkery', 51.162, -3.586, 519), ('Prestwich', 53.527, -2.279, 76)
insert into SnowGauges ([Description]) values ('Val d''Isère')

select * from Beacons
select * from SnowGauges

As you can see, I want TimeSeries to store any kind of time series: temperature, pressure, biological data, and so on. In every case a series is identified by a unique identifier, a source and a type. There is no foreign key on ObjectId because that identifier can refer to a row in any table.

At the end of this script I inserted two beacons and one snow gauge, and I want to fill their time series. The XML file to do so has this format:

<?xml version="1.0" encoding="utf-8" ?>
<TimeSeries>
<TimeSeries ObjectId="186CA33E-AC1C-4220-81DE-C7CD32F40C1A" Date="09/06/2013 07:00:00" Value="9.2" Source = "Met Office" Type = "Temperature"/>
<TimeSeries ObjectId="186CA33E-AC1C-4220-81DE-C7CD32F40C1A" Date="09/06/2013 10:00:00" Value="8.8" Source = "Met Office" Type = "Temperature"/>
<TimeSeries ObjectId="186CA33E-AC1C-4220-81DE-C7CD32F40C1A" Date="09/06/2013 13:00:00" Value="8.7" Source = "Met Office" Type = "Temperature"/>
<TimeSeries ObjectId="186CA33E-AC1C-4220-81DE-C7CD32F40C1A" Date="09/06/2013 07:00:00" Value="1" Source = "Met Office" Type = "UV"/>
<TimeSeries ObjectId="186CA33E-AC1C-4220-81DE-C7CD32F40C1A" Date="09/06/2013 10:00:00" Value="3" Source = "Met Office" Type = "UV"/>
<TimeSeries ObjectId="186CA33E-AC1C-4220-81DE-C7CD32F40C1A" Date="09/06/2013 13:00:00" Value="5" Source = "Met Office" Type = "UV"/>
<TimeSeries ObjectId="AFB81E51-18B0-4696-9C2F-E6E9EEC1B647" Date="09/06/2013 07:00:00" Value="5.8" Source = "Met Office" Type = "Temperature"/>
<TimeSeries ObjectId="AFB81E51-18B0-4696-9C2F-E6E9EEC1B647" Date="09/06/2013 10:00:00" Value="6.3" Source = "Met Office" Type = "Temperature"/>
<TimeSeries ObjectId="AFB81E51-18B0-4696-9C2F-E6E9EEC1B647" Date="09/06/2013 13:00:00" Value="6.5" Source = "Met Office" Type = "Temperature"/>
<TimeSeries ObjectId="50E52A2B-D719-4341-A451-110D0874D26D" Date="07/06/2013 00:00:00" Value="80.5" Source = "Meteo France" Type = "SnowMeter"/>
<TimeSeries ObjectId="50E52A2B-D719-4341-A451-110D0874D26D" Date="08/06/2013 00:00:00" Value="80.5" Source = "Meteo France" Type = "SnowMeter"/>
</TimeSeries>

If you run the first script you will get different ObjectId values and will have to update them in the XML file. From there everything should be straightforward, and a simple C# program should do the job of inserting the data. Let's have a look at the C# code now:

using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;

namespace XMLBulkInsert
{
    class Program
    {
        const string XMLFILE_PATH = @"C:\Workspaces\Ws1\R\TimeSeries\TimeSeries.xml";
        const string CONNECTION_STRING = @"Server=RISK1;Database=DevStat;Trusted_Connection=True;";

        static void Main(string[] args)
        {
            StreamReader xmlFile = new StreamReader(XMLFILE_PATH);
            DataSet ds = new DataSet();

            Console.Write("Read file... ");
            ds.ReadXml(xmlFile);
            DataTable sourceData = ds.Tables[0];
            Console.WriteLine("Done !");

            using (SqlConnection sourceConnection = new SqlConnection(CONNECTION_STRING))
            {
                sourceConnection.Open();
                using (SqlBulkCopy bulkCopy = new SqlBulkCopy(sourceConnection.ConnectionString))
                {
                    bulkCopy.ColumnMappings.Add("ObjectId", "ObjectId");
                    bulkCopy.ColumnMappings.Add("Date", "Date");
                    bulkCopy.ColumnMappings.Add("Value", "Value");
                    bulkCopy.ColumnMappings.Add("Source", "Source");
                    bulkCopy.ColumnMappings.Add("Type", "Type");
                    bulkCopy.DestinationTableName = "TimeSeries";

                    try
                    {
                        Console.Write("Insert data... ");
                        bulkCopy.WriteToServer(sourceData);
                    }
                    catch (Exception ex)
                    {
                        Console.WriteLine(ex.Message);
                    }
                    finally
                    {
                        xmlFile.Close();
                        sourceConnection.Close();
                    }
                }
            }

            Console.WriteLine("Insertion completed, please Press Enter...");
            Console.ReadLine();
        }

    }
}

Running this program throws this exception: "The given value of type String from the data source cannot be converted to type uniqueidentifier of the specified target column.". It looks like there is no way, when setting up the mapping, to force the column to be treated as a uniqueidentifier. I even tried ds.Tables[0].Columns[0].DataType = typeof(Guid);, but with no success: .NET cannot change the type of a column once the table contains rows.
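
For what it's worth, the only DataTable-level workaround I can see is to copy everything into a second table and retype the ObjectId column while that table is still empty (sketch below - SqlBulkCopy coerces the string dates and numbers on its own, but not strings to Guid):

DataTable typed = sourceData.Clone();                 // same schema, no rows
typed.Columns["ObjectId"].DataType = typeof(Guid);    // allowed while the table is empty
foreach (DataRow row in sourceData.Rows)
{
    DataRow copy = typed.NewRow();
    foreach (DataColumn col in sourceData.Columns)
    {
        copy[col.ColumnName] = col.ColumnName == "ObjectId"
            ? (object)Guid.Parse((string)row[col])    // parse the Guid by hand
            : row[col];
    }
    typed.Rows.Add(copy);
}
// ...and then bulk copy "typed" instead of "sourceData"

That doubles the memory footprint, which is exactly what I cannot afford at this scale.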

I had high expectations for SqlBulkCopy, but now I feel a bit stuck. I have millions and millions of rows in XML format and can't insert any of them because of this uniqueidentifier column.

Does anyone know how to set up this class in order to accept a unique identifier?

Accepted Answer

Given the comments about 300 million rows, I would forget about DataTable; you don't want to load that data all at once. The ideal would be to parse it element-by-element, exposing the data as an IDataReader.

Fortunately, some utilities for that exist. Firstly, let's parse the data out. Each row is essentially:

class TimeSeries
{
    public Guid ObjectId { get; set; }
    public DateTime Date { get; set; }
    public string Source { get; set; }
    public string Type { get; set; }
    public decimal Value { get; set; }
}

and we can write an element-based reader like:

// requires: using System.Collections.Generic; using System.IO;
//           using System.Xml; using System.Xml.Linq;
static IEnumerable<TimeSeries> ReadTimeSeries(TextReader source)
{
    using (var reader = XmlReader.Create(source, new XmlReaderSettings {
                     IgnoreWhitespace = true }))
    {
        reader.MoveToContent();
        // ReadStartElement consumes the root tag and leaves the reader on the
        // first child, so test first and advance at the end of each iteration
        reader.ReadStartElement("TimeSeries");
        while (reader.NodeType == XmlNodeType.Element && reader.Depth == 1)
        {
            using (var subtree = reader.ReadSubtree())
            {
                var el = XElement.Load(subtree);
                var obj = new TimeSeries
                {
                    ObjectId = (Guid) el.Attribute("ObjectId"),
                    // note: datetime is not xml format; need to parse - this
                    // should probably be more explicit
                    Date = DateTime.Parse((string) el.Attribute("Date")),
                    Source = (string) el.Attribute("Source"),
                    Type = (string) el.Attribute("Type"),
                    Value = (decimal) el.Attribute("Value")
                };
                yield return obj;
            }
            reader.Read(); // move off this element to the next sibling
        }
    }
}

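As the inline comment says, the date parsing deserves to be more explicit: "09/06/2013 07:00:00" reads differently depending on the machine's culture. Assuming the files always use day/month/year (a guess based on the sample data), a small hypothetical helper with DateTime.ParseExact pins the format down:

// sketch - assumes "dd/MM/yyyy HH:mm:ss" throughout the files
// (needs using System.Globalization)
static DateTime ParseSeriesDate(string value)
{
    return DateTime.ParseExact(value, "dd/MM/yyyy HH:mm:ss",
        CultureInfo.InvariantCulture);
}
// ...and in the initializer above:
// Date = ParseSeriesDate((string) el.Attribute("Date")),
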
Note that ReadTimeSeries is an "iterator block", and spools lazily - it doesn't load all the data at once.
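
For example, printing the first few rows parses only the start of the file, no matter how large it is (path here is hypothetical):

// Take needs using System.Linq
using(var source = File.OpenText(path))
{
    foreach (var row in ReadTimeSeries(source).Take(3))
    {
        Console.WriteLine("{0:u} {1} = {2}", row.Date, row.Type, row.Value);
    }
}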

Next we need an API that can consume an IEnumerable<T> and expose it as IDataReader - FastMember does exactly this (and many other things). So we can then just write:

using(var bcp = new SqlBulkCopy(connection))
using(var objectReader = ObjectReader.Create(ReadTimeSeries(source)))
{
    bcp.DestinationTableName = "SomeTable";
    bcp.WriteToServer(objectReader);
}

where source is a TextReader, for example from File.OpenText:

using(var source = File.OpenText(path))
using(var bcp = new SqlBulkCopy(connection))
using(var objectReader = ObjectReader.Create(ReadTimeSeries(source)))
{
    bcp.DestinationTableName = "SomeTable";
    bcp.WriteToServer(objectReader);
}
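
For a job this size you may also want to tune the bulk copy itself. These are all standard SqlBulkCopy members, although EnableStreaming needs .NET 4.5 or later:

using(var source = File.OpenText(path))
using(var bcp = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null))
using(var objectReader = ObjectReader.Create(ReadTimeSeries(source)))
{
    bcp.DestinationTableName = "SomeTable";
    bcp.EnableStreaming = true; // stream from the IDataReader instead of buffering it
    bcp.BatchSize = 10000;      // send rows in batches rather than one huge batch
    bcp.BulkCopyTimeout = 0;    // no timeout; the default is 30 seconds
    bcp.WriteToServer(objectReader);
}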

If you want to control the column order, you can use bcp.ColumnMappings - but perhaps more convenient is to get the IDataReader to do it internally:

using(var objectReader = ObjectReader.Create(
    ReadTimeSeries(source), "ObjectId", "Date", "Value" /* etc */))
{
    bcp.DestinationTableName = "SomeTable";
    bcp.WriteToServer(objectReader);
}
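
The explicit version with ColumnMappings looks like this - mapping by name, so the column order in the reader no longer matters:

using(var objectReader = ObjectReader.Create(ReadTimeSeries(source)))
{
    bcp.DestinationTableName = "SomeTable";
    bcp.ColumnMappings.Add("ObjectId", "ObjectId");
    bcp.ColumnMappings.Add("Date", "Date");
    bcp.ColumnMappings.Add("Value", "Value");
    bcp.ColumnMappings.Add("Source", "Source");
    bcp.ColumnMappings.Add("Type", "Type");
    bcp.WriteToServer(objectReader);
}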

I use this approach in some of my own code - even when the data does fit in memory, it is much faster than going via DataTable.

The key point, though, is that we are now in control of what is going on.




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow