[note: needs to be in code as can't use SSIS or similar]
I need to bulk copy data from one database to another using C# and EF probably - though this isn't cast in stone.
The problem is that the source data is all in varchar(max) and I want the destination in correct data types. The source is historical from an old ETL job that works very well and I can't get replaced. The most common issue I have seen is alpha's in numeric fields - e.g. "none" in a money field. These are fine in the source since it's all varchar.
I'd like to copy the data and validate it:
source -> validate -> destination
in the simplest way possible. If validation fails then I need to know the exact row that failed (and ideally WHAT failed) so that it can be manually fixed in the source, and the data re-copied.
There are around 50 tables ranging between 10 and 1.7M rows! So speed is important as well.
What would be a sensible way to approach this? Create DTO's, validation attributes and automap? Two EF entities and map across row by row and validate each? SPROC and manual insert?
Do it in T-SQL with a linked server.
I.e.:
--begin a transaction to wrap validation and load
BEGIN TRAN
--Validate that no tickets are set to closed without a completion date
SELECT *
FROM bigTableOnLocalServer with (TABLOCKX) -- prevent new rows
WHERE ticketState = '1' /* ticket closed */ and CompletionDate = 'open'
--if validation fails, quit the transaction to release the lock
COMMIT TRAN
--if no rows in result set 1, execute the load
INSERT INTO RemoteServerName.RemoteServerDBName.RemoteSchema.RemoteTable (field1Int, Field2Money, field3text)
SELECT CAST(Field1 as int),
CASE Field2Money WHEN 'none' then null else CAST(Field2Money as money) END,
Field3Text
FROM bigTableOnLocalServer
WHERE recordID between 1 and 1000000
-- after complete, commit the transaction to release the lock
COMMIT TRAN
If you cannot communicate directly between the servers, still do the validation in SQL, but use a C# client to write the data to disk and hit the Bulk insert function on the destination server. Since the C# component would do nothing more than transport the data, I would just go straight to a format usable by BULK INSERT, Ã la CSV.