I need to create fake data into
fakeData table that follows this psuedocode:
foreach(t1.id in table1) foreach(t2.id in table2) foreach(t3.id in table3) INSERT INTO fakeData (t1.id, t2.id, t3.id, random(30,80))
Where the id is the primary key of that table.
I need to do this as fast as possible, because I plan to insert billions of records. I am not sure if using SQL to do the statements is the best method, or using c# or what the best option is for getting this data into the table.
This question really has two parts, how do I execute the psuedocode in SQL Server, and what is the best way to do this really fast. ( I currently have no indices setup )
This may seem like a duplicate of all the other "Fastest way to bulk insert". I think this question is different because the data I am loading can actually be generated my SQL Server, so a BULK generate compared to BULK INSERT
PS: I got SQL Server 2012
Edit: More data
This is a star schema. fakeData will be the fact table.
table2 is a date dimension of 20 years, with 7300 records. table3 is a time dimension of 96 records. table1 is another dimension with 100 million records.
Ok well... Since none really showed how to do random values as well. Ill contribute my solution so far. I am doing this right now, along with recovery model simple :
BEGIN TRAN declare @x int = 1 while @x <= 5000 begin INSERT INTO dimSpeed Select T1.id as T1ID, T2.DateValue as T2ID, T3.TIME_ID as T3ID, ABS(Checksum(NewID()) % 70) + 20 From lines T1, dimDate T2, dimTime T3 WHERE T1.id = @x AND T2.DateValue > '1/1/2015' AND T2.DateValue < '1/1/2016' if (@x % 100) = 0 begin COMMIT TRAN BEGIN TRAN end set @x = @x + 1 end COMMIT TRAN
5000 is how many elements of TABLE1 (t1) I am inserting. Doing just 5000 takes 5 minutes or so. At this rate it will take 70 days to insert all the data I need. A speedier option is needed for sure