I need to insert fictitious data into a table. The following pseudocode describes what I want to do:
```
foreach (t1.id in table1)
    foreach (t2.id in table2)
        foreach (t3.id in table3)
            INSERT INTO fakeData (t1.id, t2.id, t3.id, random(30, 80))
```
where `id` is the primary key of each table.
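For reference, the nested loops above correspond to a single set-based cross join in T-SQL. This is only a sketch: the column names in `fakeData` are assumed, and `ABS(CHECKSUM(NEWID()) % 51) + 30` is one common way to get a per-row random integer in [30, 80].

```sql
-- Set-based equivalent of the nested loops: one row per (t1, t2, t3) combination.
-- fakeData's column names (t1_id, t2_id, t3_id, value) are assumed for illustration.
INSERT INTO fakeData (t1_id, t2_id, t3_id, value)
SELECT t1.id,
       t2.id,
       t3.id,
       ABS(CHECKSUM(NEWID()) % 51) + 30   -- random integer in [30, 80], evaluated per row
FROM table1 t1
CROSS JOIN table2 t2
CROSS JOIN table3 t3;
```

Note that `RAND()` would not work here, since it is evaluated once per statement rather than once per row, which is why `NEWID()`-based expressions are typically used for per-row randomness.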
I want to insert billions of records, so I need to do this as quickly as possible. I'm not sure what the best way to insert this data is: whether SQL or C# is the better way to execute the commands.
So this question basically has two parts: how do I run the pseudocode in SQL Server, and what is the fastest approach to doing so? (I haven't created any indexes yet.)
This may look like a duplicate of every other "fastest way to bulk insert" question. I believe it is different from a bulk insert, though, because the data I'm inserting can actually be generated by SQL Server itself rather than loaded from an external source.
PS: I'm using SQL Server 2012.
This is a star schema, and fakeData will be the fact table.
Table2 is a date dimension covering 20 years, with 7,300 rows. Table3 is a time dimension with 96 rows. Table1 is another dimension with 100 million rows.
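To put those dimension sizes in perspective, a quick (hypothetical) row-count estimate using the numbers above:

```sql
-- Full cross product of the three dimensions (sizes from the question):
-- 100,000,000 (table1) x 7,300 (dates) x 96 (times) = 70,080,000,000,000 rows.
-- Restricting to a single year of dates (365 days) still gives
-- 100,000,000 x 365 x 96 = 3,504,000,000,000 rows.
SELECT CAST(100000000 AS bigint) * 7300 * 96 AS full_cross_product_rows,
       CAST(100000000 AS bigint) * 365  * 96 AS one_year_rows;
```

So even a one-year slice of the cross product runs into the trillions of rows, which is why the insert strategy matters so much here.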
OK, then... since none of the answers actually showed how to handle the random values, I'll share the work I've done so far. This is what I'm doing right now, using the simple recovery model:
```sql
BEGIN TRAN
declare @x int = 1
while @x <= 5000
begin
    INSERT INTO dimSpeed
    Select T1.id as T1ID,
           T2.DateValue as T2ID,
           T3.TIME_ID as T3ID,
           ABS(Checksum(NewID()) % 70) + 20
    From lines T1, dimDate T2, dimTime T3
    WHERE T1.id = @x
      AND T2.DateValue > '1/1/2015' AND T2.DateValue < '1/1/2016'

    if (@x % 100) = 0
    begin
        COMMIT TRAN
        BEGIN TRAN
    end
    set @x = @x + 1
end
COMMIT TRAN
```
5000 is the number of TABLE1 (t1) entries I'm inserting for. It takes about 5 minutes to complete 5000, so at this pace, inserting all the data I need would take about 70 days. I clearly need a faster alternative.
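One direction that might speed this up (a sketch only, reusing the table names from the loop above, with the batch size `@batch` an assumed tuning knob): make each INSERT set-based over a range of `t1.id` values instead of a single id, and add `WITH (TABLOCK)` so that, under the simple recovery model and with no indexes on the target, SQL Server can use minimally logged inserts.

```sql
-- Batched, set-based variant of the loop above.
-- Assumptions: dimSpeed is a heap (no indexes), database is in simple recovery,
-- and @batch = 100 is an arbitrary starting point to tune.
DECLARE @x int = 1, @batch int = 100;
WHILE @x <= 5000
BEGIN
    INSERT INTO dimSpeed WITH (TABLOCK)   -- TABLOCK allows minimal logging on a heap
    SELECT T1.id, T2.DateValue, T3.TIME_ID,
           ABS(CHECKSUM(NEWID()) % 70) + 20
    FROM lines T1, dimDate T2, dimTime T3
    WHERE T1.id >= @x AND T1.id < @x + @batch
      AND T2.DateValue > '20150101' AND T2.DateValue < '20160101';

    SET @x = @x + @batch;
END
```

Each statement then inserts `@batch` ids' worth of rows in one scan of the dimensions rather than one id at a time, which is usually where most of the per-iteration overhead goes.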