I'm trying to achieve database abstraction in my project, but now I got stuck with doing a bulk INSERT in PostgreSQL. My project is in C# and I'm using PostgreSQL 9.3 with npgsql.dll 2.0.14.
For Microsoft SQL Server I'm doing the bulk INSERT simply by concatenating all statements and then performing an ExecuteNonQuery:
IF NOT EXISTS (SELECT id FROM table WHERE id = 1) INSERT INTO table (id) VALUES (1);
IF NOT EXISTS (SELECT id FROM table WHERE id = 2) INSERT INTO table (id) VALUES (2);
IF NOT EXISTS (SELECT id FROM table WHERE id = 3) INSERT INTO table (id) VALUES (3);
Though the IF-NOT-EXISTS clause can be substituted in PostgreSQL by a SELECT-WHERE, this approach unfortunately still doesn't work - because every single statement in PostgreSQL is committed separately.
So I googled for another solution and found the approach of using the COPY
command along with NpgsqlCopySerializer/NpgsqlCopyIn to performantly "stream" the bulk data. But now I'm getting primary key violation errors all the time - 'cause the EXISTS/WHERE clause can seemingly not be used together with the COPY
statement.
I would really like to avoid to do the INSERTs all one-by-one, as this will slow down my application extremely, so I hope that anyone solved this issue already!
Generally for this type of situation I'd have a separate staging table that does not have the PK constraint, which I'd populate using COPY
(assuming the data were in a format for which it makes sense to do a COPY
). Then I'd do something like:
insert into table
select a.*
from staging a
where not exists (select 1
from table
where a.id = b.id)
That approach isn't too far off from your original design.
I don't totally understand this part of your question, however, which doesn't even seem totally relevant to your question:
this approach unfortunately still doesn't work - because every single statement in postgreSQL is committed separately.
That's not true at all, not for any RDBMS. Sure, auto-commit might be enabled on your client, but that doesn't mean that postgres commits every statement separately and that you can't disable the auto-commit. This approach would work:
begin;
insert into table (id) select 1 where not exists (select 1 from table where id = 1);
insert into table (id) select 2 where not exists (select 1 from table where id = 2);
insert into table (id) select 3 where not exists (select 1 from table where id = 3);
commit;
As you pointed out, however, if you've got more than a handful of such statements you'll quickly be hitting some performance concerns.