How to do PostgreSQL Bulk INSERT without Primary Key Violation

npgsql postgresql sqlbulkcopy

Question

I'm trying to achieve database abstraction in my project, but now I got stuck with doing a bulk INSERT in PostgreSQL. My project is in C# and I'm using PostgreSQL 9.3 with npgsql.dll 2.0.14.

For Microsoft SQL Server I'm doing the bulk INSERT simply by concatenating all statements and then performing an ExecuteNonQuery:

IF NOT EXISTS (SELECT id FROM table WHERE id = 1) INSERT INTO table (id) VALUES (1);
IF NOT EXISTS (SELECT id FROM table WHERE id = 2) INSERT INTO table (id) VALUES (2);
IF NOT EXISTS (SELECT id FROM table WHERE id = 3) INSERT INTO table (id) VALUES (3);

Though the IF-NOT-EXISTS clause can be substituted in PostgreSQL by a SELECT-WHERE, this approach unfortunately still doesn't work - because every single statement in PostgreSQL is committed separately.

So I googled for another solution and found the approach of using the COPY command along with NpgsqlCopySerializer/NpgsqlCopyIn to performantly "stream" the bulk data. But now I'm getting primary key violation errors all the time - 'cause the EXISTS/WHERE clause can seemingly not be used together with the COPY statement.

I would really like to avoid to do the INSERTs all one-by-one, as this will slow down my application extremely, so I hope that anyone solved this issue already!

Accepted Answer

Generally for this type of situation I'd have a separate staging table that does not have the PK constraint, which I'd populate using COPY (assuming the data were in a format for which it makes sense to do a COPY). Then I'd do something like:

insert into table
select a.*
from staging a
where not exists (select 1
                  from table
                  where a.id = b.id)

That approach isn't too far off from your original design.

I don't totally understand this part of your question, however, which doesn't even seem totally relevant to your question:

this approach unfortunately still doesn't work - because every single statement in postgreSQL is committed separately.

That's not true at all, not for any RDBMS. Sure, auto-commit might be enabled on your client, but that doesn't mean that postgres commits every statement separately and that you can't disable the auto-commit. This approach would work:

begin;
insert into table (id) select 1 where not exists (select 1 from table where id = 1);
insert into table (id) select 2 where not exists (select 1 from table where id = 2);
insert into table (id) select 3 where not exists (select 1 from table where id = 3);
commit;

As you pointed out, however, if you've got more than a handful of such statements you'll quickly be hitting some performance concerns.



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why