Friday, September 13, 2013

SQL: Upsert in PostgreSQL

Upsert is a very useful SQL statement. Unfortunately not every database supports it. The product I am working on can use several different databases. One of them is PostgreSQL which does not support upserts.

It is not a big deal: googling for it turns several solutions, including triggers, functions, and "Writeable CTE". I find the Writeable CTE solution quite elegant. Unfortunately it has one ... well feature that sometimes might be completely unimportant, just a nuisance in some cases, or a real problem in other situations.

For me it was the latter.

If you execute the example from the "Writeable CTE" page you will get the following results before executing the upsert:
1   12   CURR
2   13   NEW
3   15   CURR
After upsert the results are (changes are in bold):
1   12   CURR
2   37   CURR
3   15   OBS
4   42   NEW
So rows with ids 2 and 3 were updated and a new row with id 4 was inserted. But if you paid a close attention to messages in pgAdmin or psql, you might have noticed the following:
Query returned successfully: 1 row affected 

The query did its job, but reported only the inserted rows! Imagine the query results in only updates. It will report then
Query returned successfully: 0 row affected

By the way Oracle reports the correct result: combined number of affected rows. In the example above it says
3 rows merged

Is it important? After all the query did what it was asked to do.

For me it is important. My real query could do 3 things: update a single row, insert a single row, or do nothing. And I need to know which way it went. Actually all I need to know if a row is affected or not. With Oracle I know. With PostgreSQL I know only if a row was inserted. Sure I always can go to the database and ask, but this means another query, another roundtrip...

But who says my upsert query can stop at only one CTE? Meat the beauty:
WITH
upsert as
(update mytable2 m
    set sales = m.sales + d.sales,
        status = d.status
   from mytable d where m.pid = d.pid
 returning m.*),
ins as
(insert into mytable2
 select a.pid, a.sales, 'NEW' from mytable a
  where a.pid not in (select b.pid from upsert b)
 returning *)
select (select count(*) from ins) inserted,
       (select count(*) from upsert) updated;

If you repeat the example, but run this query instead of the original upsert, you get the job done and you also get the following result:
inserted   updated;
1          2

You immediately know the answer. And it is better than Oracle because in Oracle you cannot differentiate between inserted and updated rows!

You can tweak the select the way you want. Need only "total number of affected rows"? Use:
select (select count(*) from ins) + (select count(*) from upsert);

I ended up with something like:
select 'i' kind from ins
union
select 'u' kind from upsert

Since there is at most one affected row in my case, I get either an empty result set, or a result set with a single row and column having value 'u' or 'i'. And I do not really need to know whether a row was inserted or updated, so my java code looks really simple:
boolean  isChanged = stmt.executeQuery().next();

Nice and simple.