Get Or Create In PostgreSQL
One operation I find myself occasionally doing with anything involving a database is “get or create”: return an object with a particular ID or unique field, and if it doesn’t exist, create it.
Yesterday, in the project I’m working on now, I saw a need for this. Being backed by a PostgreSQL database, I guess I could’ve just opened up a transaction, ran a SELECT
and if it was empty, run an INSERT
. But I was curious to know whether it was possible to do this in a single SQL query. A transaction would work, but I rather avoid the round trip.
An example would be a table of books with titles and ISBNs. What I wanted was a query that would:
- Take as arguments a book with an ISBN and title.
- If no such book with that ISBN exists, insert it and return the new row as the result.
- Otherwise, simply return the existing book with that ISBN as the result.
I couldn’t find a way to do this with just an INSERT
statement, so I asked ChatGPT, and it suggested a query much like the following:
WITH ins AS (
INSERT INTO books (id, isbn, title)
VALUES ($1, $2, $3)
ON CONFLICT (isbn) DO NOTHING
RETURNING id, isbn, title
)
SELECT id, isbn, title FROM ins
UNION ALL
SELECT id, isbn, title FROM books WHERE isbn = $2
LIMIT 1;
Pretty ingenious. Attempt to insert the book, but do nothing if there’s a conflict. That would make the ins
subquery empty, resulting in the UNION ALL
return the second query which is simply a SELECT
. (I also learnt that UNION ALL
differs from UNION
in that it does not remove duplicates from the result sets).
I am wondering if it’s possible to indicate whether the returned row was the new row or the existing one. Previous techniques I used to do this was to check the IDs, and see whether the return ID matched the generated one I passed into the query. They too would be unique and are unlikely to result in a conflict (in the real system I’m working on, the IDs are UUIDs). Another idea would be to simply add an extra boolean
field on each of the SELECT
s, with one returning true and the other false.
I am wondering if that second SELECT
still runs if the INSERT
was successful, dropping the result on the floor thanks to the LIMIT
. It’s a little inelegant but I think I can live with it. After all, it would’ve been what I would be doing anyway I had used a separate SELECT
and INSERT
call, just without the roundtrip.