With this schema:
CREATE TABLE tag (
tag_id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
tag_slug text UNIQUE NOT NULL,
tag_name text NOT NULL
);
I'm currently using JavaScript to generate a bulk upsert command like this (it uses knex.raw):
knex.raw(
  `
  INSERT INTO tag (tag_slug, tag_name)
  VALUES ${tags.map(() => `(?, ?)`).join(", ")}
  ON CONFLICT (tag_slug)
  DO UPDATE SET tag_name = excluded.tag_name
  RETURNING *
  `,
  tags.map((t) => [t.slug, t.text]).flat()
)
But I'd like to convert that to a stored procedure called upsert_tags that can be called like this (feel free to adjust the function signature to something equally - or more - ergonomic):
call upsert_tags(
[
('first-tag', 'First Tag'),
('second-tag', 'Second Tag')
]
);
How can I do this?
This is my best attempt so far but it's definitely not working!
CREATE PROCEDURE upsert_tags(tags array)
LANGUAGE SQL
AS $$
INSERT INTO tag (tag_slug, tag_name)
VALUES (unnest(tags))
ON CONFLICT (tag_slug)
DO UPDATE SET tag_name = excluded.tag_name
RETURNING *
$$
The return value only needs to include the tag_id because I am using it to enter data into a through table to record a many:many relationship.
FUNCTION vs. PROCEDURE
RETURNING * indicates you want to return a set of rows, a SETOF tag to be precise. Output parameters won't cut it, and a PROCEDURE is the wrong choice to begin with. Use a FUNCTION instead.
Implementation
There are many possible ways to format input data, to treat conflicts exactly, and to return data.
We can make almost anything work. We can even make much of it dynamic and/or generic to work with varying tables / columns. (Think of possible future changes to the table ...)
The best solution depends on what you need exactly. I'll demonstrate two implementations.
Implementation 1: Passing two separate Postgres arrays
For just two columns it may be convenient to pass two separate arrays. Postgres has a dedicated variant of unnest() that unnests multiple arrays in parallel.
Returning SETOF public.tag produces complete resulting rows, like your original query. It introduces a dependency on the row type of the table, which has pros and cons ...
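A minimal sketch of such a function, assuming the table definition from the question (parameter names _slugs and _names are my choice):

```sql
CREATE OR REPLACE FUNCTION upsert_tags(_slugs text[], _names text[])
  RETURNS SETOF public.tag
  LANGUAGE sql AS
$func$
INSERT INTO tag (tag_slug, tag_name)
SELECT *
FROM   unnest($1, $2)  -- unnests both arrays in parallel
ON     CONFLICT (tag_slug)
DO     UPDATE SET tag_name = EXCLUDED.tag_name
RETURNING *;
$func$;

-- Call:
SELECT * FROM upsert_tags('{first-tag,second-tag}'
                        , '{"First Tag","Second Tag"}');
```

Since the function returns a set of rows, it is called with SELECT (not CALL).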
Implementation 2: Passing a JSON array of objects
You mentioned JavaScript, so it may be convenient to pass a JSON array of objects, which we then decompose with json_populate_recordset(). (There are other options like json_to_recordset() ...)
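A sketch of that variant, assuming the JSON object keys match the column names (so json_populate_recordset() can map them to the table's row type):

```sql
CREATE OR REPLACE FUNCTION upsert_tags(_tags json)
  RETURNS SETOF public.tag
  LANGUAGE sql AS
$func$
INSERT INTO tag (tag_slug, tag_name)
SELECT t.tag_slug, t.tag_name
FROM   json_populate_recordset(null::public.tag, $1) t  -- decompose JSON array
ON     CONFLICT (tag_slug)
DO     UPDATE SET tag_name = EXCLUDED.tag_name
RETURNING *;
$func$;

-- Call:
SELECT *
FROM   upsert_tags('[{"tag_slug": "first-tag",  "tag_name": "First Tag"}
                   , {"tag_slug": "second-tag", "tag_name": "Second Tag"}]');
```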
Concurrency? Performance?
If the function may be called from multiple transactions concurrently (or if there are concurrent, competing writes on the same table in any way), intricate race conditions arise.
Either of these solutions (including your original) overwrites conflicting rows even if tag_name does not change, which adds cost while doing nothing useful. This matters if it happens a lot. We can skip such empty updates, but then you may still want a complete set of output rows matching the input.
You may also want to know whether each row was inserted or updated.
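To sketch how to skip those empty updates (building on the two-array variant above, with my hypothetical parameter names _slugs and _names): add a WHERE clause to the conflict action. Note that rows skipped this way are then also missing from RETURNING:

```sql
INSERT INTO tag (tag_slug, tag_name)
SELECT t.tag_slug, t.tag_name
FROM   unnest(_slugs, _names) AS t(tag_slug, tag_name)
ON     CONFLICT (tag_slug)
DO     UPDATE SET tag_name = EXCLUDED.tag_name
WHERE  tag.tag_name IS DISTINCT FROM EXCLUDED.tag_name  -- skip empty updates
RETURNING *;
```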
Final implementation
Building on your added solution in the comments, this implementation makes sense for your particular case.
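A sketch matching that description, under these assumptions: JSON input as in implementation 2, empty updates skipped, and every input row reported back with its tag_id (pre-existing unchanged rows are picked up by the final query, since they are visible in the statement's snapshot):

```sql
CREATE OR REPLACE FUNCTION upsert_tags(_tags json)
  RETURNS TABLE (tag_id int, tag_slug text)
  LANGUAGE sql AS
$func$
WITH input AS (
   SELECT t.tag_slug, t.tag_name
   FROM   json_populate_recordset(null::public.tag, _tags) t
   )
, ins AS (
   INSERT INTO tag AS tg (tag_slug, tag_name)
   SELECT i.tag_slug, i.tag_name
   FROM   input i
   ON     CONFLICT (tag_slug)
   DO     UPDATE SET tag_name = EXCLUDED.tag_name
   WHERE  tg.tag_name IS DISTINCT FROM EXCLUDED.tag_name  -- skip empty updates
   RETURNING tg.tag_id, tg.tag_slug
   )
SELECT ins.tag_id, ins.tag_slug            -- inserted or updated rows
FROM   ins
UNION  ALL
SELECT tg.tag_id, tg.tag_slug              -- pre-existing, unchanged rows
FROM   input i
JOIN   tag tg USING (tag_slug)
WHERE  NOT EXISTS (SELECT FROM ins WHERE ins.tag_slug = i.tag_slug);
$func$;
```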
I added tag_slug to the result rows to allow linking back.
All of this assumes there are no duplicates within your input. Else you need to do more.
And since there are no duplicates in the input, a plain JOIN performs better than IN in the final SELECT. IN would also try to remove duplicates on the right side ...
Finally, about that subsequent write to a through table: you might integrate that into the same function to optimize performance.