Update duplicate values to the next free number

56 Views Asked by At

I have a table like below:

CREATE TABLE "table" (
    "index" serial PRIMARY KEY,
    "number" integer
);

With these rows:

INSERT INTO "table" ("number") VALUES
(1), (2), (2), (3), (4), (4), (4), (5), (13), (13), (17);

I want to update duplicate values to the next free integer number (except for the 1st one).
For this case it should be like:

(1), (2), (3), (4), (5), (6), (7), (8), (13), (14), (17)

How could the UPDATE work?

1

There are 1 best solutions below

2
Erwin Brandstetter On BEST ANSWER

Every next free number potentially depends on all previous updates in the process. So this needs, at its heart, a procedural solution.

The best solution depends on cardinalities and the frequency of duplicates and gaps. According to your sample I assume:

  • few dupes
  • moderately more gaps

The below code works in any case, but best for the stated assumptions.

DO
$do$
DECLARE
   _id int;
   _number int;
BEGIN
   CREATE TEMP TABLE free ON COMMIT DROP AS
   SELECT number
   FROM  (SELECT generate_series(min(number), max(number) + 10) FROM tbl) n(number)
   LEFT  JOIN tbl t USING (number)
   WHERE t.number IS NULL;

   -- (only) if table is big, add an index
   CREATE INDEX ON pg_temp.free (number);
   
   FOR _id, _number IN
      SELECT id, number
      FROM (
         SELECT *, lag(number) OVER (ORDER BY number) AS last_num
         FROM   tbl
         ) dup
      WHERE dup.last_num = dup.number
   LOOP
      WITH del AS (
         DELETE FROM pg_temp.free f
         USING (
            SELECT f1.number
            FROM   pg_temp.free f1
            WHERE  f1.number > _number
            ORDER  BY f1.number 
            LIMIT  1
            ) d
         WHERE f.number = d.number
         RETURNING f.number
         )
      UPDATE tbl t
      SET    number = d.number
      FROM   del d
      WHERE  t.id = _id;
   END LOOP;
END
$do$;

fiddle

This PL/pgSQL code block first creates a temporary table of free numbers (free) in the range of the given table tbl. I (arbitrarily) add 10 more numbers after the highest one. If you might need more than 10 additional numbers past the highest one, you need to do more.

If that table comes out big, create an index.

Then run through all duplicates, and assign the next free number, consuming it.

Obviously, this algorithm assumes no concurrent writes.