Adding a Serial Column to an existing table with 100,000,000 rows


I have a table with roughly 100,000,000 rows. We need to delete around 80,000 of them for a remediation.

In order to prevent downtime, I have a job set up to grab the records that need to be deleted and then process the deletes in chunks of 100, as sketched below. However, even processing the first 100 is taking forever.
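For illustration, a minimal sketch of that kind of chunked delete; the table name `big_table` and the staging table `rows_to_delete` are assumptions, only the `tx` column comes from the question:

```sql
-- Sketch only: big_table and rows_to_delete are assumed names; tx is real.
-- One-off: stage the ~80,000 tx values that need to be removed.
CREATE TABLE rows_to_delete (tx varchar(250) PRIMARY KEY);

-- Per run: delete one chunk of 100 and clear it from the staging table.
WITH chunk AS MATERIALIZED (
    SELECT tx FROM rows_to_delete LIMIT 100
),
purged AS (
    DELETE FROM big_table
    WHERE tx IN (SELECT tx FROM chunk)
)
DELETE FROM rows_to_delete
WHERE tx IN (SELECT tx FROM chunk);
```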

There is no primary key on this table, and the only way I can reliably reference each row is a unique column called `tx`, which is a `varchar(250)` (though the value is never longer than 18-20 characters). I created an index on this column, but it still takes roughly 4-6 s to select a row.
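For reference, a sketch of what that index looks like, built with `CONCURRENTLY` so it doesn't block writes on a table this size (index and table names are assumptions):

```sql
-- Assumed names; CONCURRENTLY avoids blocking writes but cannot run
-- inside a transaction block.
CREATE INDEX CONCURRENTLY IF NOT EXISTS big_table_tx_idx
    ON big_table (tx);
```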

It seemed likely that the varchar was causing the problem, so I wanted to add a new `id bigserial` column, but I was trying to figure out whether doing this would lock the table until it has populated all of the IDs.

I know `ALTER TABLE ... ADD COLUMN` is non-blocking as long as there is no default value, but does serial count as a default value?
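For context, the PostgreSQL documentation describes `serial`/`bigserial` as shorthand for an integer/bigint column plus a sequence and a `nextval()` default, roughly like this (sequence and table names assumed):

```sql
-- Rough expansion of "ADD COLUMN id bigserial" (assumed names):
CREATE SEQUENCE big_table_id_seq;
ALTER TABLE big_table
    ADD COLUMN id bigint NOT NULL DEFAULT nextval('big_table_id_seq');
ALTER SEQUENCE big_table_id_seq OWNED BY big_table.id;
```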

I couldn't find an answer to this in the documentation. We're on Postgres 12.


1 Answer

Laurenz Albe:

Adding a new column with a sequence-generated value will rewrite the table, which will cause downtime. With some care, it could be done without downtime, but that is complicated and not worth the effort if you already have a varchar column with a unique index on it that does not contain NULL values.
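Purely as an illustration of what the "with some care" route could look like, not a recommendation, one possible shape is to add a plain nullable column, attach the sequence default, and backfill in batches (all names assumed):

```sql
-- Illustrative sketch only; table/sequence names are assumptions.
-- 1. Plain nullable column, no default: brief lock, no rewrite.
ALTER TABLE big_table ADD COLUMN id bigint;

-- 2. Sequence as the default for rows inserted from now on.
CREATE SEQUENCE big_table_id_seq;
ALTER TABLE big_table
    ALTER COLUMN id SET DEFAULT nextval('big_table_id_seq');

-- 3. Backfill existing rows in small batches, repeating until none
--    remain; add constraints/indexes afterwards (CONCURRENTLY where possible).
UPDATE big_table
SET id = nextval('big_table_id_seq')
WHERE tx IN (SELECT tx FROM big_table WHERE id IS NULL LIMIT 10000);
```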

Searching for rows with the existing index should be a matter of milliseconds. If it isn't, that's the problem you have to solve. Can you add EXPLAIN (ANALYZE, BUFFERS) output for the query to the question?
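For example, something along these lines against one of the slow lookups, with the complete output pasted into the question (table name assumed):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM big_table
WHERE tx = 'some-18-char-value';
```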