I have a table with roughly 100,000,000 rows. We need to delete around 80,000 of them for a remediation.
In order to prevent downtime, I have a job setup to grab the records that needs to be deleted and then processes the delete in chunks of 100. However, even processing the first 100 is taking forever.
There is no primary ID on this table and the only way I can reliably reference each row is with a unique column called tx which is a varchar(250)` (though the field is never longer than 18-20 characters). I created an index on this row, but still takes roughly 4-6s to select a row.
Seemed likely the varchar was causing the problem, so I wanted to add a new id bigint serial column, but was trying to figure out whether or not doing this would lock the table until it's able to populate all of the ID's.
I know alter table add column is non blocking as long as there is no default value. But does Serial count as a default value?
I couldn't find an answer to this in the documentation. We're on Postgres 12.
Adding a new column with a sequence-generated value will rewrite the table, which will cause down time. With some care, it could be done without down time, but that is complicated and not worth the effort if you already have a
varcharcolumn with a unique index on it that does not contain NULL values.Searching for rows with the existing index should be a matter of milliseconds. If it isn't, that's the problem you have to solve. Can you add
EXPLAIN (ANALYZE, BUFFERS)output for the query to the question?