Postgres PKey constraint violation - Old tuples are not getting Frozen

105 Views Asked by At

Pg 9.6. autovacuum on. Original problem was : we can't insert value because of pkey violation, but select doesn't return rows with same value.

posstorage=# \d system_parameter
     Table "posstorage.system_parameter"
   Column    |       Type        | Modifiers 
-------------+-------------------+-----------
 name        | character varying | not null
 value       | character varying | 
 description | character varying | 
Indexes:
    "system_parameter_pkey" PRIMARY KEY, btree (name) CLUSTER
    "system_parameter$name$idx" btree (name)


posstorage=# select * from system_parameter;
           name           |       value       |                                    description                                     

--------------------------+-------------------+------------------------------------------------------------------------------------
 BO_ACCOUNT_ID            | 1                 | Рдентификатор пользователя BO
 STATUSHISTORY_QUERY_DAYS | 33                | Максимальный интервал просмотра исторических данных в таблице statushistory, дней.
 1DEFAULT_OFD             | 1ATOL_v4_stricted | 1Значение ОФД по умолчанию, если значение не задано в поле OFD таблицы ENTITY


posstorage=# insert into system_parameter values ('DEFAULT_OFD','xxx','xxx');                  
ERROR:  duplicate key value violates unique constraint "system_parameter_pkey"
DETAIL:  Key (name)=(DEFAULT_OFD) already exists.

After some research as we understand it wright - found that we have another row with this value that was commited , but is not visible and not getting Freeze after manual vaccuum , vacuum freeze or cluster.

posstorage=# select convert_to('DEFAULT_OFD', 'UTF8');  --  \x44454641554c545f4f4644

posstorage=# SELECT format('(0,%s)', lp) AS tid,
       t_ctid AS replacement,
       t_xmax <> 0 AS dead,
       t_infomask2 & 16384 <> 0 AS hot_updated,
       t_attrs[1] AS total
FROM heap_page_item_attrs(get_raw_page('system_parameter', 0), 'system_parameter') where t_attrs[1]='\x1944454641554c545f4f4644';                  
  tid   | replacement | dead | hot_updated |           total            
--------+-------------+------+-------------+----------------------------
 (0,15) | (0,15)      | t    | f           | \x1944454641554c545f4f4644
 (0,16) | (0,16)      | t    | f           | \x1944454641554c545f4f4644
 (0,17) | (0,17)      | t    | f           | \x1944454641554c545f4f4644
 (0,18) | (0,18)      | t    | f           | \x1944454641554c545f4f4644
 (0,19) | (0,19)      | t    | f           | \x1944454641554c545f4f4644
 (0,21) | (0,21)      | t    | f           | \x1944454641554c545f4f4644
 (0,22) | (0,22)      | f    | f           | \x1944454641554c545f4f4644     <-- this row is not dead but not visible now and seems to issue constraint violation
(7 rows)

Lets look deeper:

    posstorage=# SELECT * FROM heap_page_items(get_raw_page('system_parameter',0)) where t_ctid='(0,22)' ;
    -[ RECORD 1 ]-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    lp          | 22
    lp_off      | 4152
    lp_flags    | 1
    lp_len      | 190
    t_xmin      | 226114219
    t_xmax      | 0
    t_field3    | 59
    t_ctid      | (0,22)
    t_infomask2 | 3
    t_infomask  | 2306
    t_hoff      | 24
    t_bits      | 
    t_oid       | 
    t_data      | \x1944454641554c545f4f46442341544f4c5f76345f737472696374656400000018020000d097d0bdd0b0d187d0b5d0bdd0b8d0b520d09ed0a4d09420d0bfd0be20d183d0bcd0bed0bbd187d0b0d0bdd0b8d18e2c20d0b5d181d0bbd0b820d0b7d0bdd0b0d187d0b5d0bdd0b8d0b520d0bdd0b520d0b7d0b0d0b4d0b0d0bdd0be20d0b220d0bfd0bed0bbd0b5204f464420d182d0b0d0b1d0bbd0b8d186d18b20454e54495459

As we can see row was inserted at xmin = 226114219 and was not removed (t_xmax=0) and current txid is lower (94831127)


posstorage=# SELECT txid_current();
    -[ RECORD 1 ]+---------
    txid_current | 

posstorage=# SELECT '(0,'||lp||')' AS ctid,
           CASE lp_flags
             WHEN 0 THEN 'unused'
             WHEN 1 THEN 'normal'
             WHEN 2 THEN 'redirect to '||lp_off
             WHEN 3 THEN 'dead'
           END AS state,
           t_xmin as xmin,
           t_xmax as xmax,
           (t_infomask & 256) > 0  AS xmin_commited,
           (t_infomask & 512) > 0  AS xmin_aborted,
           (t_infomask & 1024) > 0 AS xmax_commited,
           (t_infomask & 2048) > 0 AS xmax_aborted,
           t_ctid
            FROM heap_page_items(get_raw_page('system_parameter',0)) where t_ctid='(0,22)';
    -[ RECORD 1 ]-+----------
    ctid          | (0,22)
    state         | normal
    xmin          | 226114219
    xmax          | 0
    xmin_commited | t
    xmin_aborted  | f
    xmax_commited | f
    xmax_aborted  | t
    t_ctid        | (0,22)
    
posstorage=# SELECT * FROM heap_page('system_parameter',0) where t_ctid='(0,22)';
                  ctid  | state  |     xmin      | xmax  | t_ctid 
                --------+--------+---------------+-------+--------
                 (0,22) | normal | 226114219 (c) | 0 (a) | (0,22)
                (1 row)

As we can see rows are not getting frozen after manual vacuum full,vacuum freeze or cluster

posstorage=# vacuum FREEZE verbose system_parameter ;                                          
    INFO:  vacuuming "posstorage.system_parameter"
    INFO:  index "system_parameter_pkey" now contains 34 row versions in 2 pages
    DETAIL:  0 index row versions were removed.
    0 index pages have been deleted, 0 are currently reusable.
    CPU 0.00s/0.00u sec elapsed 0.00 sec.
    INFO:  index "system_parameter$name$idx" now contains 34 row versions in 2 pages
    DETAIL:  0 index row versions were removed.
    0 index pages have been deleted, 0 are currently reusable.
    CPU 0.00s/0.00u sec elapsed 0.00 sec.
    INFO:  "system_parameter": found 0 removable, 34 nonremovable row versions in 1 out of 1 pages
    DETAIL:  14 dead row versions cannot be removed yet.
    There were 2 unused item pointers.
    Skipped 0 pages due to buffer pins.
    0 pages are entirely empty.
    CPU 0.00s/0.00u sec elapsed 0.00 sec.
    INFO:  vacuuming "pg_toast.pg_toast_34423"
    INFO:  index "pg_toast_34423_index" now contains 0 row versions in 1 pages
    DETAIL:  0 index row versions were removed.
    0 index pages have been deleted, 0 are currently reusable.
    CPU 0.00s/0.00u sec elapsed 0.00 sec.
    INFO:  "pg_toast_34423": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
    DETAIL:  0 dead row versions cannot be removed yet.
    There were 0 unused item pointers.
    Skipped 0 pages due to buffer pins.
    0 pages are entirely empty.
    CPU 0.00s/0.00u sec elapsed 0.00 sec.
    VACUUM
posstorage=# SELECT * FROM heap_page('system_parameter',0,0);
              ctid  | state  |     xmin      |  xmin_age  |     xmax      | t_ctid 
            --------+--------+---------------+------------+---------------+--------
             (0,1)  | normal | 61312329 (f)  |   33518801 | 0 (a)         | (0,1)
             (0,2)  | normal | 94840819 (c)  |      -9689 | 0 (a)         | (0,2)
             (0,3)  | normal | 94840820 (c)  |      -9690 | 0 (a)         | (0,3)
             (0,4)  | normal | 94840821 (c)  |      -9691 | 0 (a)         | (0,4)
             (0,5)  | normal | 94840822 (c)  |      -9692 | 0 (a)         | (0,5)
             (0,6)  | normal | 94840823 (c)  |      -9693 | 0 (a)         | (0,6)
             (0,7)  | normal | 94840824 (c)  |      -9694 | 0 (a)         | (0,7)
             (0,8)  | normal | 207785422 (c) | -112954292 | 0 (a)         | (0,8)
             (0,9)  | normal | 61312329 (f)  |   33518801 | 207785422 (c) | (0,8)
             (0,10) | normal | 94840826 (c)  |      -9696 | 0 (a)         | (0,10)
             (0,11) | normal | 94840827 (c)  |      -9697 | 0 (a)         | (0,11)
             (0,12) | normal | 94840828 (c)  |      -9698 | 0 (a)         | (0,12)
             (0,13) | normal | 94840829 (c)  |      -9699 | 0 (a)         | (0,13)
             (0,14) | normal | 94840997 (c)  |      -9867 | 0 (a)         | (0,14)
             (0,15) | normal | 226114167 (c) | -131283037 | 226114168 (c) | (0,15)
             (0,16) | normal | 226114169 (c) | -131283039 | 226114207 (c) | (0,16)
             (0,17) | normal | 226114208 (c) | -131283078 | 226114209 (c) | (0,17)
             (0,18) | normal | 226114210 (c) | -131283080 | 226114214 (c) | (0,18)
             (0,19) | normal | 226114215 (c) | -131283085 | 226114217 (c) | (0,19)
             (0,20) | normal | 142584526 (c) |  -47753396 | 0 (a)         | (0,20)
             (0,21) | normal | 195887902 (c) | -101056772 | 226114166 (c) | (0,21)
             (0,22) | normal | 226114219 (c) | -131283089 | 0 (a)         | (0,22)   <- still not frozen
             (0,23) | normal | 241684907 (c) | -146853777 | 0 (a)         | (0,23)
             (0,24) | normal | 195888393 (c) | -101057263 | 241684907 (c) | (0,23)
             (0,25) | normal | 142584889 (c) |  -47753759 | 195888393 (c) | (0,24)
             (0,26) | normal | 241823539 (c) | -146992409 | 241824890 (c) | (0,26)
             (0,27) | normal | 241843019 (c) | -147011889 | 241922453 (c) | (0,27)
             (0,28) | normal | 241924322 (c) | -147093192 | 0 (a)         | (0,28)
             (0,29) | normal | 242116441 (c) | -147285311 | 0 (a)         | (0,29)
             (0,30) | normal | 241684906 (c) | -146853776 | 242116441 (c) | (0,29)
             (0,31) | normal | 195888394 (c) | -101057264 | 241684906 (c) | (0,30)
             (0,32) | normal | 142584888 (c) |  -47753758 | 195888394 (c) | (0,31)
             (0,33) | normal | 94829330 (f)  |       1800 | 0 (a)         | (0,33)
             (0,34) | normal | 94829545 (f)  |       1585 | 0 (a)         | (0,34)

We checked pg_stat_activity(has no long running tx),no replication,feedback or prepared_tx, also restarted cluster.

ps: research are from https://habr.com/en/company/postgrespro/blog/487590/ and https://habr.com/en/company/postgrespro/blog/477648/

Tried manual vacuum full,vacuum freeze or cluster, want understand why old rows are not getting frozen and are not visible

1

There are 1 best solutions below

0
Laurenz Albe On

That looks like you have experienced transaction ID wraparound, a form of data corruption. In PostgreSQL v14 or better, you could use the function heap_force_freeze from pg_surgery to mark the tuple as frozen and make it visible.

In an old, unsupported version like 9.6, the best you can do is dump and restore to a recent PostgreSQL version and abandon the “vanished” tuples.