How to rejoin a failed primary node as a standby after automatic failover with repmgr (PostgreSQL 15)


I recently set up automatic failover with repmgr on PostgreSQL 15, using three nodes (one primary and two standbys) and repmgrd. Automatic failover itself works: when the primary goes down, a standby is promoted to primary. However, when the old primary comes back online, I end up in a split-brain-like situation and cannot recover from it. Is there a way to rejoin the old primary as a standby, and if so, how do I do it? FYI, I do not have wal_log_hints enabled in my configuration.

Below is the status of the cluster before and after the failover. Before:

NOTICE: using provided configuration file "/etc/repmgr/15/repmgr.conf"
INFO: connecting to database
 ID | Name   | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                  
----+--------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------------------------------------------
 1  | Master | primary | * running |          | default  | 100      | 6        | user=repmgr password=password  host=ip1 dbname=repmgr port=5432 connect_timeout=2
 2  | slaveA | standby |   running | Master   | default  | 100      | 6        | user=repmgr password=password  host=ip2 dbname=repmgr port=5432 connect_timeout=2
 3  | slaveB | standby |   running | Master   | default  | 100      | 6        | user=repmgr password=password  host=ip3 dbname=repmgr port=5432 connect_timeout=2

After:

NOTICE: using provided configuration file "/etc/repmgr/15/repmgr.conf"
INFO: connecting to database
ERROR: connection to database failed
DETAIL:
connection to server at "ip1", port 5432 failed: Connection refused
        Is the server running on that host and accepting TCP/IP connections?

DETAIL: attempted to connect using:
  user=repmgr password=password connect_timeout=2 dbname=repmgr host=ip1 port=5432 fallback_application_name=repmgr options=-csearch_path=
WARNING: following issues were detected
  - when attempting to connect to node "Master" (ID: 1), following error encountered :
"connection to server at "ip1", port 5432 failed: Connection refused
        Is the server running on that host and accepting TCP/IP connections?"

 ID | Name   | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+--------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------------------------------------------
 1  | Master | primary | - failed  | ?        | default  | 100      |          | user=repmgr password=  host=ip1 dbname=repmgr port=5432 connect_timeout=2
 2  | slaveA | primary | * running |          | default  | 100      | 7        | user=repmgr password=host=ip2 dbname=repmgr port=5432 connect_timeout=2
 3  | slaveB | standby |   running | slaveA   | default  | 100      | 7        | user=repmgr password=host=ip3 dbname=repmgr port=5432 connect_timeout=2

After the failover, the standby was promoted to primary as expected. I then stopped the promoted standby (the current primary), and the old primary's status changed back to primary as normal. But when I started the promoted standby again, I ended up in the situation below:

NOTICE: using provided configuration file "/etc/repmgr/15/repmgr.conf"
INFO: connecting to database
ERROR: connection to database failed
DETAIL:
connection to server at "ip3", port 5432 failed: Connection refused
        Is the server running on that host and accepting TCP/IP connections?

DETAIL: attempted to connect using:
  user=repmgr password=password connect_timeout=2 dbname=repmgr host=ip3 port=5432 fallback_application_name=repmgr options=-csearch_path=
WARNING: following issues were detected
  - node "slaveA" (ID: 2) is registered as standby but running as primary
  - when attempting to connect to node "slaveB" (ID: 3), following error encountered :
"connection to server at "ip3", port 5432 failed: Connection refused
        Is the server running on that host and accepting TCP/IP connections?"
  - node "slaveB" (ID: 3) is registered as an active standby but is unreachable

 ID | Name   | Role    | Status               | Upstream | Location | Priority | Timeline | Connection string
----+--------+---------+----------------------+----------+----------+----------+----------+--------------------------------------------------------------------------------------------
 1  | Master | primary | * running            |          | default  | 100      | 6        | user=repmgr password=password host=ip1 dbname=repmgr port=5432 connect_timeout=2
 2  | slaveA | standby | ! running as primary |          | default  | 100      | 7        | user=repmgr password=password host=ip2 dbname=repmgr port=5432 connect_timeout=2
 3  | slaveB | standby | ? unreachable        | ? Master | default  | 100      |          | user=repmgr password=password host=ip3 dbname=repmgr port=5432 connect_timeout=2

Please let me know if you need any more details.
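For context, here is a rough sketch of the two rejoin paths as I understand them from the repmgr and pg_rewind documentation. It is not a verified procedure for this cluster. `repmgr node rejoin --force-rewind` relies on pg_rewind, which needs wal_log_hints=on or data checksums to have been enabled on the old primary before the failover; without either, the fallback is a full re-clone from the current primary. The script only prints the commands instead of running them. The host `ip2` as the current primary, the helper names (`rejoin_cmd`, `reclone_cmds`, `plan`), and the assumption that PostgreSQL on the old primary has been shut down cleanly are all illustrative.

```shell
#!/bin/sh
# Sketch only: prints candidate commands, does not execute repmgr.
# Assumptions (not confirmed in the question): the current primary is ip2,
# and PostgreSQL on the old primary has been shut down cleanly.

REPMGR_CONF="/etc/repmgr/15/repmgr.conf"
PRIMARY_HOST="ip2"
PRIMARY_CONNINFO="host=${PRIMARY_HOST} user=repmgr dbname=repmgr connect_timeout=2"

# Path A: pg_rewind-based rejoin. Only usable if wal_log_hints=on or data
# checksums were enabled on the old primary before it diverged.
# $1 is an optional suffix such as " --dry-run".
rejoin_cmd() {
    printf "repmgr -f %s node rejoin -d '%s' --force-rewind --verbose%s\n" \
        "$REPMGR_CONF" "$PRIMARY_CONNINFO" "$1"
}

# Path B: full re-clone of the old primary from the current primary.
# --force overwrites the existing data directory and the existing node record.
# PostgreSQL has to be started between the two commands (service name varies).
reclone_cmds() {
    printf "repmgr -h %s -U repmgr -d repmgr -f %s standby clone --force\n" \
        "$PRIMARY_HOST" "$REPMGR_CONF"
    printf "repmgr -f %s standby register --force\n" "$REPMGR_CONF"
}

# Print the plan for the chosen path: "rewind" or "clone".
plan() {
    case "$1" in
        rewind)
            rejoin_cmd " --dry-run"   # check prerequisites first
            rejoin_cmd ""             # then the real rejoin
            ;;
        clone)
            reclone_cmds
            ;;
        *)
            echo "usage: plan rewind|clone" >&2
            return 1
            ;;
    esac
}

# Default to the re-clone path, since wal_log_hints is off in this setup.
plan "${1:-clone}"
```

Running the script with no argument prints the re-clone commands; passing `rewind` prints the dry-run and real `node rejoin` commands instead. Either path would be run on the old primary (node 1), not on the current primary.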

#repmgr #postgresql-15 #highAvailability #repmgrd #issuewithrepmgr
