I recently set up automatic failover with repmgr on PostgreSQL 15, using 3 nodes (1 primary and 2 standbys). With repmgrd, automatic failover works: when the primary goes down, a standby is promoted to primary. However, when the old primary comes back online, I run into a split-brain type situation and cannot recover from it. Is there any way to rejoin the old primary as a standby, and how do I do it? FYI, I do not have wal_log_hints enabled in the config.
Below is the status of the cluster before and after:
NOTICE: using provided configuration file "/etc/repmgr/15/repmgr.conf"
INFO: connecting to database
 ID | Name   | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+--------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------------------------------------------
 1  | Master | primary | * running |          | default  | 100      | 6        | user=repmgr password=password host=ip1 dbname=repmgr port=5432 connect_timeout=2
 2  | slaveA | standby |   running | Master   | default  | 100      | 6        | user=repmgr password=password host=ip2 dbname=repmgr port=5432 connect_timeout=2
 3  | slaveB | standby |   running | Master   | default  | 100      | 6        | user=repmgr password=password host=ip3 dbname=repmgr port=5432 connect_timeout=2
After the primary went down:
NOTICE: using provided configuration file "/etc/repmgr/15/repmgr.conf"
INFO: connecting to database
ERROR: connection to database failed
DETAIL:
connection to server at "ip1", port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
DETAIL: attempted to connect using:
user=repmgr password=password connect_timeout=2 dbname=repmgr host=ip1 port=5432 fallback_application_name=repmgr options=-csearch_path=
WARNING: following issues were detected
- when attempting to connect to node "Master" (ID: 1), following error encountered :
"connection to server at "ip1", port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?"
 ID | Name   | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+--------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------------------------------------------
 1  | Master | primary | - failed  | ?        | default  | 100      |          | user=repmgr password= host=ip1 dbname=repmgr port=5432 connect_timeout=2
 2  | slaveA | primary | * running |          | default  | 100      | 7        | user=repmgr password=host=ip2 dbname=repmgr port=5432 connect_timeout=2
 3  | slaveB | standby |   running | slaveA   | default  | 100      | 7        | user=repmgr password=host=ip3 dbname=repmgr port=5432 connect_timeout=2
After the failover, the standby was promoted to primary as expected. I then stopped that node (the current primary), and the old primary's role and status went back to primary as normal. But when I started the stopped node again, I ended up in the situation below:
NOTICE: using provided configuration file "/etc/repmgr/15/repmgr.conf"
INFO: connecting to database
ERROR: connection to database failed
DETAIL:
connection to server at "ip3", port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
DETAIL: attempted to connect using:
user=repmgr password=password connect_timeout=2 dbname=repmgr host=ip3 port=5432 fallback_application_name=repmgr options=-csearch_path=
WARNING: following issues were detected
- node "slaveA" (ID: 2) is registered as standby but running as primary
- when attempting to connect to node "slaveB" (ID: 3), following error encountered :
"connection to server at "ip3", port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?"
- node "slaveB" (ID: 3) is registered as an active standby but is unreachable
 ID | Name   | Role    | Status               | Upstream | Location | Priority | Timeline | Connection string
----+--------+---------+----------------------+----------+----------+----------+----------+--------------------------------------------------------------------------------------------
 1  | Master | primary | * running            |          | default  | 100      | 6        | user=repmgr password=password host=ip1 dbname=repmgr port=5432 connect_timeout=2
 2  | slaveA | standby | ! running as primary |          | default  | 100      | 7        | user=repmgr password=password host=ip2 dbname=repmgr port=5432 connect_timeout=2
 3  | slaveB | standby | ? unreachable        | ? Master | default  | 100      |          | user=repmgr password=password host=ip3 dbname=repmgr port=5432 connect_timeout=2
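To make the problem in the last output easier to spot, here is a minimal Python sketch that reads the pipe-delimited rows of `repmgr cluster show` and lists every node that is acting as a primary. It assumes the column layout shown in this post; the function names `parse_cluster_show` and `find_running_primaries` are hypothetical helpers, not part of repmgr, and the connection strings in the sample are shortened.

```python
def parse_cluster_show(text):
    """Parse pipe-delimited rows of `repmgr cluster show` output into dicts."""
    nodes = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.split("|")]
        # Keep only data rows: enough columns and a numeric node ID first.
        if len(cells) < 8 or not cells[0].isdigit():
            continue
        nodes.append({
            "id": int(cells[0]),
            "name": cells[1],
            "role": cells[2],      # role as registered in the repmgr metadata
            "status": cells[3],    # state the node actually reports
            "upstream": cells[4],
            "timeline": cells[7],
        })
    return nodes


def find_running_primaries(nodes):
    """Return the names of nodes that are currently acting as a primary."""
    primaries = []
    for n in nodes:
        registered_primary = n["role"] == "primary" and n["status"] == "* running"
        rogue_primary = "running as primary" in n["status"]
        if registered_primary or rogue_primary:
            primaries.append(n["name"])
    return primaries


# Rows taken from the last output above (connection strings shortened).
SAMPLE = """\
 1 | Master | primary | * running            |          | default | 100 | 6 | host=ip1
 2 | slaveA | standby | ! running as primary |          | default | 100 | 7 | host=ip2
 3 | slaveB | standby | ? unreachable        | ? Master | default | 100 |   | host=ip3
"""

if __name__ == "__main__":
    primaries = find_running_primaries(parse_cluster_show(SAMPLE))
    if len(primaries) > 1:
        print("split-brain suspected:", ", ".join(primaries))
```

With the sample rows, both Master (timeline 6) and slaveA (timeline 7) are reported as primaries, which is the state I cannot get out of.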
Please let me know if you need any more details.
#repmgr #postgresql-15 #highAvailability #repmgrd #issuewithrepmgr
Automatic failover with repmgr on PostgreSQL 15: need help with options to rejoin the old primary or attach it as a standby