I have a PostgreSQL high-availability cluster managed by repmgr, with 1 primary and 1 standby in the same location (as set in the repmgr config file). I have now added a witness node in that same location. I am testing a network disconnection between the primary and the standby, while the network stays available between the primary and the witness and between the standby and the witness. In this test case the standby gets promoted even though the witness can still see the primary. Why does the standby win the election and promote itself as the new primary?
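To make the test case concrete, here is a minimal sketch of the connectivity during the test, written as a small Python model. It only illustrates the scenario described above: `LINKS_UP` and `can_reach` are hypothetical names made up for this illustration, and none of this is repmgr code or reflects how repmgr decides anything.

```python
# Toy model of the network state during the test (not repmgr code).
# Host names are the ones from the cluster info; LINKS_UP and can_reach
# are hypothetical names used only for this illustration.

# Links that stay up during the test. The primary <-> standby link
# (host001 <-> host002) is deliberately absent.
LINKS_UP = {
    frozenset({"host001", "host004"}),  # primary <-> witness
    frozenset({"host002", "host004"}),  # standby <-> witness
}


def can_reach(a: str, b: str) -> bool:
    """Return True if hosts a and b have a direct working link."""
    return frozenset({a, b}) in LINKS_UP


if __name__ == "__main__":
    pairs = [("host001", "host002"), ("host001", "host004"), ("host002", "host004")]
    for a, b in pairs:
        state = "up" if can_reach(a, b) else "down"
        print(f"{a} <-> {b}: {state}")
```

So only the standby loses sight of the primary; the witness can still reach both nodes.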
Cluster info before network disconnection:
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------
 1  | host001 | primary | * running |          | dc1      | 100      | 16       | host=host001 user=repmgr dbname=repmgr connect_timeout=2
 2  | host002 | standby |   running | host001  | dc1      | 100      | 16       | host=host002 user=repmgr dbname=repmgr connect_timeout=2
 4  | host004 | witness | * running | host001  | dc1      | 0        | n/a      | host=host004 user=repmgr dbname=repmgr connect_timeout=2
Cluster info after network disconnection:
 ID | Name    | Role    | Status        | Upstream  | Location | Priority | Timeline | Connection string
----+---------+---------+---------------+-----------+----------+----------+----------+----------------------------------------------------------
 1  | host001 | primary | * running     |           | dc1      | 100      | 16       | host=host001 user=repmgr dbname=repmgr connect_timeout=2
 2  | host002 | standby | ? unreachable | ? host001 | dc1      | 100      |          | host=host002 user=repmgr dbname=repmgr connect_timeout=2
 4  | host004 | witness | * running     | host001   | dc1      | 0        | n/a      | host=host004 user=repmgr dbname=repmgr connect_timeout=2
WARNING: following issues were detected
- unable to connect to node "host002" (ID: 2)
- node "host002" (ID: 2) is registered as an active standby but is unreachable
Standby repmgr log file:
[2024-02-08 22:22:38] [INFO] 1 active sibling nodes registered
[2024-02-08 22:22:38] [INFO] 3 total nodes registered
[2024-02-08 22:22:38] [INFO] primary node "host001" (ID: 1) and this node have the same location ("dc1")
[2024-02-08 22:22:38] [INFO] local nodes last receive lsn: 0/83009110
[2024-02-08 22:22:38] [INFO] checking state of sibling node "host004" (ID: 4)
[2024-02-08 22:22:38] [INFO] node "host004" (ID: 4) reports its upstream is node 1, last seen 0 second(s) ago
[2024-02-08 22:22:38] [NOTICE] witness node "host004" (ID: 4) last saw primary node 0 second(s) ago, considering primary still visible
[2024-02-08 22:22:38] [INFO] 1 nodes can see the primary
[2024-02-08 22:22:38] [DETAIL] following nodes can see the primary:
- node "host004" (ID: 4): 0 second(s) ago
[2024-02-08 22:22:38] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 4 seconds
[2024-02-08 22:22:38] [NOTICE] promotion candidate is "host002" (ID: 2)
[2024-02-08 22:22:38] [NOTICE] this node is the winner, will now promote itself and inform other nodes
[2024-02-08 22:22:38] [INFO] promote_command is:
"repmgr standby promote -f /u01/app/admin/Data/repmgr.conf --log-to-file --siblings-follow"
[2024-02-08 22:22:38] [NOTICE] redirecting logging output to "/u01/app/admin/Data/PG_LOGS/repmgr.log"
[2024-02-08 22:22:40] [NOTICE] promoting standby to primary
[2024-02-08 22:22:40] [DETAIL] promoting server "host002" (ID: 2) using pg_promote()
[2024-02-08 22:22:40] [NOTICE] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
[2024-02-08 22:22:41] [NOTICE] STANDBY PROMOTE successful
[2024-02-08 22:22:41] [DETAIL] server "host002" (ID: 2) was successfully promoted to primary
[2024-02-08 22:22:41] [NOTICE] executing STANDBY FOLLOW on 1 of 1 siblings
INFO: node 4 received notification to follow node 2
[2024-02-08 22:22:42] [INFO] STANDBY FOLLOW successfully executed on all reachable sibling nodes
[2024-02-08 22:22:42] [INFO] checking state of node 2, 1 of 6 attempts
[2024-02-08 22:22:42] [NOTICE] node 2 has recovered, reconnecting
[2024-02-08 22:22:42] [INFO] connection to node 2 succeeded
[2024-02-08 22:22:42] [INFO] original connection is still available
[2024-02-08 22:22:42] [INFO] 1 followers to notify
[2024-02-08 22:22:42] [NOTICE] notifying node "host004" (ID: 4) to follow node 2
INFO: node 4 received notification to follow node 2
[2024-02-08 22:22:42] [INFO] switching to primary monitoring mode
[2024-02-08 22:22:42] [NOTICE] monitoring cluster primary "host002" (ID: 2)
[2024-02-08 22:22:42] [INFO] child node "host004" (ID: 4) is not yet attached
[2024-02-08 22:27:43] [INFO] monitoring primary node "host002" (ID: 2) in normal state
[2024-02-08 22:32:44] [INFO] monitoring primary node "host002" (ID: 2) in normal state