What is the use case where a witness node benefits a PostgreSQL cluster?


I have a PostgreSQL high-availability cluster with one primary and one standby in the same location (as set in the repmgr config file), and I have now added a witness node in that same location. I am testing a network disconnection between the primary and the standby, while the network stays available between the primary and the witness and between the standby and the witness. In this test the standby gets promoted even though the witness can still see the primary. Why does the standby win the election and promote itself to the new primary?

Cluster info before network disconnection:

 ID | Name              | Role    | Status    | Upstream          | Location | Priority | Timeline | Connection string
----+-------------------+---------+-----------+-------------------+----------+----------+----------+--------------------------------------------------------------------
 1  | host001           | primary | * running |                   | dc1      | 100      | 16       | host=host001 user=repmgr dbname=repmgr connect_timeout=2
 2  | host002           | standby |   running | host001           | dc1      | 100      | 16       | host=host002 user=repmgr dbname=repmgr connect_timeout=2
 4  | host004           | witness | * running | host001           | dc1      | 0        | n/a      | host=host004 user=repmgr dbname=repmgr connect_timeout=2

Cluster info after network disconnection:

 ID | Name              | Role    | Status        | Upstream            | Location | Priority | Timeline | Connection string                               
----+-------------------+---------+---------------+---------------------+----------+----------+----------+--------------------------------------------------------------------
 1  | host001           | primary | * running     |                     | dc1      | 100      | 16       | host=host001 user=repmgr dbname=repmgr connect_timeout=2
 2  | host002           | standby | ? unreachable | ? host001           | dc1      | 100      |          | host=host002 user=repmgr dbname=repmgr connect_timeout=2
 4  | host004           | witness | * running     | host001             | dc1      | 0        | n/a      | host=host004 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - unable to connect to node "host002" (ID: 2)
  - node "host002" (ID: 2) is registered as an active standby but is unreachable

Standby repmgr log file:

[2024-02-08 22:22:38] [INFO] 1 active sibling nodes registered
[2024-02-08 22:22:38] [INFO] 3 total nodes registered
[2024-02-08 22:22:38] [INFO] primary node  "host001" (ID: 1) and this node have the same location ("dc1")
[2024-02-08 22:22:38] [INFO] local nodes last receive lsn: 0/83009110
[2024-02-08 22:22:38] [INFO] checking state of sibling node "host004" (ID: 4)
[2024-02-08 22:22:38] [INFO] node "host004" (ID: 4) reports its upstream is node 1, last seen 0 second(s) ago
[2024-02-08 22:22:38] [NOTICE] witness node "host004" (ID: 4) last saw primary node 0 second(s) ago, considering primary still visible
[2024-02-08 22:22:38] [INFO] 1 nodes can see the primary
[2024-02-08 22:22:38] [DETAIL] following nodes can see the primary:
 - node "host004" (ID: 4): 0 second(s) ago

[2024-02-08 22:22:38] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 4 seconds
[2024-02-08 22:22:38] [NOTICE] promotion candidate is "host002" (ID: 2)
[2024-02-08 22:22:38] [NOTICE] this node is the winner, will now promote itself and inform other nodes
[2024-02-08 22:22:38] [INFO] promote_command is:
  "repmgr standby promote -f /u01/app/admin/Data/repmgr.conf --log-to-file --siblings-follow"
[2024-02-08 22:22:38] [NOTICE] redirecting logging output to "/u01/app/admin/Data/PG_LOGS/repmgr.log"

[2024-02-08 22:22:40] [NOTICE] promoting standby to primary
[2024-02-08 22:22:40] [DETAIL] promoting server "host002" (ID: 2) using pg_promote()
[2024-02-08 22:22:40] [NOTICE] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
[2024-02-08 22:22:41] [NOTICE] STANDBY PROMOTE successful
[2024-02-08 22:22:41] [DETAIL] server "host002" (ID: 2) was successfully promoted to primary
[2024-02-08 22:22:41] [NOTICE] executing STANDBY FOLLOW on 1 of 1 siblings
INFO:  node 4 received notification to follow node 2
[2024-02-08 22:22:42] [INFO] STANDBY FOLLOW successfully executed on all reachable sibling nodes
[2024-02-08 22:22:42] [INFO] checking state of node 2, 1 of 6 attempts
[2024-02-08 22:22:42] [NOTICE] node 2 has recovered, reconnecting
[2024-02-08 22:22:42] [INFO] connection to node 2 succeeded
[2024-02-08 22:22:42] [INFO] original connection is still available
[2024-02-08 22:22:42] [INFO] 1 followers to notify
[2024-02-08 22:22:42] [NOTICE] notifying node "host004" (ID: 4) to follow node 2
INFO:  node 4 received notification to follow node 2
[2024-02-08 22:22:42] [INFO] switching to primary monitoring mode
[2024-02-08 22:22:42] [NOTICE] monitoring cluster primary "host002" (ID: 2)
[2024-02-08 22:22:42] [INFO] child node "host004" (ID: 4) is not yet attached
[2024-02-08 22:27:43] [INFO] monitoring primary node "host002" (ID: 2) in normal state
[2024-02-08 22:32:44] [INFO] monitoring primary node "host002" (ID: 2) in normal state
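
The log above reports both "1 nodes can see the primary" and a promotion of host002 in the same second. One possible reading, offered as an assumption rather than a confirmed diagnosis, is that the witness's view of the primary is only reported and does not block the failover unless repmgr's `primary_visibility_consensus` option is set to `true` in repmgr.conf (it is off by default). The Python sketch below is a simplified model of that idea, not repmgr's actual code: the function name `should_promote`, its arguments, and the reuse of the 4-second window from the log line are all hypothetical.

```python
def should_promote(seconds_since_primary_seen, consensus_enabled, window_seconds=4):
    """Simplified model of the failover decision; NOT repmgr's real implementation.

    seconds_since_primary_seen maps a sibling node name to the number of
    seconds since that node last saw the primary (None if it never did).
    window_seconds is assumed from the "within the last 4 seconds" log line.
    """
    # Siblings (including the witness) that saw the primary recently.
    nodes_seeing_primary = [
        name
        for name, seconds in seconds_since_primary_seen.items()
        if seconds is not None and seconds <= window_seconds
    ]

    # Assumption: visibility only vetoes the promotion when consensus is enabled.
    if consensus_enabled and nodes_seeing_primary:
        return False
    return True


# The witness host004 saw the primary 0 seconds ago, as in the log.
siblings = {"host004": 0}

print(should_promote(siblings, consensus_enabled=False))  # True
print(should_promote(siblings, consensus_enabled=True))   # False
```

Under this model, the behaviour in the test case matches the first call: the witness sees the primary, but with the consensus check disabled the standby still promotes itself.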