I was trying to get the Percona operator to work:
https://github.com/percona/percona-xtradb-cluster-operator/blob/main/cmd/peer-list/main.go#L50
The Go code does a simple SRV lookup to get all pod IPs behind a service:
_, srvRecords, err := net.LookupSRV("", "", svcName)
This gives the following error:
2021/11/12 19:55:22 lookup XXX.XXX.svc.cluster.local on XXX:53: dial tcp XXXX: i/o timeout
But:
- the pod can reach the DNS server (curl is OK)
- I set the DNS timeout to 10s via the container dnsConfig
- resolution itself works:
dig srv XXXXX.XXXXX.svc.cluster.local is OK
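The issue was the network!
Our infra team had blocked TCP traffic to the kube DNS server because they thought DNS only needs UDP, which is wrong! The "dial tcp" in the error message was the clue: when an answer does not fit in a UDP response it comes back truncated, and the resolver retries the query over TCP.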
Kafka clustering also needs DNS over TCP.