Does inter-namespace communication in Kubernetes add latency?


I see a weird issue in my GKE cluster, where inter-namespace API calls take longer.

  • Say I have two namespaces ns-1 and ns-2.
  • I have an API deployed in ns-1; let's call it api-service.
  • If I call api-service from any other app/pod deployed in ns-1, it takes around 5ms.
  • If I call api-service from an app/pod deployed in ns-2, it takes ~40ms. To call it from another namespace I am using the FQDN api-service.ns-1.svc.cluster.local.
  • And when I call api-service from ns-1 as well, using the FQDN api-service.ns-1.svc.cluster.local, it again takes ~25-40ms.

Prima facie, it looks like inter-namespace communication, or using the FQDN, adds latency.
I tried to check the documentation for this, but couldn't find anything that mentions it.
Usually this shouldn't be happening, but I am clueless as of now.

Any help regarding the same would be appreciated.

Note:

  • I am making the same API call from both the pods
  • Response time from different pods:
//calling from the same namespace (ns-1)
# curl -X POST  "http://api-service:8080/hello" -w %{time_connect}:%{time_starttransfer}:%{time_total}
0.004539:0.007073:0.007096

//calling from the same namespace (ns-1), using the FQDN
# curl -X POST  "http://api-service.ns-1.svc.cluster.local:8080/hello" -w %{time_connect}:%{time_starttransfer}:%{time_total}
0.028735:0.030097:0.030158

//calling from a different namespace (ns-2)
# curl -X POST  "http://api-service.ns-1.svc.cluster.local:8080/hello" -w %{time_connect}:%{time_starttransfer}:%{time_total}
0.125594:0.163450:0.163519
0.028722:0.030159:0.030221
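To break the timings above down further, curl's time_namelookup write-out variable separates the DNS lookup from the connect time. A diagnostic sketch, run from a pod in ns-2 (the service name and port are taken from the question; this assumes curl is available in the pod):

```shell
# If dns= is large but connect= minus dns= is small, the latency is in DNS
# resolution, not in cross-namespace networking.
curl -s -o /dev/null -w "dns=%{time_namelookup} connect=%{time_connect} total=%{time_total}\n" \
  "http://api-service.ns-1.svc.cluster.local:8080/hello"

# A trailing dot marks the name as fully qualified, so the resolver skips
# the search-domain expansion; compare this timing with the one above.
curl -s -o /dev/null -w "dns=%{time_namelookup} connect=%{time_connect} total=%{time_total}\n" \
  "http://api-service.ns-1.svc.cluster.local.:8080/hello"
```

If the trailing-dot variant is consistently faster, the extra latency is coming from search-domain expansion rather than the network path between namespaces.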

Or, if anyone can point me to the relevant documentation, that would help too.

1 Answer

Answered by Veera Nagireddy

From your question, it seems there is a latency issue with inter-namespace communication, or that using the FQDN adds latency.

You can try the approaches below, which may help reduce the latency:

1. How to reduce Inter-namespace communication Latency:

Enabling a DNS cache (the Kubernetes NodeLocal DNSCache or the GCP DNSCache add-on) can improve DNS lookup performance by roughly 2.5x, so it is worth trying if you want to reduce the latency.

Try the following configurations:

a. Enable NodeLocal DNSCache as noted in the official Kubernetes documentation.

b. Enable the DNSCache add-on as mentioned in the official GCP documentation.

c. Use no DNSCache at all, only the native kube-dns (as a baseline for comparison).
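As a sketch of option (a) on GKE, NodeLocal DNSCache can be enabled as a cluster add-on; the cluster name and zone below are placeholders:

```shell
# Enable NodeLocal DNSCache on an existing GKE cluster.
# Note: nodes are re-created to pick up the add-on, which can disrupt workloads.
gcloud container clusters update CLUSTER_NAME \
  --update-addons=NodeLocalDNS=ENABLED \
  --zone=COMPUTE_ZONE
```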

2. How to reduce FQDN Latency:

If you are able to, configure your application to make DNS queries with a trailing dot on the hostname (for example, google.com.). The reason for adding a trailing dot is that ndots currently defaults to 5, which means any hostname with fewer than 5 dots is not treated as an FQDN and generates extra DNS queries (roughly 6 per lookup, because the resolver appends the search domains ..svc.cluster.local, .svc.cluster.local, .cluster.local, .c..internal, and .google.internal. before attempting the hostname by itself). If you aren't able to change the application, the other option is to implement a specific DNS configuration on a per-deployment basis:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ubuntu
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ubuntu
  template:
    metadata:
      labels:
        app: ubuntu
    spec:
      containers:
      - name: ubuntu-sleep-cnt
        image: ubuntu
        command:
        - /bin/sh
        - -c
        - echo hello && sleep 6000000
      dnsConfig:
        options:
        - name: ndots
          value: "1"

The above will create a resolv.conf with the ndots value set to 1, so any hostname containing at least a single dot will be treated as an FQDN. This should alleviate some of the strain on kube-dns and NodeLocal DNSCache.
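To illustrate the resolver behaviour described above, here is a small sketch of how a glibc-style resolver expands a name depending on whether its dot count reaches ndots (the search domains below are examples modelled on a pod in ns-1, not taken from your cluster):

```shell
#!/bin/sh
# Simulate resolv.conf candidate generation: a name with fewer dots than
# "ndots" is first tried against each search domain, then as-is.
expand_name() {
  name=$1; ndots=$2
  search="ns-1.svc.cluster.local svc.cluster.local cluster.local"
  dots=$(printf '%s' "$name" | tr -cd '.' | wc -c)
  if [ "$dots" -ge "$ndots" ]; then
    printf '%s\n' "$name"
  else
    for d in $search; do printf '%s.%s\n' "$name" "$d"; done
    printf '%s\n' "$name"
  fi
}

# The full service name has only 4 dots, so with the default ndots=5 it is
# still expanded through every search domain before being tried directly:
expand_name "api-service.ns-1.svc.cluster.local" 5
# With ndots=1 (as in the dnsConfig above), it is queried directly:
expand_name "api-service.ns-1.svc.cluster.local" 1
```

This is why both the FQDN from ns-2 and the FQDN from ns-1 show similar extra latency in the question: under the default ndots=5, every lookup of that name walks the search list first.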