Python requests library not resolving non-authoritative dns lookups


I have a Python Gunicorn web application that throws the following error when it tries to resolve an internal DNS name through a CoreDNS caching server:

raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='lb.consul.local', port=80):
Max retries exceeded with url: /hello/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f414d5259b0>:
Failed to establish a new connection: [Errno -2] Name or service not known',))

I am able to resolve the same name using dig:

dig @172.1.0.54 lb.consul.local

; <<>> DiG 9.9.5-9+deb8u16-Debian <<>> lb.consul.local
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 58411
;; flags: qr rd; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;lb.consul.local. IN A

;; ANSWER SECTION:
lb.consul.local. 1 IN A 172.10.9.0

;; Query time: 1 msec
;; SERVER: 172.1.0.54#53(172.1.0.54)
;; WHEN: Wed Feb 20 02:43:47 UTC 2019
;; MSG SIZE  rcvd: 358

One thing to note is that the answer is not authoritative: the dig flags line shows only `qr rd`, with no `aa` flag. If I switch /etc/resolv.conf back to point at the authoritative DNS server instead of the CoreDNS server acting as a cache, it all works fine again.

Does the requests library have any issues resolving names from non-authoritative sources, or is there a way to configure the library to accept responses from non-authoritative DNS servers?

EDIT 20th Feb

The server the application runs on is correctly configured to talk to the DNS server mentioned above:

root@server-test-7bff545c5b-42ln5:/app# cat /etc/resolv.conf
nameserver 172.1.0.54
search nstest.svc.cluster.local svc.cluster.local cluster.local 
ec2.internal
options ndots:5
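For context, `options ndots:5` means the resolver treats any name with fewer than five dots as relative and tries the search suffixes before the literal name. `lb.consul.local` has only two dots, so several NXDOMAIN answers (one per search domain) precede the successful query, and each of those can land in a negative cache. A rough sketch of the attempt order, mirroring resolv.conf search/ndots semantics (the helper is hypothetical, for illustration only):

```python
def query_order(name, search_domains, ndots=5):
    """Approximate the order in which glibc tries candidate names,
    per resolv.conf search/ndots semantics."""
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then the search suffixes.
        return [name] + [f"{name}.{d}" for d in search_domains]
    # Fewer dots: search suffixes are tried before the literal name.
    return [f"{name}.{d}" for d in search_domains] + [name]

search = ["nstest.svc.cluster.local", "svc.cluster.local",
          "cluster.local", "ec2.internal"]
print(query_order("lb.consul.local", search))
```

With the search list above, four failing candidates are attempted before `lb.consul.local` itself.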

EDIT 20th Feb 8:50 AM PST

I have been able to reproduce this with just the Python shell inside the machine by running the same lookup back to back:

>>> import socket
>>> socket.getaddrinfo('lb.consul.local', 80, 0, socket.SOCK_STREAM)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('172.10.9.0', 80))]
>>> socket.getaddrinfo('lb.consul.local', 80, 0, socket.SOCK_STREAM)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
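The flakiness is easy to quantify by looping over the same `getaddrinfo` call (a small probe sketch; substitute the hostname you are debugging):

```python
import socket

def probe(host, attempts=20):
    """Count how many of `attempts` back-to-back lookups succeed."""
    ok = 0
    for _ in range(attempts):
        try:
            socket.getaddrinfo(host, 80, 0, socket.SOCK_STREAM)
            ok += 1
        except socket.gaierror:
            pass
    return ok

# e.g. probe("lb.consul.local"): any result below `attempts` means some
# queries are being answered from a stale or negative cache entry
```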

The logs on the DNS side:

2019-02-20T16:35:21.688Z [INFO] 172.10.112.60:41539 - 6366 "AAAA IN lb.consul.local. udp 57 false 512" NOERROR qr,aa,rd 134 0.003542729s
2019-02-20T16:35:21.717Z [INFO] 172.10.112.60:58468 - 40098 "AAAA IN lb.consul.local. udp 57 false 512" NOERROR qr,rd 134 0.000064083s

Again, the failed response is missing the `aa` flag.

EDIT 20th Feb 6:05 PM PST

A few more hours into this, and I worked around the problem by disabling the negative cache in CoreDNS via this PR: https://github.com/coredns/coredns/pull/2588.

This seems to have fixed the problem. But I still have no idea why negative IPv6 (AAAA) results served from the CoreDNS cache cause an exception in the sockets library when the IPv4 (A) lookup clearly resolves.
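Since the negative entries here were AAAA-only, another mitigation is to keep the client on IPv4. This sketch restricts resolution to A records via the address-family argument to `getaddrinfo`; urllib3 (and therefore requests) can be steered the same way by monkey-patching `urllib3.util.connection.allowed_gai_family` to return `socket.AF_INET`, though that touches a private helper:

```python
import socket

def resolve_ipv4(host, port=80):
    """Resolve only A records, sidestepping negative-cached AAAA answers."""
    return socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
```

This does not explain the cache behaviour, but it decouples the application from whatever the resolver does with AAAA queries.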
