I'm testing an error path that requires a request sent by getaddrinfo to be dropped. I set up 2 VMs:
- RHEL 7.9
- Ubuntu 20
The code is the same on both machines: just a call to getaddrinfo for test.com. I blocked all incoming packets to simulate getaddrinfo's request being dropped. However, in exactly the same scenario, the 2 OSes behave differently:
- RHEL times out after 12 seconds with the error EAI_NONAME (No such file or directory)
- Ubuntu times out after 20 seconds with the error EAI_AGAIN (Resource temporarily unavailable)
So my 2 questions are:
- Why do these give 2 different errors?
- Why are the timeouts different, and where are they defined? I tried looking at the Linux source but couldn't figure it out.
Code:
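```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <netdb.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main (void)
{
  struct addrinfo hints, *res, *result;
  int errcode;
  char addrstr[INET6_ADDRSTRLEN];
  void *ptr;

  memset (&hints, 0, sizeof (hints));
  hints.ai_family = AF_UNSPEC;
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_flags = AI_CANONNAME;

  errcode = getaddrinfo ("test.com", NULL, &hints, &result);
  if (errcode != 0)
    {
      /* getaddrinfo() reports failures through its return value; errno is
         only guaranteed to be meaningful when errcode == EAI_SYSTEM, so
         perror() on its own prints a misleading message. */
      int saved_errno = errno;
      fprintf (stderr, "getaddrinfo: %s (errno: %s)\n",
               gai_strerror (errcode), strerror (saved_errno));
      return EXIT_FAILURE;
    }

  for (res = result; res != NULL; res = res->ai_next)
    {
      switch (res->ai_family)
        {
        case AF_INET:
          ptr = &((struct sockaddr_in *) res->ai_addr)->sin_addr;
          break;
        case AF_INET6:
          ptr = &((struct sockaddr_in6 *) res->ai_addr)->sin6_addr;
          break;
        default:
          continue;   /* skip address families we don't know how to print */
        }

      if (inet_ntop (res->ai_family, ptr, addrstr, sizeof (addrstr)) == NULL)
        {
          perror ("inet_ntop");
          continue;
        }

      /* Only the first entry carries ai_canonname; later ones have NULL. */
      printf ("IPv%d address: %s (%s)\n", res->ai_family == AF_INET6 ? 6 : 4,
              addrstr, res->ai_canonname ? res->ai_canonname : "");
    }

  freeaddrinfo (result);
  return EXIT_SUCCESS;
}
```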
Compiled with:
```
gcc test.c
```
RHEL resolv.conf:
```
search ht.home
nameserver 192.168.0.1
nameserver [IPV6 address 1]
nameserver [IPV6 address 2]
```
Ubuntu resolv.conf:
```
nameserver 127.0.0.53
options edns0 trust-ad
search ht.home
```
The Ubuntu behavior here is correct and the RHEL one is wrong: the result is inconclusive, since the resolver was unable both to get an address for the name and to get a response attesting that the name does not exist.
The mechanism is probably a mix of glibc bugs (or rather, intentionally inconsistent behavior) and the difference between the RHEL configuration, which points at a remote nameserver you've blocked, and the Ubuntu configuration, which is proxied through systemd-resolved (which maybe you haven't blocked, instead only blocking it from making outgoing queries to the real network?). You could confirm the differences here by running your test program under `strace` and watching `tcpdump` on both the loopback and the real network interfaces.

Basically, under some conditions glibc treats errors the same as nonexistence of the name, while under others it treats them as a reportable failure. If you're able to query the local systemd-resolved, it will return a `SERVFAIL` error code because it can't get a result or cryptographic proof of nonexistence from the upstream nameservers, and glibc probably reports this, but doesn't report its own failure to contact the nameserver.