getaddrinfo acting differently on different OS

I'm testing an error path that requires me to drop a request from getaddrinfo. I set up two VMs:

  • RHEL 7.9
  • Ubuntu 20

The code is the same on both machines: a single call to getaddrinfo for test.com. I blocked all incoming packets to simulate the getaddrinfo request being dropped. However, in the exact same scenario, the two OSes behave differently.

  • RHEL times out after 12 seconds with the error EAI_NONAME (No such file or directory)
  • Ubuntu times out after 20 seconds with the error EAI_AGAIN (Resource temporarily unavailable)

So my 2 questions are:

  • Why do these give 2 different errors?
  • Why are the timeouts different, and where are they defined? I tried to look at the Linux source but couldn't figure this out.

Code:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <netdb.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>

int main (void)
{
  struct addrinfo hints, *res, *result;
  int errcode;
  char addrstr[100];
  void *ptr;

  memset (&hints, 0, sizeof (hints));
  hints.ai_family = PF_UNSPEC;
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_flags |= AI_CANONNAME;

  errcode = getaddrinfo ("test.com", NULL, &hints, &result);
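  if (errcode != 0)
  {
      /* getaddrinfo reports failure through its return value (an EAI_*
         code), so decode that with gai_strerror. errno, which perror
         prints, is only meaningful when the code is EAI_SYSTEM. */
      perror ("getaddrinfo");
      fprintf (stderr, "getaddrinfo: %s\n", gai_strerror (errcode));
      return -1;
  }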
  
  res = result;

  while (res)
    {
      /* Pick the address field that matches the family. */
      switch (res->ai_family)
        {
        case AF_INET:
          ptr = &((struct sockaddr_in *) res->ai_addr)->sin_addr;
          break;
        case AF_INET6:
          ptr = &((struct sockaddr_in6 *) res->ai_addr)->sin6_addr;
          break;
        default:
          /* Skip address families this program cannot print. */
          res = res->ai_next;
          continue;
        }
      inet_ntop (res->ai_family, ptr, addrstr, sizeof (addrstr));
      /* ai_canonname is only filled in on the first result. */
      printf ("IPv%d address: %s (%s)\n", res->ai_family == AF_INET6 ? 6 : 4,
              addrstr, res->ai_canonname ? res->ai_canonname : "");
      res = res->ai_next;
    }
  
  freeaddrinfo(result);
  return 0;
}

Compiled with:

gcc test.c

RHEL resolv.conf:

search ht.home
nameserver 192.168.0.1
nameserver [IPV6 address 1]
nameserver [IPV6 address 2]

Ubuntu resolv.conf:

nameserver 127.0.0.53
options edns0 trust-ad
search ht.home

Answer by R.. GitHub STOP HELPING ICE:

The Ubuntu behavior here is correct and the RHEL behavior is wrong. The result is inconclusive: the resolver was unable to get an address for the name, and it was equally unable to get a response attesting to the nonexistence of the name. EAI_AGAIN describes that situation; EAI_NONAME does not.

The mechanism is probably a mix of glibc bugs (or rather, intentionally inconsistent behavior) and the difference between the two configurations: RHEL uses a remote nameserver that you've blocked, while Ubuntu is proxied through systemd-resolved (which maybe you haven't blocked, and have instead only blocked from making outgoing queries to the real network?). You can confirm the differences by running your test program under strace and watching tcpdump on both the loopback and the real network interfaces.

Basically, under some conditions glibc treats errors the same as nonexistence of the name, while under others it treats them as a reportable failure. If you're able to query the local systemd-resolved, it will return a SERVFAIL response code because it can't get a result or cryptographic proof of nonexistence from the upstream nameservers. glibc probably reports that as a failure, but doesn't report its own failure to contact the nameserver.