intermittent network failures with drill and icinga2
David Newman
dnewman at networktest.com
Sat Aug 17 04:06:14 UTC 2019
12.0-RELEASE-p9, icinga2 2.10.5_1, drill 1.7.0
Do drill and ping use different system calls to resolve hostnames to IP
addresses?
I ask because roughly 5 to 10 times per day, icinga2 returns an error
because this system can't resolve a hostname to an IP address.
However, the system is reachable by ssh during these error periods, and
it _can_ resolve hostnames when using ping.
Here's an example where drill doesn't work and ping does:
[dnewman at hood ~]$ drill mail.networktest.com @puck.nether.net
Error: error sending query: Could not send or receive, because of network error
[dnewman at hood ~]$ ping puck.nether.net
PING puck.nether.net (204.42.254.5): 56 data bytes
64 bytes from 204.42.254.5: icmp_seq=0 ttl=51 time=76.332 ms
[dnewman at hood ~]$ drill mail.networktest.com @puck.nether.net
Error: error sending query: Could not send or receive, because of network error
The /etc/resolv.conf file points to two internal nameservers, both
reachable:
[dnewman at hood ~]$ cat /etc/resolv.conf
search inf.networktest.com networktest.com
nameserver 172.31.53.12
nameserver 172.31.53.13
Also, icinga2 resolves hundreds of hostnames, but this problem occurs
almost exclusively during checks of puck.nether.net. I don't think
there's anything wrong with puck.nether.net's DNS or reachability; even
this system can ping it, and I can resolve it from any other host.
Other host checks and networking on this system otherwise work fine.
Thanks in advance for clues on what might cause these intermittent
failures in drill and icinga2, and what to do to fix them.
dn
P.S. This system is a VMware VM. I don't believe it's a VMware issue,
however; aside from the periodic inability to reach one host, its
networking works OK, and the other server VMs on the same VMware host
with similar network configurations don't have this issue.