Bind DoS?
Attila Nagy
bra at fsn.hu
Sat Sep 3 16:12:25 PDT 2005
Hello,
I am currently setting up two caching nameservers and have noticed some
interesting behaviour.
The configuration is the following:
two FreeBSD/amd64 6-CURRENT machines, with single Opteron processors.
Bind was compiled from ports, without threading, with gcc34 (from
ports), with -O2 -static. It runs in a jail, with nothing more than the
config and a nearly empty devfs mount.
Machine A has the following simple config:
options {
        directory "/etc/bind";
        tcp-clients 256;
        recursive-clients 8192;
        max-cache-size 600M;
        minimal-responses yes;
        pid-file "/tmp/named.pid";
        forwarders { MACHINE_B_IP; };
};
Machine B has the same bind, but runs as an authoritative NS with a
wildcard record of:
* IN TXT "256xA"
in the . zone (so it answers with 256 "A"s for every name).
The test:
on machine B I start queryperf like this:
queryperf -d list -s MACHINE_A_IP
where list contains lines of the following form:
www.RANDOMNUMBER.hu TXT
[...] repeated 9000000 times.
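
For anyone who wants to reproduce the input, a query list of this shape could be generated with a small script like the one below. This is only a sketch: the range of the random numbers, the seed handling and the function names are assumptions, since the post does not say how the original list was produced.

```python
import random


def make_query_lines(count, seed=None, max_number=10**9):
    """Yield `count` lines of the form 'www.<random number>.hu TXT'.

    max_number is an assumed upper bound; the original post does not
    state the range that was used for RANDOMNUMBER.
    """
    rng = random.Random(seed)
    for _ in range(count):
        yield f"www.{rng.randrange(max_number)}.hu TXT"


def write_query_file(path, count, seed=None):
    """Write `count` generated lines to `path`, one query per line."""
    with open(path, "w") as fh:
        for line in make_query_lines(count, seed):
            fh.write(line + "\n")


if __name__ == "__main__":
    # Print a few sample lines rather than writing the full list.
    for line in make_query_lines(3, seed=1):
        print(line)
```

Calling write_query_file("list", 9_000_000) would produce a file of the size used in the test, which can then be passed to queryperf with -d list.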
During the test, machine A fills its cache up to about 860 MB. Until
then, top shows:
CPU states: 27.7% user, 0.0% nice, 58.1% system, 14.2% interrupt, 0.0% idle
On machine B, queryperf receives answers within the default timeout (5
seconds).
After bind reaches about 860 MB, it starts to eat CPU: usage goes to 100%
user, with nearly 0% system and interrupt.
queryperf starts to time out with the following:
[Timeout] Query timed out: msg id 64837
Warning: Received a response with an unexpected (maybe timed out) id: 64837
The server effectively dies: it can answer only very few queries, and
very slowly. If I stop queryperf, bind remains in the CPU-eating
state:
76423 bind 1 129 0 861M 862M RUN 8:30 97.71% named
Because the machine has much more RAM, I first tried 1200M in the
config. The server reached its "zombie" state at around 1600 MB of
usage and was very unresponsive.
On another (real) server, I noticed similar behaviour this week. Bind
started to eat all CPU resources and there were only "recursive quota
reached" messages in the logs, but rndc status reported only very low
usage (for example 60/1024 on that server).
I can reproduce this with and without patch-lib_dns_resolver.c.
If I stop the queries, the server starts answering again after a few
minutes, once it has finished its strange "CPU eating" loop.
ktrace shows it doing this many, many times between two successful queries:
76423 named CALL gettimeofday(0x7fffffffe450,0)
76423 named RET gettimeofday 0
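
To put a number on "many, many times", the kdump output could be tallied per system call with something like the sketch below. It assumes the text has the "PID command CALL name(args)" layout shown in the two lines above; the regular expression and the function name are my own and have not been tested against other kdump variants.

```python
import re
from collections import Counter

# Matches lines such as:
#   76423 named CALL gettimeofday(0x7fffffffe450,0)
# RET lines and anything else are ignored.
KDUMP_CALL = re.compile(r"^\s*(\d+)\s+(\S+)\s+CALL\s+(\w+)\(")


def count_syscalls(lines):
    """Return a Counter mapping syscall name to number of CALL records."""
    counts = Counter()
    for line in lines:
        match = KDUMP_CALL.match(line)
        if match:
            counts[match.group(3)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "76423 named CALL gettimeofday(0x7fffffffe450,0)",
        "76423 named RET gettimeofday 0",
    ]
    for name, total in count_syscalls(sample).most_common():
        print(name, total)
```

Feeding it the full dump (for example via open("kdump.txt"), a hypothetical file name) would show whether gettimeofday really dominates or whether other calls are hidden in between.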
Any ideas?
Thanks,
--
Attila Nagy e-mail: Attila.Nagy at fsn.hu
Free Software Network (FSN.HU) phone @work: +361 371 3536
ISOs: http://www.fsn.hu/?f=download cell.: +3630 306 6758