Very strange kevent problem possibly to do with vinum
Kevin Day
toasty at dragondata.com
Wed Dec 8 11:33:30 PST 2004
I have a really strange kevent problem (I think, anyway) that has me
completely stumped.
Here's the scenario:
Three mostly identical servers running FreeBSD 5.2.1 or 5.3 (the
problem exists on both). All three are running thttpd, sending out
large files to thousands of clients. Thttpd internally uses
kqueue/kevent and sendfile to send files rather quickly.
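For anyone not familiar with the model, the core of it looks roughly
like the sketch below. This is my own simplified illustration, not
thttpd's actual source; MAXEVENTS, watch_socket(), and the connection
bookkeeping are made up for the example. The important detail is that
a single kevent call can return a whole batch of ready sockets, each
of which then gets a sendfile push:

/*
 * Rough sketch of a kqueue + sendfile server loop (my simplification,
 * not thttpd's code).  Register writable-socket filters, then service
 * whatever batch of sockets each kevent call returns.
 */
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <err.h>
#include <errno.h>
#include <unistd.h>

#define MAXEVENTS 256

static void
watch_socket(int kq, int s)
{
	struct kevent kev;

	/* Ask for an event whenever socket s can accept more data. */
	EV_SET(&kev, s, EVFILT_WRITE, EV_ADD, 0, 0, NULL);
	if (kevent(kq, &kev, 1, NULL, 0, NULL) < 0)
		err(1, "kevent register");
}

static void
event_loop(int kq, int file_fd)
{
	struct kevent ev[MAXEVENTS];
	off_t sbytes;
	int i, n;

	for (;;) {
		/* One call can hand back up to MAXEVENTS ready sockets. */
		n = kevent(kq, NULL, 0, ev, MAXEVENTS, NULL);
		if (n < 0)
			err(1, "kevent wait");
		for (i = 0; i < n; i++) {
			sbytes = 0;
			/*
			 * Push file data; 0/0 means "whole file from
			 * offset 0".  A real server would track a
			 * per-connection offset instead.
			 */
			if (sendfile(file_fd, (int)ev[i].ident, 0, 0,
			    NULL, &sbytes, 0) < 0 && errno != EAGAIN)
				close((int)ev[i].ident);
		}
	}
}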
All three have the same configuration, are getting approximately the
same number of requests, and are sending approximately the same files.
(I can swap IP addresses between the servers to confirm that the
request distribution stays the same.)
Server #3 is able to send 400 Mbps or more of traffic through without
breaking a sweat. Thttpd is in "RUN", "biord", "sfbufa", or "*Giant"
when I watch it in top, and I still have 80-90% idle time.
Servers #1 and #2 seem to top out around 80 Mbps and are constantly in
"RUN" or "CPUx" states. I don't get any errors anywhere; they just
aren't capable of going any faster.
Looking at ktrace output from thttpd on all three servers, I see that
server 3 calls kevent and gets 20-100 sockets back in response, each
of which gets serviced. Servers 1 and 2 never seem to get more than
one socket back from kevent. Even when the event is just that the
socket was disconnected, so nothing needs to be done on it and kevent
can be called again immediately, only one socket is returned the next
time. I ran ktrace on thttpd for more than 15 minutes and produced a
humongous ktrace file, and there were only a handful of times that
kevent returned more than one socket with something to do on it.
Contrast that with server 3, where I never saw kevent return fewer
than a half dozen sockets at a time while it had a few hundred Mbps
flowing through it.
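As a sanity check on kevent itself (this is my own little test
program, not anything from thttpd), it's easy to make several
descriptors ready at once and see how many events a single kevent
call hands back. On a healthy system this should print 8, not 1:

/*
 * Make NPAIRS descriptors readable before calling kevent, then see
 * how many events one call returns.
 */
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <err.h>
#include <stdio.h>
#include <unistd.h>

#define NPAIRS 8

int
main(void)
{
	struct kevent kev[NPAIRS], out[NPAIRS];
	int pairs[NPAIRS][2];
	int i, kq, n;

	if ((kq = kqueue()) < 0)
		err(1, "kqueue");

	for (i = 0; i < NPAIRS; i++) {
		if (socketpair(AF_UNIX, SOCK_STREAM, 0, pairs[i]) < 0)
			err(1, "socketpair");
		/* Watch one end for readability... */
		EV_SET(&kev[i], pairs[i][0], EVFILT_READ, EV_ADD,
		    0, 0, NULL);
		/* ...and make it readable by writing to the other end. */
		if (write(pairs[i][1], "x", 1) != 1)
			err(1, "write");
	}

	/* Register all filters and collect ready events in one call. */
	n = kevent(kq, kev, NPAIRS, out, NPAIRS, NULL);
	if (n < 0)
		err(1, "kevent");
	printf("kevent returned %d events\n", n);
	return (0);
}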
The ONLY difference between servers 1 and 2 and server 3 is the disk
subsystem. Servers 1 and 2 use an "ahc" SCSI controller with vinum
RAID5; server 3 uses an "aac" hardware RAID. However, disk activity is
truly minimal on all of these servers. Most of the data stays cached,
since 99% of the requests are for the same handful of files.
systat and vmstat show the disks busy less than 10% of the time, and
artificially generating disk load on any of the servers doesn't seem
to affect anything.
I'm not sure if the kevent difference is the cause of the problem or
just a symptom. Thttpd doesn't handle going through its event loop
over and over for a single socket at a time very well; it makes some
rather expensive syscalls on every pass through that loop, so the
per-pass overhead stops being amortized (see the rough numbers below).
Is something in vinum possibly waking my process up prematurely? Is
that even possible when the files are being sent through sendfile?
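To make the loop-overhead point concrete, here's the back-of-envelope
arithmetic. Both costs are assumptions I picked for illustration, not
measurements from these servers, but they show how the fixed per-pass
work dominates once only one socket comes back per wakeup:

#include <stdio.h>

int
main(void)
{
	/* Both figures are invented for illustration only. */
	double fixed_us = 50.0;		/* per-pass syscall overhead */
	double per_event_us = 10.0;	/* work to service one socket */
	int batch;

	/* Per-socket cost = per-event work + fixed cost / batch size. */
	for (batch = 1; batch <= 64; batch *= 4)
		printf("%2d sockets/wakeup -> %5.1f us per socket\n",
		    batch, per_event_us + fixed_us / batch);
	return (0);
}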
Sorry for the vagueness, but I really don't know where else to look.
-- Kevin