Very strange kevent problem possibly to do with vinum

Igor Sysoev is at rambler-co.ru
Fri Dec 10 03:01:19 PST 2004


On Wed, 8 Dec 2004, Kevin Day wrote:

> I have a really, really strange kevent problem (I think, anyway) that
> has really stumped me.
>
> Here's the scenario:
>
> Three mostly identical servers running 5.2.1 or 5.3 (problem exists on
> both). All three running thttpd sending out large files to thousands of
> clients. Thttpd internally uses kqueue/kevent and sendfile to send
> files rather quickly.
>
> All three have the same configuration, are getting approximately the
> same numbers of requests, and are sending approximately the same files.
> (I can swap IP addresses between the servers to confirm that the
> request distribution stays the same between the servers)
>
> Server #3 is able to send 400 Mbps or more of traffic through without
> breaking a sweat. Thttpd is either in "RUN", "biord", "sfbufa" or
> "*Giant" when I watch it in top, and I still have 80-90% idle time.
>
> Servers #1 and #2 seem to top out around 80 Mbps, and are constantly in
> "RUN" or "CPUx" states. I don't get any errors anywhere, but they just
> aren't capable of going any faster.
>
> Looking at ktrace on thttpd on all three servers, I see that server 3
> calls kevent and gets 20-100 sockets back in response, each of which
> gets serviced. Servers 1 and 2 never seem to get more than 1 socket
> back from kevent. Even when the event is just a disconnected socket,
> so nothing needs to be done and kevent can be called again
> immediately, only 1 socket is returned the next time. I ran ktrace on
> thttpd for more than 15 minutes and produced a humongous ktrace file,
> and there were only a handful of times that kevent returned more than
> one socket with something to do on it. Contrast that with server 3,
> where I never saw kevent return fewer than a half dozen sockets at a
> time when it had a few hundred Mbps flowing through it.
>
> The ONLY difference between servers 1 and 2 and server 3 is the disk
> subsystem.  Servers 1/2 use an "ahc" SCSI controller and vinum RAID5.
> Server 3 uses an "aac" hardware RAID. However, disk activity is really
> truly minimal on all of these servers. Most of the data remains cached,
> since 99% of the requests are for the same handful of files.
> systat/vmstat shows that the disks are busy less than 10% of the time,
> and artificially creating a bunch of disk load on any of the servers
> doesn't seem to affect anything.
>
> I'm not sure if the kevent difference is the cause of the problem
> (thttpd doesn't seem to cope well with going through its event loop
> over and over again for just one socket at a time, since it makes some
> rather expensive syscalls from that loop), or if it's just a symptom.
> Is something in vinum possibly waking my process up somewhat
> prematurely? Is that even possible if the files are being sent through
> sendfile?
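
The "one socket per kevent call" observation above could be turned into
numbers by tallying the kevent return values in the ktrace dump. The
following is only a rough sketch, not something from the original report:
it assumes the default kdump line layout ("pid command RET syscall value"),
and the function names and the script name kevent_hist.py are made up for
illustration.

```python
import re
import sys
from collections import Counter

# Matches kdump return lines such as "  1234 thttpd   RET   kevent 1".
# Assumption: default kdump layout. Larger values may be printed as
# "20/0x14"; only the leading decimal part is captured.
KEVENT_RET = re.compile(r"\bRET\s+kevent\s+(-?\d+)")


def kevent_batch_histogram(lines):
    """Count how many kevent() calls returned each number of events."""
    counts = Counter()
    for line in lines:
        match = KEVENT_RET.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def summarize(counts):
    """Return (calls, mean events per call, share of calls returning 1).

    Calls that failed (return value -1) are left out of the summary.
    """
    ok = {n: c for n, c in counts.items() if n >= 0}
    calls = sum(ok.values())
    if calls == 0:
        return 0, 0.0, 0.0
    events = sum(n * c for n, c in ok.items())
    return calls, events / calls, ok.get(1, 0) / calls


if __name__ == "__main__":
    # Reads kdump text from standard input.
    histogram = kevent_batch_histogram(sys.stdin)
    calls, mean, single = summarize(histogram)
    print(f"{calls} kevent calls, {mean:.1f} events per call on average")
    print(f"{single:.0%} of calls returned exactly one event")
    for n in sorted(histogram):
        print(f"{n:>6} events: {histogram[n]} calls")
```

Something like "kdump -f ktrace.out | python3 kevent_hist.py" on each
server would then allow a side-by-side comparison of the batch sizes.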

What does "systat -vm" show on these machines?


Igor Sysoev
http://sysoev.ru/en/

