Strange interrupts problem.
F. Senault
fred.letter at lacave.net
Tue Nov 16 02:51:17 PST 2004
Hello.
I'm running into a recurring problem. I searched the list for
information, but couldn't find anything related (there have been some
discussions about interrupt storms lately, but none seem to apply).
I'm running FreeBSD-5.x on some old low-end boxes, mostly for small
tasks like small websites, email servers, and so on.
Some time ago, on some of the boxes (with similar hardware - AMD Athlon
1.0GHz and 1.4GHz, MSI mainboards with VIA chipsets), I noticed an
unusually high interrupt rate - top reports around 10% CPU time at all
times, even when the box is completely idle. The guilty process,
according to top -S, is:
27 root -28 -147 0K 12K RUN 17.9H 8.06% 8.06% swi5: clock sio
Since those are production boxes, with custom kernels and all, I left
them alone.
Now I have to build another machine from old, used hardware, and I'm
running into the same problem. I tried two motherboards with completely
different hardware (Celeron 600 with an Intel chipset versus VIA C3
Samuel 2 with, well, a VIA chipset), and I get the same symptoms on
both, just much worse:
27 root -28 -147 0K 12K WAIT 5:12 23.93% 23.93% swi5: clock sio
uname -a shows :
FreeBSD cragganmore 5.3-STABLE FreeBSD 5.3-STABLE #0: Mon Nov 15
20:33:56 CET 2004 root@cragganmore:/usr/obj/usr/src/sys/GENERIC i386
(The box was upgraded from 5.3-BETAx. I built a GENERIC kernel to check
whether my custom config was at fault, but no such luck. Everything was
recompiled with no special tunables - the only line of interest in
make.conf is 'CPUTYPE?=i586'.)
After a few quick tests, it seems that the machine boots cleanly (no
such interrupt load), but it begins to break under any kind of real
load: to stress it, I ran a make -j8 buildworld, and the problem
appeared within a few minutes.
Once it begins, the interrupt load stays the same even if I leave the
machine alone. Some samples:
1) During the build:
last pid: 12394; load averages: 7.65, 5.21, 2.54 up 0+00:07:42 10:28:45
105 processes: 10 running, 71 sleeping, 24 waiting
CPU states: 49.2% user, 0.0% nice, 25.4% system, 25.4% interrupt, 0.0% idle
Mem: 16M Active, 36M Inact, 35M Wired, 12K Cache, 59M Buf, 398M Free
Swap: 1024M Total, 1024M Free
PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
27 root -28 -147 0K 12K WAIT 0:32 24.02% 24.02% swi5: clock sio
9 root 171 52 0K 12K RUN 0:09 0.68% 0.68% pagezero
2) Just after I hit ctrl-C:
last pid: 12668; load averages: 3.64, 4.56, 2.46 up 0+00:08:37 10:29:40
73 processes: 2 running, 47 sleeping, 24 waiting
CPU states: 0.0% user, 0.0% nice, 0.4% system, 24.5% interrupt, 75.1% idle
Mem: 9684K Active, 36M Inact, 35M Wired, 12K Cache, 59M Buf, 405M Free
Swap: 1024M Total, 1024M Free
PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
11 root 132 0 0K 12K RUN 1:37 65.28% 65.28% idle
27 root -28 -147 0K 12K WAIT 0:45 22.71% 22.71% swi5: clock sio
3) Half an hour later:
last pid: 12737; load averages: 0.00, 0.02, 0.40 up 0+00:33:38 10:54:41
73 processes: 2 running, 47 sleeping, 24 waiting
CPU states: 0.8% user, 0.0% nice, 0.0% system, 25.6% interrupt, 73.6% idle
Mem: 9768K Active, 37M Inact, 35M Wired, 12K Cache, 59M Buf, 403M Free
Swap: 1024M Total, 1024M Free
PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
11 root 107 0 0K 12K RUN 20:29 75.54% 75.54% idle
27 root -28 -147 0K 12K WAIT 6:47 23.05% 23.05% swi5: clock sio
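As a rough cross-check of the three samples above, here is a minimal
Python sketch that compares the "interrupt" figure from the CPU states
line with the WCPU of the swi5 thread. The numbers are copied by hand
from the top output; the helper name unexplained() is made up for this
illustration, and nothing is measured by the script itself.

```python
# Values copied by hand from the three top samples above.
# Each entry: (label, "interrupt" % from CPU states, WCPU % of "swi5: clock sio")
samples = [
    ("during the build", 25.4, 24.02),
    ("just after ctrl-C", 24.5, 22.71),
    ("half an hour later", 25.6, 23.05),
]

def unexplained(interrupt_pct, swi5_pct):
    """Interrupt time not attributed to the swi5 thread, in percentage points."""
    return round(interrupt_pct - swi5_pct, 2)

gaps = {label: unexplained(intr, swi5) for label, intr, swi5 in samples}

for label, gap in gaps.items():
    print(f"{label}: {gap} points of interrupt time outside swi5")
```

In all three samples the gap stays under three percentage points, so
nearly all of the reported interrupt time is attributed to that one
thread, whether the box is busy or idle.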
Strangely, the load average also seems to fall much more slowly than
expected (the first number takes more than a minute to go from 3.5 to
0.0).
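For comparison, here is a small sketch of how long that fall would take
on a perfectly idle machine. It assumes the first load average is an
exponentially damped average with a 60-second time constant, updated
every 5 seconds; both constants and the function name decay_steps() are
assumptions made for this illustration, not values read from this
kernel.

```python
import math

# Assumptions (not taken from the box itself): 60-second time constant
# for the first load average, recomputed every 5 seconds.
TIME_CONSTANT = 60.0    # seconds
SAMPLE_INTERVAL = 5.0   # seconds

def decay_steps(start, threshold):
    """Count updates until an idle machine's load average drops below threshold."""
    factor = math.exp(-SAMPLE_INTERVAL / TIME_CONSTANT)
    load, steps = start, 0
    while load >= threshold:
        load *= factor  # nothing runnable, so the average only decays
        steps += 1
    return steps

# 0.05 is roughly where a one-decimal display starts rounding to 0.0.
seconds = decay_steps(3.5, 0.05) * SAMPLE_INTERVAL
print(f"about {seconds:.0f} s to fall from 3.5 to below 0.05")
```

If those assumptions hold, the fall takes a little over four minutes
even with nothing running, so the slow decay may simply be how the
average is computed rather than a second symptom.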
On the other hand, vmstat -i doesn't show anything abnormal:
interrupt                  total    rate
irq0: clk                 349094      99
irq1: atkbd0                   2       0
irq8: rtc                 446819     127
irq11: rl0 uhci0+          10318       2
irq13: npx0                    2       0
irq14: ata0                 9015       2
irq15: ata1                   48       0
Total                     815298     233
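To double-check that nothing is hiding in that table, here is a minimal
Python sketch that parses the output pasted above, sums the per-source
totals and picks the busiest source. The text is hard-coded from this
post; the names VMSTAT_OUTPUT and parse() are made up for the
illustration.

```python
# vmstat -i output copied from this post (whitespace-separated columns).
VMSTAT_OUTPUT = """\
interrupt total rate
irq0: clk 349094 99
irq1: atkbd0 2 0
irq8: rtc 446819 127
irq11: rl0 uhci0+ 10318 2
irq13: npx0 2 0
irq14: ata0 9015 2
irq15: ata1 48 0
Total 815298 233
"""

def parse(text):
    """Return ({source: (total, rate)}, (grand_total, grand_rate))."""
    rows, grand = {}, None
    for line in text.splitlines()[1:]:      # skip the header line
        *name, total, rate = line.split()   # the last two fields are numeric
        entry = (int(total), int(rate))
        if name == ["Total"]:
            grand = entry
        else:
            rows[" ".join(name)] = entry
    return rows, grand

rows, grand = parse(VMSTAT_OUTPUT)
summed_total = sum(total for total, _ in rows.values())
busiest = max(rows, key=lambda source: rows[source][1])
print(summed_total == grand[0], busiest, rows[busiest][1])
```

The per-source totals add up exactly to the Total line, and the two
busiest sources are the clk and rtc timers at about 99 and 127
interrupts per second, which look like ordinary timer rates (assuming
the usual 100 Hz clock and 128 Hz RTC). No hardware interrupt source
stands out.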
Strangely enough, none of these boxes has any device attached to its
COM ports (which are driven by sio, right?).
(BTW, on any kernel, I never got any "interrupt storm" messages - maybe
10-25% CPU is too low for that? :) )
Well, that's it. I don't know what else I can do to provide more
information, but it's a test box, so I can break it at will. You can
find the dmesg output from a verbose boot at:
http://www.lacave.net/~fred/dmesg.boot
Fred
--
Sysadmins can't be sued for malpractice, but surgeons don't have to
deal with patients who install new versions of their own innards.