insanely-high interrupt rates
Andrew Gierth
andrew at tao11.riddles.org.uk
Wed Apr 3 08:17:41 UTC 2019
>>>>> "Ian" == Ian Lepore <ian at freebsd.org> writes:
Ian> I now have an rpi2b, and I see the same thing you do in the
Ian> 'vmstat -i' output, including rates like 500K int/sec on cpu3
Ian> timer0. But while it was behaving like that, the output from
Ian> 'vmstat 1' showed a perfectly steady 8800-9200 int/sec, which is
Ian> certainly more consistent with top showing under 2% being used for
Ian> combined system+interrupt.
I'm seeing this too right now. The problem is that whatever this problem
is, it's not being accounted for in the %cpu statistics, probably
because it takes one CPU completely out of play.
So here's what I see. First, top:
last pid: 6110; load averages: 1.00, 1.00, 1.00 up 1+19:26:47 08:35:02
107 threads: 6 running, 81 sleeping, 20 waiting
CPU: 0.3% user, 0.0% nice, 0.7% system, 0.2% interrupt, 98.8% idle
Mem: 10M Active, 611M Inact, 32M Laundry, 168M Wired, 98M Buf, 115M Free
Swap: 4096M Total, 4096M Free
PID USERNAME PRI NICE SIZE RES STATE C TIME CPU COMMAND
10 root 155 ki31 0 32K CPU2 2 43.3H 99.35% idle{idle: cpu2}
10 root 155 ki31 0 32K RUN 1 43.3H 98.71% idle{idle: cpu1}
10 root 155 ki31 0 32K CPU0 0 43.1H 98.49% idle{idle: cpu0}
1186 andrew 20 0 14M 8728K select 1 0:01 0.90% xterm
Notice the load average of 1+, and that while it's reporting 98.8% idle,
only three of the four cpu idle processes show up. The fourth idle
process shows no accumulated CPU time, as shown here by ps axlH:
0 10 0 0 155 0 0 32 - RNL - 2589:43.57 [idle/idle: cpu0]
0 10 0 0 155 0 0 32 - RNL - 2597:14.64 [idle/idle: cpu1]
0 10 0 0 155 0 0 32 - RNL - 2597:19.11 [idle/idle: cpu2]
0 10 0 0 155 0 0 32 - RNL - 0:00.00 [idle/idle: cpu3]
The CPU time spent handling these interrupts is not being accounted for
anywhere.
Also, to confirm that there is something odd going on, the CPU is
running hotter than it would normally do while idle: ~42C rather than
the ~38C that I would normally see with this level of activity.
Ian> I think the problem here is with vmstat -i, not with actual
Ian> interrupts. Something about the stats reporting is wrong on
Ian> armv6/v7.
There are two obvious issues with vmstat -i: it double- or triple-counts
some interrupts, and it reports a false rate once the counters wrap
(which happens quite quickly when this issue is in play). But these
don't account for the problem described here.
Here's lines from vmstat -i taken 10 seconds apart:
local_intc0,3: + 3085878928 19685
local_intc0,3: + 3094431425 19738
That's 8552497 interrupts in 10 seconds, which matches systat -vm's
report of ~850k interrupts/sec. Since these are 32-bit counters they
will wrap in an hour and a half at that rate, so the ~19700/sec rate
reported is entirely spurious (since it's just counter/uptime).
Looking at the generic_timer driver, there's an obvious problem: it
installs one interrupt handler on all the irqs and supposedly enables
only one of the timers, but the interrupt handler code simply assumes
that the timer causing the interrupt is the one that it enabled; if it
gets an interrupt from one of the others, it'll never clear it or mask
it, it'll just return with the interrupt unhandled, hence the interrupt
storm. On 32-bit arm it is only supposed to enable the timer that's
attached to local_intc0,1 and this indeed shows an interrupt rate of
~1100/sec which seems about right for the clock interrupt; the timer on
local_intc0,3 (the virtual one) is never supposed to be enabled so far
as I can tell.
--
Andrew.
More information about the freebsd-arm
mailing list