ixgbe(4) spin lock held too long
John Baldwin
jhb at freebsd.org
Fri Oct 10 15:58:05 UTC 2014
On Thursday, October 09, 2014 02:31:32 PM Jason Wolfe wrote:
> On Wed, Oct 8, 2014 at 12:29 PM, John Baldwin <jhb at freebsd.org> wrote:
> > My only other thought is if a direct timeout routine ran for a long time.
> >
> > I just committed a change to current that can let you capture KTR traces
> > of
> > callout routines for use with schedgraph (r272757). Unfortunately,
> > enabling KTR_SCHED can be slow. If you are up for trying it, I do think
> > that
> > building a kernel with KTR and KTR_SCHED enabled (KTR_COMPILE=KTR_SCHED
> > and
> > KTR_MASK=KTR_SCHED) with the kernel part of the commit I referenced above
> > applied is the best bet to figure out why it is spinning so long. If you
> > can
> > try that (and if it doesn't slow things down too much) and get a panic
> > with
> > those options enabled, then capture the output of
> > 'ktrdump -e /path/to/kernel -m /var/crash/vmcore.X -ct', we can use
> > Src/tools/sched/schedgraph.py to look at that output to get a picture of
> > what
> > was going on just prior to the crash.
> >
> > --
> > John Baldwin
>
> As luck would have it.. caught one of the boxes with the new KTR
> code/options after only an hour. Crashed in the same way w tid 100003 and
> looking the same in callout_process
>
> spin lock 0xffffffff81262d00 (callout) held by 0xfffff800151fe000 (tid
> 100003) too long
> spin lock 0xffffffff81262d00 (callout) held by 0xfffff800151fe000 (tid
> 100003) too long
>
> #4 0xffffffff8070d6fa in callout_process (now=7915682202423) at
> /usr/src/sys/kern/kern_
> timeout.c:490
>
> The ktrdump oddly only seems to have the last 702, though debug.ktr.entries
> is set to 1024. It appears the buffer may also start after 100003 had
> already hung? I've bumped debug.ktr.entries up in case we don't have
> enough history here.
>
> http://nitrology.com/ktrdump-spinlock.txt
Hmm, schedgraph.py chokes on this, but it's a bit interesting. It seems that
in this time sample, CPUs 1, 2, 4, and 5 were constantly running ixgbe
interrupt handlers. No actual thread state changes are logged (which is why
schedgraph choked).
Try setting the sysctl 'net.inet.tcp.per_cpu_timers' to 1 and see if that
makes a difference. I'm guessing they are all contending on the default
callout lock which is causing your headache.
--
John Baldwin
More information about the freebsd-net
mailing list