ixgbe(4) spin lock held too long
Jason Wolfe
nitroboost at gmail.com
Sat Jan 17 19:21:59 UTC 2015
On Mon, Dec 15, 2014 at 9:23 AM, John Baldwin <jhb at freebsd.org> wrote:
> On Wednesday, December 10, 2014 12:47:02 PM Jason Wolfe wrote:
>> John,
>>
>> So apparently the concurrent timer scheduling was not fixed, though it
>> does seem rarer. We had about 2 weeks of stability, then last night
>> we had a crash on a machine with the 4 KTR options listed in the prior
>> email. I'm not sure if this is relevant, but the only request for the
>> tcp lock that was 'blocked' in the ktrdump (38k entries) happened just
>> before the persist timer was armed:
>>
>> 37898 3 23540056055651738 KTRGRAPH group:"thread", id:"swi4: clock
>> tid 100012", state:"spinning", attributes: lockname:"tcp"
>> 34794 0 23540056027080726 KTRGRAPH group:"thread", id:"irq278:
>> ix0:que 0 tid 100059", state:"blocked", attributes: prio:8,
>> wmesg:"(null)", lockname:"tcp"
>
> I was hoping to see when the rexmit timer was scheduled, but it isn't
> mentioned at all in the ktrdump (granted, it only covers ~125 milliseconds).
> The line at 37897 is the only one that mentions a timer for this connection.
> The dump doesn't include when either timer was scheduled. One thing that you
> can tell from the schedgraph (and from above) is that some other thread was
> accessing this connection's TCB at the same time as the persist timer spun for
> 2 microseconds on the lock before it was able to run, so something did frob
> the state just before this ran. However, all the threads that were running at
> the time were either idle or in the process of switching out.
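For context on the 2 microseconds of spinning mentioned above: the TCP timer
handlers all open with the same guard, so a callout that had to wait for the
pcb lock can still bail out if it was rescheduled or stopped in the meantime.
Roughly, paraphrased from the 10.x persist handler in sys/netinet/tcp_timer.c
(the per-handler locking differs a bit, so treat this as a sketch rather than
the exact code):

	inp = tp->t_inpcb;
	INP_WLOCK(inp);
	if (callout_pending(&tp->t_timers->tt_persist) ||
	    !callout_active(&tp->t_timers->tt_persist)) {
		/* Rescheduled or stopped while we spun on the lock. */
		INP_WUNLOCK(inp);
		CURVNET_RESTORE();
		return;
	}
	callout_deactivate(&tp->t_timers->tt_persist);
	/* ...only now does the handler touch the TCB and rearm timers. */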
>
>> 32909 1 23540056008857406 KTRGRAPH group:"thread", id:"swi4: clock
>> tid 100010", state:"spinning", attributes: lockname:"tcp"
>> 29394 4 23540055974204380 KTRGRAPH group:"thread", id:"irq282:
>> ix0:que 4 tid 100067", state:"spinning", attributes: lockname:"tcp"
>> 29390 1 23540055974198218 KTRGRAPH group:"thread", id:"irq279:
>> ix0:que 1 tid 100061", state:"spinning", attributes: lockname:"tcp"
>> 7130 0 23540055791854144 KTRGRAPH group:"thread", id:"irq278:
>> ix0:que 0 tid 100059", state:"spinning", attributes: lockname:"tcp"
>> 5558 5 23540055779687890 KTRGRAPH group:"thread", id:"squid tid
>> 100207", state:"spinning", attributes: lockname:"tcp"
>> 5306 1 23540055776901006 KTRGRAPH group:"thread", id:"irq279:
>> ix0:que 1 tid 100061", state:"spinning", attributes: lockname:"tcp"
>>
>> panic: tcp_setpersist: retransmit pending
>>
>> (kgdb) bt
>> #0 doadump (textdump=1) at pcpu.h:219
>> #1 0xffffffff806fcb11 in kern_reboot (howto=260) at
>> /usr/src/sys/kern/kern_shutdown.c:452
>> #2 0xffffffff806fce74 in panic (fmt=<value optimized out>) at
>> /usr/src/sys/kern/kern_shutdown.c:759
>> #3 0xffffffff8084c853 in tcp_setpersist (tp=<value optimized out>) at
>> /usr/src/sys/netinet/tcp_output.c:1718
>> #4 0xffffffff80854856 in tcp_timer_persist (xtp=0xfffff8003e58e800)
>> at /usr/src/sys/netinet/tcp_timer.c:518
>> #5 0xffffffff8070faeb in softclock_call_cc (c=0xfffff8003e58eac0,
>> cc=0xffffffff8126ce00, direct=0)
>> at /usr/src/sys/kern/kern_timeout.c:682
>> #6 0xffffffff8070feb4 in softclock (arg=<value optimized out>) at
>> /usr/src/sys/kern/kern_timeout.c:810
>>
>> Jason
>>
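For anyone who lands here via the panic string: it comes from a sanity check
at the top of tcp_setpersist(), which refuses to (re)arm the persist timer
while a retransmit timer is still pending. Paraphrased from 10.x
sys/netinet/tcp_output.c (details may differ slightly from the exact revision
in the backtrace):

void
tcp_setpersist(struct tcpcb *tp)
{
	int t = ((tp->t_srtt >> 2) + tp->t_rttvar) >> 1;
	int tt;

	tp->t_flags &= ~TF_PREVVALID;
	if (tcp_timer_active(tp, TT_REXMT))
		panic("tcp_setpersist: retransmit pending");
	/* Start/restart the persist timer with exponential backoff. */
	TCPT_RANGESET(tt, t * tcp_backoff[tp->t_rxtshift],
	    TCPTV_PERSMIN, TCPTV_PERSMAX);
	tcp_timer_activate(tp, TT_PERSIST, tt);
	if (tp->t_rxtshift < TCP_MAXRXTSHIFT)
		tp->t_rxtshift++;
}

In other words, the panic fires when a retransmit timer gets armed on the
connection between the persist callout being scheduled and it actually
running, which is exactly the concurrent timer scheduling discussed above.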
>> On Tue, Dec 2, 2014 at 1:18 PM, Jason Wolfe <nitroboost at gmail.com> wrote:
>> > So we've resolved some of our other issues, including the concurrent
>> > retrans/persist timer scheduling we saw with
>> > net.inet.tcp.per_cpu_timers on. Seems it may have been fixed by
>> > disabling flowdirector in ixgbe as Adrian mentioned. Running on
>> > 10.1-STABLE from 11/20 (kern.osreldate: 1001502), we are still seeing
>> > the spinlock held too long panic though. I have a kernel compiled
>> > with these options, and we hit it:
>> >
>> > options KTR
>> > options KTR_COMPILE=(KTR_CALLOUT|KTR_SCHED)
>> > options KTR_MASK=(KTR_CALLOUT|KTR_SCHED)
>> > options KTR_ENTRIES=131072
>> >
>> > I had to grab the last 30000 to see actions other than the statclock
>> > entries on the ix queues; if you look at the last 2500 lines or so you
>> > can see the last callouts happening.
>
> I did look at the
> ktrdump via schedgraph and it is definitely true that several threads are all
> blocked on the callout lock:
>
> 62185 5 19987364517280198 KTRGRAPH group:"thread", id:"pagedaemon tid
> 100096", state:"spinning", attributes: lockname:"callout"
> 62184 2 19987364505968920 KTRGRAPH group:"thread", id:"irq280: ix0:que 2
> tid 100063", state:"spinning", attributes: lockname:"callout"
> 62178 4 19987364504400868 KTRGRAPH group:"thread", id:"irq282: ix0:que 4
> tid 100067", state:"spinning", attributes: lockname:"callout"
> 62172 1 19987364504141702 KTRGRAPH group:"thread", id:"irq279: ix0:que 1
> tid 100061", state:"spinning", attributes: lockname:"callout"
> 62166 3 19987364503806636 KTRGRAPH group:"thread", id:"irq281: ix0:que 3
> tid 100065", state:"spinning", attributes: lockname:"callout"
>
> In fact, the only CPU not spinning on this lock is CPU 0. According to
> ktrdump, it was running the idle thread. That said, the ktrdump also claims
> that these threads started running callouts (ones that time out sleeps
> like msleep(..., hz)), which is odd because it didn't log a "running" event
> when it got the lock:
>
> 65919 1 19987382901783634 callout 0xffffffff81253030 finished
> 65917 1 19987382901782106 KTRGRAPH group:"thread", id:"irq279: ix0:que 1
> tid 100061", point:"wokeup", attributes: linkedto:"swapper tid 100000"
> 65915 1 19987382901775874 callout 0xffffffff81253030 func
> 0xffffffff8073f9e0 arg 0xffffffff81252ca0
> 64142 1 19987373824962362 callout 0xfffff801e5394850 finished
> 64140 1 19987373824961286 KTRGRAPH group:"thread", id:"irq279: ix0:que 1
> tid 100061", point:"wokeup", attributes: linkedto:"sendmail tid 100167"
> 64138 1 19987373824960070 callout 0xfffff801e5394850 func
> 0xffffffff8073f9e0 arg 0xfffff801e53944c0
> 62721 4 19987366581196864 callout 0xfffff80022426850 finished
> 62719 4 19987366581189888 KTRGRAPH group:"thread", id:"irq282: ix0:que 4
> tid 100067", point:"wokeup", attributes: linkedto:"named tid 100158"
> 62717 4 19987366581189192 callout 0xfffff80022426850 func
> 0xffffffff8073f9e0 arg 0xfffff800224264c0
> 62680 2 19987366404396696 callout 0xfffff800220add10 finished
> ...
>
> So these aren't TCP timers, but td_slpcallout timers. Those are already
> scheduled on multiple CPUs. Maybe we can try moving those all to zero
> instead:
>
> Index: subr_sleepqueue.c
> ===================================================================
> --- subr_sleepqueue.c (revision 275806)
> +++ subr_sleepqueue.c (working copy)
> @@ -382,8 +382,8 @@ sleepq_set_timeout_sbt(void *wchan, sbintime_t sbt
> MPASS(TD_ON_SLEEPQ(td));
> MPASS(td->td_sleepqueue == NULL);
> MPASS(wchan != NULL);
> - callout_reset_sbt_on(&td->td_slpcallout, sbt, pr,
> - sleepq_timeout, td, PCPU_GET(cpuid), flags | C_DIRECT_EXEC);
> + callout_reset_sbt(&td->td_slpcallout, sbt, pr,
> + sleepq_timeout, td, flags | C_DIRECT_EXEC);
> }
>
> /*
>
>
> --
> John Baldwin
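To make the td_slpcallout connection concrete: any bounded sleep in the
kernel arms that per-thread callout through sleepq_set_timeout_sbt(), so the
callouts John spotted in the ktrdump are ordinary sleep timeouts rather than
TCP timers. A minimal, hypothetical example of the kind of call that
exercises this path (all names below are made up for illustration):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/priority.h>

static struct mtx example_mtx;
static int example_flag;

/*
 * Hypothetical bounded sleep of at most one second.  The hz timeout is
 * what reaches sleepq_set_timeout_sbt() and arms td_slpcallout; in the
 * stock kernel that callout is scheduled on the sleeping thread's own
 * CPU (PCPU_GET(cpuid)), which is what the patch above changes.
 */
static int
example_wait(void)
{
	int error;

	mtx_lock(&example_mtx);
	error = msleep(&example_flag, &example_mtx, PRIBIO, "exwait", hz);
	mtx_unlock(&example_mtx);
	return (error);
}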
Just to round this thread out for anyone that comes around hunting for
similar problems: it was found to be an issue in the td_slpcallout
timers. HPS later came to the same conclusion based on some issues
he was having, and produced a patch that went into -CURRENT a few days
ago:
https://svnweb.freebsd.org/base?view=revision&revision=277213
Thanks to both for the time/work.
Jason