9.3-RELEASE panic: spin lock held too long

Wed Aug 10 16:11:45 UTC 2016

On Wed, Aug 10, 2016 at 07:50:10PM +0430, Hooman Fazaeli wrote:
> On 2016-08-10 18:49, Konstantin Belousov wrote:
> > On Wed, Aug 10, 2016 at 06:35:15PM +0430, Hooman Fazaeli wrote:
> >> Hi
> >>
> >> on a 9.3-REL i386 box we have occasional "spin lock held too long" panics.
> >>
> >> System info:
> >> -------------
> >> - Intel(R) Core(TM) i5-4440 CPU @ 3.10GHz CPU (4 cores, no hyper-threading)
> >> - 4G non-ECC RAM
> >> - asterisk-1.8.30.0 from ports
> >> - dahdi-kmod26-2.6.1.r10738 from ports
> >> - powerd disabled.
> >> - Workload: ISDN & SIP call processing.
> >> ------------
> >>
> >> The panics are either on 'sched lock' or 'turnstile lock' spin locks.
> >>
> >> PANIC 1
> >> =======
> >> As the trace below shows:
> >>
> >> 1- input arrives on a UDP socket
> >> 2- doselwakeup is called.
> >> 3- That wakeup call ends up in sched_add.
> >> 4- sched_add grabs the 'sched lock 0' spin lock and, apparently, holds it for too long.
> >> 5- The panicking thread makes the same calls as the owner thread but panics because
> >>      it can't grab the same spin lock.
> >>
> >>   > kgdb /boot/kernel/kernel /var/crash/vmcore.14
> >> ...
> >> kernel trap 12 with interrupts disabled
> >> spin lock 0xc140a4c0 (sched lock 0) held by 0xc807a2f0 (tid 100045) too long

> (kgdb) up 4
> #4  0xc0ac9e75 in _mtx_lock_spin (m=0xc140a4c0, tid=3384060112, opts=0, file=0x0, line=0) at ../../../kern/kern_mutex.c:557
> 557     ../../../kern/kern_mutex.c: No such file or directory.
>          in ../../../kern/kern_mutex.c
> 
> (kgdb) p *m
> $1 = {lock_object = {lo_name = 0xc140ab08 "sched lock 0", lo_flags = 720896, lo_data = 0, lo_witness = 0x0}, mtx_lock = 3355943664}
> 
> ------------
> 
> As you see, mtx_lock is 3355943664 (0xc807a2f0), the same thread pointer reported as the owner (tid 100045) in the panic string.
> 
> (kgdb) info threads
> ...
> 34 Thread 100045 (PID=12: intr/irq267: igb0:que 0) sched_switch (td=0xc807a2f0, newtd=0xc7da18d0, flags=265) at ../../../kern/sched_ule.c:1904
> ...
> 
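The arithmetic behind that comparison can be checked in userland. The
sketch below is only an illustration: it assumes the low three bits of
mtx_lock are flag bits (the MTX_* values are written from memory of
sys/sys/mutex.h of that era, so verify them against your own tree) and
that pointers are 32 bits wide, as on this i386 box. The function names
are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * ASSUMPTION: flag bits as in sys/sys/mutex.h of that era.
 * Check your own source tree before relying on these values.
 */
#define MTX_RECURSED   0x00000001u
#define MTX_CONTESTED  0x00000002u
#define MTX_UNOWNED    0x00000004u
#define MTX_FLAGMASK   (MTX_RECURSED | MTX_CONTESTED | MTX_UNOWNED)

/* Strip the flag bits to recover the owning thread pointer (i386). */
uint32_t
mtx_owner_ptr(uint32_t mtx_lock)
{
	return (mtx_lock & ~MTX_FLAGMASK);
}

/* Return only the flag bits of the lock word. */
uint32_t
mtx_flag_bits(uint32_t mtx_lock)
{
	return (mtx_lock & MTX_FLAGMASK);
}

/* Check the mtx_lock value printed by kgdb against the panic string. */
void
check_panic_value(void)
{
	assert(mtx_owner_ptr(3355943664u) == 0xc807a2f0u);
	assert(mtx_flag_bits(3355943664u) == 0);
}
```

With no flag bits set, the lock word is exactly the td pointer that
"info threads" shows for thread 100045.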
I see.  Another possibility is a spinlock leak.
Can you _try_ enabling WITNESS, without the WITNESS_SKIPSPIN option?
Then "show alllocks" at the ddb prompt after the panic could reveal
the place that originally locked it.
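
A kernel configuration along the following lines is one way to set this
up. It is a sketch, not a tested config: the KDB, DDB and INVARIANTS
lines are assumptions about what a debugging kernel usually carries and
may already be present in your config file.

```
# Assumed debugging baseline (may already be in your config)
options 	KDB
options 	DDB
options 	INVARIANTS
options 	INVARIANT_SUPPORT

# Lock order and ownership tracking
options 	WITNESS
# Do NOT add WITNESS_SKIPSPIN, so that spin locks are tracked too
```

After the panic drops to the debugger, "show alllocks" is entered at the
db> prompt.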