9.3-RELEASE panic: spin lock held too long

Ryan Stone rysto32 at gmail.com
Wed Aug 10 16:58:31 UTC 2016


On Wed, Aug 10, 2016 at 11:20 AM, Hooman Fazaeli <hoomanfazaeli at gmail.com>
wrote:

> > kgdb /boot/kernel/kernel /var/crash/vmcore.14
> ...
> ...
> (kgdb) bt
> #0  doadump (textdump=1) at pcpu.h:250
> #1  0xc0ade835 in kern_reboot (howto=260) at
> ../../../kern/kern_shutdown.c:454
> #2  0xc0adeb32 in panic (fmt=<value optimized out>) at
> ../../../kern/kern_shutdown.c:642
> #3  0xc0ac9cff in _mtx_lock_spin_failed (m=0x0) at
> ../../../kern/kern_mutex.c:515
> #4  0xc0ac9e75 in _mtx_lock_spin (m=0xc140a4c0, tid=3384060112, opts=0,
> file=0x0, line=0) at ../../../kern/kern_mutex.c:557
> #5  0xc0b096c5 in sched_add (td=0xc9b00bc0, flags=0) at
> ../../../kern/sched_ule.c:1153
> #6  0xc0b09890 in sched_wakeup (td=0xc9b00bc0) at
> ../../../kern/sched_ule.c:1991
> #7  0xc0ae8968 in setrunnable (td=0xc9b00bc0) at
> ../../../kern/kern_synch.c:537
> #8  0xc0b2227e in sleepq_resume_thread (sq=0xc869fd40, td=0xc9b00bc0,
> pri=104) at ../../../kern/subr_sleepqueue.c:763
> #9  0xc0b22fd3 in sleepq_broadcast (wchan=0xc95741e4, flags=1, pri=104,
> queue=0) at ../../../kern/subr_sleepqueue.c:865
> #10 0xc0a8c4cd in cv_broadcastpri (cvp=0xc95741e4, pri=104) at
> ../../../kern/kern_condvar.c:448
> #11 0xc0b2a406 in doselwakeup (sip=0xc963faac, pri=104) at
> ../../../kern/sys_generic.c:1683
> #12 0xc0b2a4be in selwakeuppri (sip=0xc963faac, pri=104) at
> ../../../kern/sys_generic.c:1651
> #13 0xc0a9fa59 in knote_enqueue (kn=<value optimized out>) at
> ../../../kern/kern_event.c:1786
> #14 0xc0aa073f in kqueue_register (kq=0xc963fa80, kev=0xf0e07b20,
> td=0xc9b4a8d0, waitok=1) at ../../../kern/kern_event.c:1154
> #15 0xc0aa09f3 in kern_kevent (td=0xc9b4a8d0, fd=152, nchanges=2,
> nevents=0, k_ops=0xf0e07c20, timeout=0x0) at ../../../kern/kern_event.c:850
> #16 0xc0aa16ce in sys_kevent (td=0xc9b4a8d0, uap=0xf0e07ccc) at
> ../../../kern/kern_event.c:771
> #17 0xc0fcc8c3 in syscall (frame=0xf0e07d08) at subr_syscall.c:135
> #18 0xc0fb60f1 in Xint0x80_syscall () at
> ../../../i386/i386/exception.s:270
> #19 0x00000033 in ?? ()
> Previous frame inner to this frame (corrupt stack?)
>
> (kgdb) up 4
> #4  0xc0ac9e75 in _mtx_lock_spin (m=0xc140a4c0, tid=3384060112, opts=0,
> file=0x0, line=0) at ../../../kern/kern_mutex.c:557
> 557     ../../../kern/kern_mutex.c: No such file or directory.
>         in ../../../kern/kern_mutex.c
>
> (kgdb) p *m
> $1 = {lock_object = {lo_name = 0xc140ab08 "sched lock 0", lo_flags =
> 720896, lo_data = 0, lo_witness = 0x0}, mtx_lock = 3355943664}
>
> ------------
>
> As you can see, mtx_lock is 3355943664 (0xc807a2f0), which is the same
> TID reported in the panic string.
>
> (kgdb) info threads
> ...
> 34 Thread 100045 (PID=12: intr/irq267: igb0:que 0) sched_switch
> (td=0xc807a2f0, newtd=0xc7da18d0, flags=265) at
> ../../../kern/sched_ule.c:1904
> ...
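
To spell out the check being done above: the "tid" in this code path is
just the owning thread's struct thread pointer cast to an integer, and a
locked mutex stores that pointer in mtx_lock with a couple of flag bits
or'd into the low bits.  A minimal sketch of how the owner is recovered,
assuming the 9.x flag definitions from sys/sys/mutex.h:

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>

    /*
     * Recover the owning thread from a held spin mutex.  The low bits
     * of mtx_lock are flags (MTX_RECURSED, MTX_CONTESTED); everything
     * above them is the owner's struct thread pointer, which is what
     * the "spin lock held too long" message prints as the holder.
     */
    static struct thread *
    spin_mtx_owner(struct mtx *m)
    {
            return ((struct thread *)(m->mtx_lock &
                ~(uintptr_t)(MTX_RECURSED | MTX_CONTESTED)));
    }

The same arithmetic works directly in kgdb, e.g. "p/x 3355943664 & ~3";
here the low flag bits are already clear, so the value is exactly the
td=0xc807a2f0 of the igb0 interrupt thread shown by info threads.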


This sounds somewhat familiar.  Is it always 'sched lock 0' that is
ultimately leaked?  Could you try applying this patch and seeing whether
the new KASSERT triggers?

https://people.freebsd.org/~rstone/patches/sched_balance_kassert.diff
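
For anyone reading along: "sched lock 0" is CPU 0's per-run-queue (tdq)
spin lock (ULE names one "sched lock %d" per CPU), and sched_balance()
is one of the few paths that has to juggle two tdq locks at once, which
is why a missed unlock there would surface exactly like this: a later
thread spins forever on a lock whose owner has long since moved on.
The real check is in the diff linked above; purely as a hypothetical
illustration of the general shape such an assertion can take (the
md_spinlock_count field below is the x86 one, and its use here is my
assumption, not the contents of the patch):

    /*
     * HYPOTHETICAL sketch only; see the linked diff for the actual
     * change.  In the sys/kern/sched_ule.c balancing code, once the
     * pass is done the current thread should be back down to holding
     * only its own thread lock.  spinlock_enter() bumps
     * md_spinlock_count for every spin mutex acquired, so a leaked
     * tdq lock keeps the count raised.
     */
    KASSERT(curthread->td_md.md_spinlock_count == 1,
        ("sched_balance: leaked a spin lock (count %d)",
        curthread->td_md.md_spinlock_count));

The attraction of a check like this is that it fires in the context of
the thread that leaked the lock, so the panic backtrace names the
culprit directly instead of whichever victim later spins on the lock.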

