Re: Patched gpsd and /dev/pps0 results in "sleeping thread" kernel panic

From: Craig Leres <leres_at_freebsd.org>
Date: Wed, 01 Sep 2021 06:04:07 UTC

On 8/31/21 9:35 PM, Warner Losh wrote:
> Either I'm missing something (likely am), or this might fix it up,
> or at least get away from the warning:
> 
> https://reviews.freebsd.org/D31763
> 
> Note: I can't recall why ppbus has to be locked for this call.
> This code dates from the very earliest days of locking and
> so may do things simply because it seemed like a good idea
> without a specific notion as to what that lock is protecting. If
> so, the real fix may be to not take the lock in pps_ioctl at
> all and maybe instead use a reference count (the most
> common reason for 'a good idea' was to keep the device
> from going away, though this is a parent lock, not a
> child one so I'm less sure about that being the reason).

The crash looks the same as, or at least very similar to, the one with
the unpatched kernel.

If you'd like to experiment with switching from the lock to a reference
count, I can test that too (and also check that it doesn't break ntpd's
normal use of /dev/pps0).

(Do you prefer comments/traces/feedback in this thread or in the review?)

Thanks!

		Craig


toc2 1 # kgdb /boot/kernel/kernel /var/crash/vmcore.2
GNU gdb (GDB) 10.2 [GDB v10.2 for FreeBSD]
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd12.2".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
     <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...
Reading symbols from /boot/kernel.LBLNET/kernel.debug...

Unread portion of the kernel message buffer:
Sleeping thread (tid 101007, pid 1805) owns a non-sleepable lock
KDB: stack backtrace of thread 101007:
sched_switch() at sched_switch+0x630/frame 0xfffffe0070e3b760
mi_switch() at mi_switch+0xd4/frame 0xfffffe0070e3b790
sleepq_catch_signals() at sleepq_catch_signals+0x403/frame 0xfffffe0070e3b7e0
sleepq_timedwait_sig() at sleepq_timedwait_sig+0x14/frame 0xfffffe0070e3b820
_sleep() at _sleep+0x1b3/frame 0xfffffe0070e3b8a0
pps_ioctl() at pps_ioctl+0x298/frame 0xfffffe0070e3b8f0
ppsioctl() at ppsioctl+0x48/frame 0xfffffe0070e3b920
devfs_ioctl() at devfs_ioctl+0xb0/frame 0xfffffe0070e3b970
VOP_IOCTL_APV() at VOP_IOCTL_APV+0x7b/frame 0xfffffe0070e3b9a0
vn_ioctl() at vn_ioctl+0x16a/frame 0xfffffe0070e3bab0
devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfffffe0070e3bad0
kern_ioctl() at kern_ioctl+0x2b7/frame 0xfffffe0070e3bb30
sys_ioctl() at sys_ioctl+0xfa/frame 0xfffffe0070e3bc00
amd64_syscall() at amd64_syscall+0x387/frame 0xfffffe0070e3bd30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0070e3bd30
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8004c899a, rsp = 0x7fffdfdfc6a8, rbp = 0x7fffdfdfc730 ---
panic: sleeping thread
cpuid = 8
time = 1630475518
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe005ab73ab0
vpanic() at vpanic+0x17b/frame 0xfffffe005ab73b00
panic() at panic+0x43/frame 0xfffffe005ab73b60
propagate_priority() at propagate_priority+0x282/frame 0xfffffe005ab73b90
turnstile_wait() at turnstile_wait+0x30c/frame 0xfffffe005ab73be0
__mtx_lock_sleep() at __mtx_lock_sleep+0x199/frame 0xfffffe005ab73c70
ppcintr() at ppcintr+0x2a0/frame 0xfffffe005ab73c90
ithread_loop() at ithread_loop+0x23c/frame 0xfffffe005ab73cf0
fork_exit() at fork_exit+0x7e/frame 0xfffffe005ab73d30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe005ab73d30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 2m39s
Dumping 593 out of 12240 MB:..3%..11%..22%..33%..41%..52%..63%..71%..81%..92%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55      /usr/src/sys/amd64/include/pcpu_aux.h: No such file or directory.
(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=1) at ../../../kern/kern_shutdown.c:371
#2  0xffffffff80b83b2a in kern_reboot (howto=260)
     at ../../../kern/kern_shutdown.c:451
#3  0xffffffff80b83f83 in vpanic (fmt=<optimized out>, ap=<optimized out>)
     at ../../../kern/kern_shutdown.c:880
#4  0xffffffff80b83da3 in panic (fmt=<unavailable>)
     at ../../../kern/kern_shutdown.c:807
#5  0xffffffff80be71a2 in propagate_priority (td=0xfffff801c4418000)
     at ../../../kern/subr_turnstile.c:228
#6  0xffffffff80be7d6c in turnstile_wait (ts=0xfffff800039bae40,
     owner=<optimized out>, queue=0) at ../../../kern/subr_turnstile.c:785
#7  0xffffffff80b62cf9 in __mtx_lock_sleep (c=0xfffff80003932ad0,
     v=<optimized out>) at ../../../kern/kern_mutex.c:654
#8  0xffffffff8086fd10 in ppcintr (arg=0xfffff80003932a00)
     at ../../../dev/ppc/ppc.c:1546
#9  0xffffffff80b463cc in intr_event_execute_handlers (p=<optimized out>,
     ie=0xfffff800030d9d00) at ../../../kern/kern_intr.c:1143
#10 ithread_execute_handlers (p=<optimized out>, ie=0xfffff800030d9d00)
     at ../../../kern/kern_intr.c:1156
#11 ithread_loop (arg=0xfffff800039aea00) at ../../../kern/kern_intr.c:1236
#12 0xffffffff80b42e6e in fork_exit (
     callout=0xffffffff80b46190 <ithread_loop>, arg=0xfffff800039aea00,
     frame=0xfffffe005ab73d40) at ../../../kern/kern_fork.c:1080
#13 <signal handler called>
(kgdb)