From nobody Wed Sep 01 04:35:57 2021 X-Original-To: freebsd-hackers@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 3940B17A4D13 for ; Wed, 1 Sep 2021 04:36:15 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-ua1-x92f.google.com (mail-ua1-x92f.google.com [IPv6:2607:f8b0:4864:20::92f]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4Gzrq23YjSz3s8D for ; Wed, 1 Sep 2021 04:36:14 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: by mail-ua1-x92f.google.com with SMTP id g2so672739uad.4 for ; Tue, 31 Aug 2021 21:36:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=mvH2Y3MYC5yqvoJThq5BhJuUbnlaVdGerZkym7AMtLQ=; b=iH+/slihlbTP+fQuCIh8SL8tmtqCrh6pvF/XBikcJcgEDeiwDywaf3qQUdIjEKSUK4 MXN3ytQIAewbAneYH89Iv0l5LFpcTXfdL8IPhTnuIQNr7FT9kpm3d6LDWE5TX+Rk8Jco eHiHvUumuNAYnihreVXD1MSzuR3BySL40U/FFSk24VUyATxqdXgUM8JX4jpVCf4p1T7i MDz2CztZRthyIBDFAl1bRFtqDHslzS5+D5Q27Svvmt3yGSnCcbNix/z4F5aeF/4vpRxH I+r6tD4njFB3AqEuPyvIMOWRY84Jz9czYPtqto7WJdt9cOofDPBNRhahlkCaLRJaIWOk 7WRQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=mvH2Y3MYC5yqvoJThq5BhJuUbnlaVdGerZkym7AMtLQ=; b=uHE/j5pup5VExjVo7XQK0Ahql4tjsU2lUu9lrb3/dnkrv0yOWzFavvU5WWUqpZKnFc 1CX8MMrhxU78rFhfCejOkvM34ebpm3j2etiWSFqtTz6m6k+FAv/2x2UdbSXu+6MiMAde 3VrXckoKT+w9FrBfFT8/fG8ePYux+t+IiDhiAu3v3vcOF0NBEpOFjGs8UEIyNYUdzXwr rpZvBbQb4VmywMMedOaUBs4/J0ynyigkV7td7/cx/YM5d4EtFcra9k8XEQ/14Ij1j5Eu X5Gj5Y2a+ZSfnr7bxaEds+RP58FWps6/TnMca2xNl8C7AfxvY/6kRUjptfahkWuiQZKr X7uQ== X-Gm-Message-State: AOAM530kacHry4DTA2to+gVCJgBJ1+7yp1ekIn7flGJmOTPzClc4+FJA rtGKIbc+RK5KaZqUYo6GNueLupiIChf1n2rEOc2iSan1lSJgeA== X-Google-Smtp-Source: ABdhPJyw12IPWyu/2xy0jxHLySlkXxZXWifDmTv4EWChOIfNcWo0a9WQBij57DbUbvE+6Y3GSc/zRzS5cBeAQvp8Tdc= X-Received: by 2002:ab0:74cc:: with SMTP id f12mr21695480uaq.85.1630470968027; Tue, 31 Aug 2021 21:36:08 -0700 (PDT) List-Id: Technical discussions relating to FreeBSD List-Archive: https://lists.freebsd.org/archives/freebsd-hackers List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-hackers@freebsd.org MIME-Version: 1.0 References: <5476ea21-9e8a-32f5-08ff-add46c02d910@freebsd.org> In-Reply-To: From: Warner Losh Date: Tue, 31 Aug 2021 22:35:57 -0600 Message-ID: Subject: Re: Patched gpsd and /dev/pps0 results in "sleeping thread" kernel panic To: Ian Lepore Cc: Craig Leres , FreeBSD Hackers Content-Type: multipart/alternative; boundary="000000000000ddba6305cae79aef" X-Rspamd-Queue-Id: 4Gzrq23YjSz3s8D X-Spamd-Bar: / Authentication-Results: mx1.freebsd.org; dkim=pass header.d=bsdimp-com.20150623.gappssmtp.com header.s=20150623 header.b="iH+/slih"; dmarc=none; spf=none (mx1.freebsd.org: domain of wlosh@bsdimp.com has no SPF policy when checking 2607:f8b0:4864:20::92f) smtp.mailfrom=wlosh@bsdimp.com X-Spamd-Result: default: False [-1.00 / 15.00]; RCVD_TLS_ALL(0.00)[]; ARC_NA(0.00)[]; R_DKIM_ALLOW(-0.20)[bsdimp-com.20150623.gappssmtp.com:s=20150623]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[3]; NEURAL_SPAM_SHORT(1.00)[1.000]; NEURAL_HAM_LONG(-1.00)[-1.000]; MIME_GOOD(-0.10)[multipart/alternative,text/plain]; PREVIOUSLY_DELIVERED(0.00)[freebsd-hackers@freebsd.org]; DMARC_NA(0.00)[bsdimp.com]; TO_MATCH_ENVRCPT_SOME(0.00)[]; TO_DN_ALL(0.00)[]; DKIM_TRACE(0.00)[bsdimp-com.20150623.gappssmtp.com:+]; RCVD_IN_DNSWL_NONE(0.00)[2607:f8b0:4864:20::92f:from]; R_SPF_NA(0.00)[no SPF record]; FORGED_SENDER(0.30)[imp@bsdimp.com,wlosh@bsdimp.com]; MIME_TRACE(0.00)[0:+,1:+,2:~]; RCVD_COUNT_TWO(0.00)[2]; ASN(0.00)[asn:15169, ipnet:2607:f8b0::/32, country:US]; FROM_NEQ_ENVFROM(0.00)[imp@bsdimp.com,wlosh@bsdimp.com] X-ThisMailContainsUnwantedMimeParts: Y --000000000000ddba6305cae79aef Content-Type: text/plain; charset="UTF-8" On Tue, Aug 31, 2021 at 9:25 PM Warner Losh wrote: > > > On Tue, Aug 31, 2021 at 9:13 PM Ian Lepore wrote: > >> On Tue, 2021-08-31 at 21:07 -0600, Warner Losh wrote: >> > On Tue, Aug 31, 2021 at 8:11 PM Craig Leres >> > wrote: >> > >> > > >> > > On 8/31/21 6:54 PM, Warner Losh wrote: >> > > > Can you post a traceback? We recently tightened up the sleeping >> > > conditions >> > > > and many of the panics are straight forward to fix >> > > >> > > Is this sufficient? >> > > >> > >> > Maybe. I'm guessing that the pps driver isn't setting driver_mtx for >> > the >> > pps_state that it keeps, so when we're sleeping in pps_ioctl we're >> > not >> > dropping the the ppbus lock, which leads to this diagnostic. >> > >> > Warner >> > >> > >> >> >> Exactly so, I've just been digging through the code. It's not an easy >> fix because the ppbus pps driver doesn't have access to the ppbus mutex >> at the point where it's registering to be a pps driver. >> > > I just came to the conclusion that either we'd need to create a call in > ppbus > to set this mutex, or to return its lock... > > >> Craig, is there any chance you can configure your hardware so that the >> PPS signal is connected to either the CTS or DCD pin of a uart (even >> the same uart that's communicating with the gps receiver should work if >> you use DCD)? You'll get jitter and accuracy performance comparable to >> the parallel port pin, and avoid this mutex/sleeping problem. >> > > That's easier, but depending on the hardware might be harder :( > Either I'm missing something (likely am), or this might fix it up, or at least get away from the warning: https://reviews.freebsd.org/D31763 Note: I can't recall why ppbus has to be locked for this call. This code dates from the very earliest days of locking and so may do things simply because it seemed like a good idea without a specific notion as to what that lock is protecting. If so, the real fix may be to not take the lock in pps_ioctl at all and maybe instead use a reference count (the most often reason for 'a good idea' was to keep the device from going away, though this is a parent lock, not a child one so I'm less sure about that being the reason). Warner > Warner > > >> -- Ian >> >> >> >> > > Craig >> > > >> > > toc2 4 # kgdb /boot/kernel/kernel /var/crash/vmcore.1 >> > > GNU gdb (GDB) 10.2 [GDB v10.2 for FreeBSD] >> > > Copyright (C) 2021 Free Software Foundation, Inc. >> > > License GPLv3+: GNU GPL version 3 or later >> > > >> > > This is free software: you are free to change and redistribute it. >> > > There is NO WARRANTY, to the extent permitted by law. >> > > Type "show copying" and "show warranty" for details. >> > > This GDB was configured as "x86_64-portbld-freebsd12.2". >> > > Type "show configuration" for configuration details. >> > > For bug reporting instructions, please see: >> > > . >> > > Find the GDB manual and other documentation resources online at: >> > > . >> > > >> > > For help, type "help". >> > > Type "apropos word" to search for commands related to "word"... >> > > Reading symbols from /boot/kernel/kernel... >> > > Reading symbols from /boot/kernel.LBLNET/kernel.debug... >> > > >> > > Unread portion of the kernel message buffer: >> > > Sleeping thread (tid 101408, pid 2046) owns a non-sleepable lock >> > > KDB: stack backtrace of thread 101408: >> > > sched_switch() at sched_switch+0x630/frame 0xfffffe0070e4f760 >> > > mi_switch() at mi_switch+0xd4/frame 0xfffffe0070e4f790 >> > > sleepq_catch_signals() at sleepq_catch_signals+0x403/frame >> > > 0xfffffe0070e4f7e0 >> > > sleepq_timedwait_sig() at sleepq_timedwait_sig+0x14/frame >> > > 0xfffffe0070e4f820 >> > > _sleep() at _sleep+0x1b3/frame 0xfffffe0070e4f8a0 >> > > pps_ioctl() at pps_ioctl+0x298/frame 0xfffffe0070e4f8f0 >> > > ppsioctl() at ppsioctl+0x48/frame 0xfffffe0070e4f920 >> > > devfs_ioctl() at devfs_ioctl+0xb0/frame 0xfffffe0070e4f970 >> > > VOP_IOCTL_APV() at VOP_IOCTL_APV+0x7b/frame 0xfffffe0070e4f9a0 >> > > vn_ioctl() at vn_ioctl+0x16a/frame 0xfffffe0070e4fab0 >> > > devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfffffe0070e4fad0 >> > > kern_ioctl() at kern_ioctl+0x2b7/frame 0xfffffe0070e4fb30 >> > > sys_ioctl() at sys_ioctl+0xfa/frame 0xfffffe0070e4fc00 >> > > amd64_syscall() at amd64_syscall+0x387/frame 0xfffffe0070e4fd30 >> > > fast_syscall_common() at fast_syscall_common+0xf8/frame >> > > 0xfffffe0070e4fd30 >> > > --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8004c899a, rsp >> > > = >> > > 0x7fffdfdfc6a8, rbp = 0x7fffdfdfc730 --- >> > > panic: sleeping thread >> > > cpuid = 8 >> > > time = 1630456142 >> > > KDB: stack backtrace: >> > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame >> > > 0xfffffe005ab73ab0 >> > > vpanic() at vpanic+0x17b/frame 0xfffffe005ab73b00 >> > > panic() at panic+0x43/frame 0xfffffe005ab73b60 >> > > propagate_priority() at propagate_priority+0x282/frame >> > > 0xfffffe005ab73b90 >> > > turnstile_wait() at turnstile_wait+0x30c/frame 0xfffffe005ab73be0 >> > > __mtx_lock_sleep() at __mtx_lock_sleep+0x199/frame >> > > 0xfffffe005ab73c70 >> > > ppcintr() at ppcintr+0x2a0/frame 0xfffffe005ab73c90 >> > > ithread_loop() at ithread_loop+0x23c/frame 0xfffffe005ab73cf0 >> > > fork_exit() at fork_exit+0x7e/frame 0xfffffe005ab73d30 >> > > fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe005ab73d30 >> > > --- trap 0, rip = 0, rsp = 0, rbp = 0 --- >> > > Uptime: 17m20s >> > > Dumping 607 out of 12240 >> > > MB:..3%..11%..22%..32%..43%..51%..61%..72%..82%..93% >> > > >> > > __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 >> > > 55 /usr/src/sys/amd64/include/pcpu_aux.h: No such file or >> > > directory. >> > > (kgdb) bt >> > > #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 >> > > #1 doadump (textdump=1) at ../../../kern/kern_shutdown.c:371 >> > > #2 0xffffffff80b83ada in kern_reboot (howto=260) >> > > at ../../../kern/kern_shutdown.c:451 >> > > #3 0xffffffff80b83f33 in vpanic (fmt=, >> > > ap=) >> > > at ../../../kern/kern_shutdown.c:880 >> > > #4 0xffffffff80b83d53 in panic (fmt=) >> > > at ../../../kern/kern_shutdown.c:807 >> > > #5 0xffffffff80be7152 in propagate_priority >> > > (td=0xfffff80196959740) >> > > at ../../../kern/subr_turnstile.c:228 >> > > #6 0xffffffff80be7d1c in turnstile_wait (ts=0xfffff8000357a780, >> > > owner=, queue=0) at >> > > ../../../kern/subr_turnstile.c:785 >> > > #7 0xffffffff80b62ca9 in __mtx_lock_sleep (c=0xfffff80003969bd0, >> > > v=) at ../../../kern/kern_mutex.c:654 >> > > #8 0xffffffff8086fcc0 in ppcintr (arg=0xfffff80003969b00) >> > > at ../../../dev/ppc/ppc.c:1546 >> > > #9 0xffffffff80b4637c in intr_event_execute_handlers (p=> > > out>, >> > > ie=0xfffff800030d9d00) at ../../../kern/kern_intr.c:1143 >> > > #10 ithread_execute_handlers (p=, >> > > ie=0xfffff800030d9d00) >> > > at ../../../kern/kern_intr.c:1156 >> > > #11 ithread_loop (arg=0xfffff800039aa7e0) at >> > > ../../../kern/kern_intr.c:1236 >> > > #12 0xffffffff80b42e1e in fork_exit ( >> > > callout=0xffffffff80b46140 , >> > > arg=0xfffff800039aa7e0, >> > > frame=0xfffffe005ab73d40) at ../../../kern/kern_fork.c:1080 >> > > #13 >> > > (kgdb) >> > > >> >> >> --000000000000ddba6305cae79aef--