Re: git: 87ee63bac69d - main - locks: add a runtime check for missing turnstile
Date: Mon, 15 Jul 2024 20:13:17 UTC
On 7/15/24 14:40, Mateusz Guzik wrote: > On Mon, Jul 15, 2024 at 8:33 PM Mateusz Guzik <mjguzik@gmail.com> wrote: >> >> On Mon, Jul 15, 2024 at 8:21 PM John Baldwin <jhb@freebsd.org> wrote: >>> >>> On 7/15/24 13:59, Mateusz Guzik wrote: >>>> On Mon, Jul 15, 2024 at 6:22 PM John Baldwin <jhb@freebsd.org> wrote: >>>>> >>>>> On 7/11/24 07:07, Mateusz Guzik wrote: >>>>>> The branch main has been updated by mjg: >>>>>> >>>>>> URL: https://cgit.FreeBSD.org/src/commit/?id=87ee63bac69dc49291f55590b8baa57cad6c7d85 >>>>>> >>>>>> commit 87ee63bac69dc49291f55590b8baa57cad6c7d85 >>>>>> Author: Mateusz Guzik <mjg@FreeBSD.org> >>>>>> AuthorDate: 2024-07-11 00:17:27 +0000 >>>>>> Commit: Mateusz Guzik <mjg@FreeBSD.org> >>>>>> CommitDate: 2024-07-11 11:06:52 +0000 >>>>>> >>>>>> locks: add a runtime check for missing turnstile >>>>>> >>>>>> There are sometimes bugs which result in the unlock fast path failing, >>>>>> which in turns causes a not-helpful crash report when dereferencing a >>>>>> NULL turnstile. Help debugging such cases by pointing out what happened >>>>>> along with some debug. >>>>>> >>>>>> Sponsored by: Rubicon Communications, LLC ("Netgate") >>>>>> --- >>>>>> sys/kern/kern_mutex.c | 4 +++- >>>>>> sys/kern/kern_rwlock.c | 16 ++++++++++++---- >>>>>> 2 files changed, 15 insertions(+), 5 deletions(-) >>>>>> >>>>>> diff --git a/sys/kern/kern_mutex.c b/sys/kern/kern_mutex.c >>>>>> index 90361b23c09a..0fa624cc4bb1 100644 >>>>>> --- a/sys/kern/kern_mutex.c >>>>>> +++ b/sys/kern/kern_mutex.c >>>>>> @@ -1053,7 +1053,9 @@ __mtx_unlock_sleep(volatile uintptr_t *c, uintptr_t v) >>>>>> turnstile_chain_lock(&m->lock_object); >>>>>> _mtx_release_lock_quick(m); >>>>>> ts = turnstile_lookup(&m->lock_object); >>>>>> - MPASS(ts != NULL); >>>>>> + if (__predict_false(ts == NULL)) { >>>>>> + panic("got NULL turnstile on mutex %p v %zx", m, v); >>>>>> + } >>>>> >>>>> Hmm, this is just an expanded KASSERT() but always on rather than conditional on INVARIANTS? >>>>> >>>>> Do you have examples of the type of bugs that cause this? (Is it unlocking a freed mutex >>>>> or the like?) We generally hide all these types of checks under INVARIANTS rather than >>>>> shipping them in release kernels. >>>>> >>>> >>>> Use-after-free, overflow, underflow, bitflip or what have you all can >>>> fail the fast path. >>>> >>>> Once that happens and the kernel crashes with a null pointer deref, >>>> here is a crash at netgate which prodded this: >>>> calltrap() at calltrap+0x8/frame 0xfffffe0106720920 >>>> --- trap 0xc, rip = 0xffffffff80d5ab70, rsp = 0xfffffe01067209f0, rbp >>>> = 0xfffffe0106720a00 --- >>>> turnstile_broadcast() at turnstile_broadcast+0x40/frame 0xfffffe0106720a00 >>>> __rw_wunlock_hard() at __rw_wunlock_hard+0x9e/frame 0xfffffe0106720a30 >>>> nd6_resolve_slow() at nd6_resolve_slow+0x2d7/frame 0xfffffe0106720aa0 >>>> nd6_resolve() at nd6_resolve+0x125/frame 0xfffffe0106720b10 >>>> ether_output() at ether_output+0x4e7/frame 0xfffffe0106720ba0 >>>> ip_output_send() at ip_output_send+0xdc/frame 0xfffffe0106720be0 >>>> ip_output() at ip_output+0x1295/frame 0xfffffe0106720ce0 >>>> ip_forward() at ip_forward+0x3c2/frame 0xfffffe0106720d90 >>>> ip_input() at ip_input+0x705/frame 0xfffffe0106720df0 >>>> swi_net() at swi_net+0x138/frame 0xfffffe0106720e60 >>>> ithread_loop() at ithread_loop+0x257/frame 0xfffffe0106720ef0 >>>> fork_exit() at fork_exit+0x7f/frame 0xfffffe0106720f30 >>>> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0106720f30 >>>> --- trap 0, rip = 0, rsp = 0, rbp = 0 --- >>>> >>>> Neither the register dump nor anything in the backtrace indicate what happened. >>>> >>>> Since the kernel is going down anyway, one may as well get some debug from it. >>> >>> If you don't mind the extra branches for sanity checks, why not just run with >>> INVARIANTS? That is, what makes these particular assertions different from >>> other assertions such that they should be on unconditionally? The last line below >>> applies to pretty much every other assertion in the tree. >>> >> >> This adds a branch in the slowpath, a spot which should relatively >> rarely execute compared to the fast path. On top of that the branch at >> hand does not do any extra memory accesses or complex arithmetic. >> >> So no, I don't think I may as well run with INVARIANTS. > > How about this: if you strongly about this branch, feel free to revert > the commit, I'm just going to keep the change at Netgate. > > But should you do it, make sure to not add avoidable branches in your stuff. It's more that I think this is an unusual case (unconditional assertions rather than conditional on INVARIANTS) such that it's probably worth being a bit more explicit about that in the log with the rationale, etc. That is, I don't think "runtime check" quite communicates that you are intentionally doing an unconditional assertion, and a bit more detail about the specific types of bugs might have been useful in the log as well. I think it's fine if we want to have some checks that are always on, but it's currently quite rare so needs a bit more rationale in the log than other changes is all. -- John Baldwin