Re: git: 87ee63bac69d - main - locks: add a runtime check for missing turnstile
- Reply: John Baldwin : "Re: git: 87ee63bac69d - main - locks: add a runtime check for missing turnstile"
- Reply: Gleb Smirnoff : "Re: git: 87ee63bac69d - main - locks: add a runtime check for missing turnstile"
- In reply to: John Baldwin : "Re: git: 87ee63bac69d - main - locks: add a runtime check for missing turnstile"
- Go to: [ bottom of page ] [ top of archives ] [ this month ]
Date: Mon, 15 Jul 2024 17:59:42 UTC
On Mon, Jul 15, 2024 at 6:22 PM John Baldwin <jhb@freebsd.org> wrote: > > On 7/11/24 07:07, Mateusz Guzik wrote: > > The branch main has been updated by mjg: > > > > URL: https://cgit.FreeBSD.org/src/commit/?id=87ee63bac69dc49291f55590b8baa57cad6c7d85 > > > > commit 87ee63bac69dc49291f55590b8baa57cad6c7d85 > > Author: Mateusz Guzik <mjg@FreeBSD.org> > > AuthorDate: 2024-07-11 00:17:27 +0000 > > Commit: Mateusz Guzik <mjg@FreeBSD.org> > > CommitDate: 2024-07-11 11:06:52 +0000 > > > > locks: add a runtime check for missing turnstile > > > > There are sometimes bugs which result in the unlock fast path failing, > > which in turns causes a not-helpful crash report when dereferencing a > > NULL turnstile. Help debugging such cases by pointing out what happened > > along with some debug. > > > > Sponsored by: Rubicon Communications, LLC ("Netgate") > > --- > > sys/kern/kern_mutex.c | 4 +++- > > sys/kern/kern_rwlock.c | 16 ++++++++++++---- > > 2 files changed, 15 insertions(+), 5 deletions(-) > > > > diff --git a/sys/kern/kern_mutex.c b/sys/kern/kern_mutex.c > > index 90361b23c09a..0fa624cc4bb1 100644 > > --- a/sys/kern/kern_mutex.c > > +++ b/sys/kern/kern_mutex.c > > @@ -1053,7 +1053,9 @@ __mtx_unlock_sleep(volatile uintptr_t *c, uintptr_t v) > > turnstile_chain_lock(&m->lock_object); > > _mtx_release_lock_quick(m); > > ts = turnstile_lookup(&m->lock_object); > > - MPASS(ts != NULL); > > + if (__predict_false(ts == NULL)) { > > + panic("got NULL turnstile on mutex %p v %zx", m, v); > > + } > > Hmm, this is just an expanded KASSERT() but always on rather than conditional on INVARIANTS? > > Do you have examples of the type of bugs that cause this? (Is it unlocking a freed mutex > or the like?) We generally hide all these types of checks under INVARIANTS rather than > shipping them in release kernels. > Use-after-free, overflow, underflow, bitflip or what have you all can fail the fast path. Once that happens and the kernel crashes with a null pointer deref, here is a crash at netgate which prodded this: calltrap() at calltrap+0x8/frame 0xfffffe0106720920 --- trap 0xc, rip = 0xffffffff80d5ab70, rsp = 0xfffffe01067209f0, rbp = 0xfffffe0106720a00 --- turnstile_broadcast() at turnstile_broadcast+0x40/frame 0xfffffe0106720a00 __rw_wunlock_hard() at __rw_wunlock_hard+0x9e/frame 0xfffffe0106720a30 nd6_resolve_slow() at nd6_resolve_slow+0x2d7/frame 0xfffffe0106720aa0 nd6_resolve() at nd6_resolve+0x125/frame 0xfffffe0106720b10 ether_output() at ether_output+0x4e7/frame 0xfffffe0106720ba0 ip_output_send() at ip_output_send+0xdc/frame 0xfffffe0106720be0 ip_output() at ip_output+0x1295/frame 0xfffffe0106720ce0 ip_forward() at ip_forward+0x3c2/frame 0xfffffe0106720d90 ip_input() at ip_input+0x705/frame 0xfffffe0106720df0 swi_net() at swi_net+0x138/frame 0xfffffe0106720e60 ithread_loop() at ithread_loop+0x257/frame 0xfffffe0106720ef0 fork_exit() at fork_exit+0x7f/frame 0xfffffe0106720f30 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0106720f30 --- trap 0, rip = 0, rsp = 0, rbp = 0 --- Neither the register dump nor anything in the backtrace indicate what happened. Since the kernel is going down anyway, one may as well get some debug from it. -- Mateusz Guzik <mjguzik gmail.com>