Re: git: 87ee63bac69d - main - locks: add a runtime check for missing turnstile
- In reply to: Mateusz Guzik : "Re: git: 87ee63bac69d - main - locks: add a runtime check for missing turnstile"
- Go to: [ bottom of page ] [ top of archives ] [ this month ]
Date: Tue, 16 Jul 2024 20:19:04 UTC
Mateusz, On Mon, Jul 15, 2024 at 07:59:42PM +0200, Mateusz Guzik wrote: M> > > locks: add a runtime check for missing turnstile M> > > M> > > There are sometimes bugs which result in the unlock fast path failing, M> > > which in turns causes a not-helpful crash report when dereferencing a M> > > NULL turnstile. Help debugging such cases by pointing out what happened M> > > along with some debug. M> > > M> > > Sponsored by: Rubicon Communications, LLC ("Netgate") I am quite surprised to see such commit from you. You are known to cut a few instruction branches even pessimizing other people setups (yeah, talking about commits like 5091ca26507b), and now you are adding an absolutely useless branch for literally everyone. The fingerprint of the non-INVARIANTS panic for a destroyed or already unlocked mutex/rwlock is well known to anybody who has been hacking on the FreeBSD kernel for the last 20 years. I am very much surprised it is not familiar to you. The sample panic quoted below just immediately rings the bell for me, and giving feedback on this commit for many people, too. Please revert this. Here is what we have at Netflix and I would suggest you to use same practice at Netgate. If there is a panic/issue that we anticipate at our setup and we want extra debugging to be enabled without INVARIANTS, but we don't want to pollute the open source FreeBSD with our own hacks, we provided NASSERT macro in kassert.h, that works exactly same as KASSERT, but is enabled always. "N" stands for Netflix, but it would equally well work for Netgate ;) So, for a KASSERT that we want to be always enabled, we just flip one character K -> N and of course this diff is super easy to resolve during merges. M> Use-after-free, overflow, underflow, bitflip or what have you all can M> fail the fast path. M> M> Once that happens and the kernel crashes with a null pointer deref, M> here is a crash at netgate which prodded this: M> calltrap() at calltrap+0x8/frame 0xfffffe0106720920 M> --- trap 0xc, rip = 0xffffffff80d5ab70, rsp = 0xfffffe01067209f0, rbp M> = 0xfffffe0106720a00 --- M> turnstile_broadcast() at turnstile_broadcast+0x40/frame 0xfffffe0106720a00 M> __rw_wunlock_hard() at __rw_wunlock_hard+0x9e/frame 0xfffffe0106720a30 M> nd6_resolve_slow() at nd6_resolve_slow+0x2d7/frame 0xfffffe0106720aa0 M> nd6_resolve() at nd6_resolve+0x125/frame 0xfffffe0106720b10 M> ether_output() at ether_output+0x4e7/frame 0xfffffe0106720ba0 M> ip_output_send() at ip_output_send+0xdc/frame 0xfffffe0106720be0 M> ip_output() at ip_output+0x1295/frame 0xfffffe0106720ce0 M> ip_forward() at ip_forward+0x3c2/frame 0xfffffe0106720d90 M> ip_input() at ip_input+0x705/frame 0xfffffe0106720df0 M> swi_net() at swi_net+0x138/frame 0xfffffe0106720e60 M> ithread_loop() at ithread_loop+0x257/frame 0xfffffe0106720ef0 M> fork_exit() at fork_exit+0x7f/frame 0xfffffe0106720f30 M> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0106720f30 M> --- trap 0, rip = 0, rsp = 0, rbp = 0 --- M> M> Neither the register dump nor anything in the backtrace indicate what happened. M> M> Since the kernel is going down anyway, one may as well get some debug from it. -- Gleb Smirnoff