Re: panic: data abort in critical section or under mutex (was: Re: panic: Unknown kernel exception 0 esr_el1 2000000 (on 14-CURRENT/aarch64 Feb 28))
- Reply: Mark Millard : "Re: panic: data abort in critical section or under mutex (was: Re: panic: Unknown kernel exception 0 esr_el1 2000000 (on 14-CURRENT/aarch64 Feb 28))"
- Reply: bob prohaska : "Re: panic: data abort in critical section or under mutex (was: Re: panic: Unknown kernel exception 0 esr_el1 2000000 (on 14-CURRENT/aarch64 Feb 28))"
- In reply to: Andrew Turner : "Re: panic: data abort in critical section or under mutex (was: Re: panic: Unknown kernel exception 0 esr_el1 2000000 (on 14-CURRENT/aarch64 Feb 28))"
Date: Mon, 07 Mar 2022 16:45:02 UTC
On Mon, Mar 07, 2022 at 04:25:22PM +0000, Andrew Turner wrote:
>
> > On 7 Mar 2022, at 15:13, Mark Johnston <markj@freebsd.org> wrote:
> > ...
> > A (the?) problem is that the compiler is treating "pc" as an alias
> > for x18, but the rmlock code assumes that the pcpu pointer is loaded
> > once, as it dereferences "pc" outside of the critical section. On
> > arm64, if a context switch occurs between the store at _rm_rlock+144 and
> > the load at +152, and the thread is migrated to another CPU, then we'll
> > end up using the wrong CPU ID in the rm->rm_writecpus test.
> >
> > I suspect the problem is unique to arm64 as its get_pcpu()
> > implementation is different from the others in that it doesn't use
> > volatile-qualified inline assembly. This has been the case since
> > https://cgit.freebsd.org/src/commit/?id=63c858a04d56529eddbddf85ad04fc8e99e73762
> > .
> >
> > I haven't been able to reproduce any crashes running poudriere in an
> > arm64 AWS instance, though. Could you please try the patch below and
> > confirm whether it fixes your panics? I verified that the apparent
> > problem described above is gone with the patch.
>
> Alternatively (or additionally) we could do something like the following.
> There are only a few MI users of get_pcpu with the main place being in
> rm locks.
>
> diff --git a/sys/arm64/include/pcpu.h b/sys/arm64/include/pcpu.h
> index 09f6361c651c..59b890e5c2ea 100644
> --- a/sys/arm64/include/pcpu.h
> +++ b/sys/arm64/include/pcpu.h
> @@ -58,7 +58,14 @@ struct pcpu;
>
>  register struct pcpu *pcpup __asm ("x18");
>
> -#define	get_pcpu()	pcpup
> +static inline struct pcpu *
> +get_pcpu(void)
> +{
> +	struct pcpu *pcpu;
> +
> +	__asm __volatile("mov %0, x18" : "=&r"(pcpu));
> +	return (pcpu);
> +}
>
>  static inline struct thread *
>  get_curthread(void)

Indeed, I think this is probably the best solution.
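
For anyone following along, the hazard described above can be modelled in portable user-space C. This is only a rough sketch and not kernel code: struct pcpu_model, x18_model, migrate_to() and the two cpuids_match_*() helpers are hypothetical names invented for illustration, and the migration that a context switch would cause is simulated with an ordinary function call. The "aliased" variant models "pc" being just another name for the register, so every dereference re-reads it; the "snapshot" variant models a get_pcpu() that copies the register once, which is the effect the quoted patch aims for.

```c
/* User-space model only; every name here is hypothetical. */
struct pcpu_model {
	int cpuid;
};

static struct pcpu_model cpus[2] = { { 0 }, { 1 } };

/* Stands in for x18: always points at the pcpu of the "current" CPU. */
static struct pcpu_model *volatile x18_model = &cpus[0];

/* Simulates a context switch that migrates the thread to another CPU. */
static void
migrate_to(int cpu)
{
	x18_model = &cpus[cpu];
}

/* Models the patched get_pcpu(): the register is copied exactly once. */
static inline struct pcpu_model *
get_pcpu_snapshot(void)
{
	return (x18_model);
}

/* "pc" is an alias for the register, so each use re-reads it. */
static int
cpuids_match_aliased(void)
{
#define	pc	x18_model
	int first, second;

	migrate_to(0);
	first = pc->cpuid;	/* reads CPU 0's pcpu */
	migrate_to(1);		/* migration between the two uses */
	second = pc->cpuid;	/* now reads CPU 1's pcpu */
#undef pc
	return (first == second);
}

/* "pc" holds a copy taken once, so both uses see the same pcpu. */
static int
cpuids_match_snapshot(void)
{
	struct pcpu_model *pc;
	int first, second;

	migrate_to(0);
	pc = get_pcpu_snapshot();
	first = pc->cpuid;
	migrate_to(1);		/* migration no longer affects pc */
	second = pc->cpuid;
	return (first == second);
}
```

In this model cpuids_match_aliased() reports a mismatch while cpuids_match_snapshot() does not, which is the difference between the old macro and the inline function as I understand the discussion. In the real kernel a stale snapshot is still a pointer to some other CPU's pcpu after migration, so callers such as the rmlock code still have to handle that case themselves; the model only shows that the value stops changing underneath them.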