Re: panic: data abort in critical section or under mutex (was: Re: panic: Unknown kernel exception 0 esr_el1 2000000 (on 14-CURRENT/aarch64 Feb 28))
Date: Mon, 07 Mar 2022 21:42:54 UTC
On Mon, Mar 07, 2022 at 09:54:26PM +0100, Ronald Klop wrote:
>
> From: Mark Johnston <markj@freebsd.org>
> Date: Monday, 7 March 2022 16:13
> To: Ronald Klop <ronald-lists@klop.ws>
> CC: bob prohaska <fbsd@www.zefox.net>, Mark Millard <marklmi@yahoo.com>, freebsd-arm@freebsd.org, freebsd-current <freebsd-current@freebsd.org>
> > I haven't been able to reproduce any crashes running poudriere in an
> > arm64 AWS instance, though.  Could you please try the patch below and
> > confirm whether it fixes your panics?  I verified that the apparent
> > problem described above is gone with the patch.
> >
> > diff --git a/sys/kern/kern_rmlock.c b/sys/kern/kern_rmlock.c
> > index 0cdcfb8fec62..e51c25136ae0 100644
> > --- a/sys/kern/kern_rmlock.c
> > +++ b/sys/kern/kern_rmlock.c
> > @@ -437,6 +437,7 @@ _rm_rlock(struct rmlock *rm, struct rm_priotracker *tracker, int trylock)
> >  {
> >          struct thread *td = curthread;
> >          struct pcpu *pc;
> > +        int cpuid;
> >
> >          if (SCHEDULER_STOPPED())
> >                  return (1);
> > @@ -452,6 +453,7 @@ _rm_rlock(struct rmlock *rm, struct rm_priotracker *tracker, int trylock)
> >          atomic_interrupt_fence();
> >
> >          pc = get_pcpu();
> > +        cpuid = pc->pc_cpuid;
> >          rm_tracker_add(pc, tracker);
> >          sched_pin();
> >
> > @@ -463,7 +465,7 @@ _rm_rlock(struct rmlock *rm, struct rm_priotracker *tracker, int trylock)
> >           * conditional jump.
> >           */
> >          if (__predict_true(0 == (td->td_owepreempt |
> > -            CPU_ISSET(pc->pc_cpuid, &rm->rm_writecpus))))
> > +            CPU_ISSET(cpuid, &rm->rm_writecpus))))
> >                  return (1);
> >
> >          /* We do not have a read token and need to acquire one. */
> >
>
> Hi,
>
> This patch panicked again:
>   x0: ffffa00005a31500
>   x1: ffffa00005a0e000
>   x2:                2
>   x3: ffffa00076c4e9a0
>   x4:                0
>   x5:    e672743c8f9e5
>   x6:    dc89f70500ab1
>   x7:               14
>   x8: ffffa00005a31518
>   x9:                1
>  x10: ffffa00005a0e000
>  x11:                0
>  x12:                0
>  x13:                a
>  x14: 1013e6b85a8ecbe4
>  x15:     1dce740d11a5
>  x16: ffff3ea86e2434bf
>  x17: fffffffffffffff2
>  x18: ffff0000fe661800 (g_ctx + fcf9fa54)
>  x19: ffffa00076c4e9a0
>  x20: ffff0000fec39000 (g_ctx + fd577254)
>  x21:                2
>  x22: ffff0000419b6090 (g_ctx + 402f42e4)
>  x23: ffff000000c0b137 (lockstat_enabled + 0)
>  x24:              100
>  x25: ffff000000c0b000 (version + a0)
>  x26: ffff000000c0b000 (version + a0)
>  x27: ffff000000c0b000 (version + a0)
>  x28:                0
>  x29: ffff0000fe661800 (g_ctx + fcf9fa54)
>   sp: ffff0000fe661800
>   lr: ffff00000154ea50 (zio_dva_throttle + 154)
>  elr: ffff00000154ea80 (zio_dva_throttle + 184)
> spsr:         60000045
>  far:     2b753286b0b8
> panic: Unknown kernel exception 0 esr_el1 2000000
> cpuid = 1
> time = 1646685857
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
> vpanic() at vpanic+0x174
> panic() at panic+0x44
> do_el1h_sync() at do_el1h_sync+0x184
> handle_el1h_sync() at handle_el1h_sync+0x10
> --- exception, esr 0x2000000
> zio_dva_throttle() at zio_dva_throttle+0x184
> zio_execute() at zio_execute+0x58
> KDB: enter: panic
> [ thread pid 0 tid 100129 ]
> Stopped at      kdb_enter+0x44: undefined       f901c11f
> db>

ZFS doesn't make use of rm locks as far as I can see, so this is a
little weird.  I reverted the original rmlock commit in main, so it may
be worth verifying that the problem really is gone before digging
deeper.  In other words, I'm a bit suspicious that this is a different
bug.
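
For anyone reading the panic line above, here is a minimal sketch of how the
esr_el1 value splits into its fields.  The field positions (EC in bits
31:26, IL in bit 25, ISS in bits 24:0) come from the Arm architecture
definition of ESR_ELx; the helper names are hypothetical and are not the
macros the FreeBSD kernel itself uses.

```c
#include <stdint.h>

/*
 * Illustrative helpers only (names are assumptions, not kernel API).
 * ESR_ELx layout per the Arm architecture:
 *   EC  = bits [31:26]  exception class
 *   IL  = bit  [25]     instruction length (1 = 32-bit instruction)
 *   ISS = bits [24:0]   instruction specific syndrome
 */
static inline uint32_t
esr_ec(uint64_t esr)
{
	return ((uint32_t)(esr >> 26) & 0x3f);
}

static inline uint32_t
esr_il(uint64_t esr)
{
	return ((uint32_t)(esr >> 25) & 0x1);
}

static inline uint32_t
esr_iss(uint64_t esr)
{
	return ((uint32_t)esr & 0x1ffffff);
}
```

Applied to 0x2000000, this gives EC = 0, IL = 1 and ISS = 0.  EC 0 is the
architecture's "unknown reason" class, which is consistent with the
"Unknown kernel exception 0" text in the panic message.  A data abort taken
from the same exception level would instead report EC 0x25.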