Date: Tue, 8 Mar 2022 07:42:04 -0800
From: bob prohaska
To: Mark Johnston
Cc: Andrew Turner, Ronald Klop, Mark Millard, freebsd-arm@freebsd.org, freebsd-current, bob prohaska
Subject: Re: panic: data abort in critical section or under mutex (was: Re: panic: Unknown kernel exception 0 esr_el1 2000000 (on 14-CURRENT/aarch64 Feb 28))

On Mon, Mar 07, 2022 at 11:45:02AM -0500, Mark Johnston wrote:
> On Mon, Mar 07, 2022 at 04:25:22PM +0000, Andrew Turner wrote:
> >
> > > On 7 Mar 2022, at 15:13, Mark Johnston wrote:
> > > ...
> > > A (the?) problem is that the compiler is treating "pc" as an alias
> > > for x18, but the rmlock code assumes that the pcpu pointer is loaded
> > > once, as it dereferences "pc" outside of the critical section. On
> > > arm64, if a context switch occurs between the store at _rm_rlock+144 and
> > > the load at +152, and the thread is migrated to another CPU, then we'll
> > > end up using the wrong CPU ID in the rm->rm_writecpus test.
> > >
> > > I suspect the problem is unique to arm64 as its get_pcpu()
> > > implementation is different from the others in that it doesn't use
> > > volatile-qualified inline assembly. This has been the case since
> > > https://cgit.freebsd.org/src/commit/?id=63c858a04d56529eddbddf85ad04fc8e99e73762
> > > .
> > >
> > > I haven't been able to reproduce any crashes running poudriere in an
> > > arm64 AWS instance, though. Could you please try the patch below and
> > > confirm whether it fixes your panics? I verified that the apparent
> > > problem described above is gone with the patch.
> >
> > Alternatively (or additionally) we could do something like the following. There are only a few MI users of get_pcpu with the main place being in rm locks.
> >
> > diff --git a/sys/arm64/include/pcpu.h b/sys/arm64/include/pcpu.h
> > index 09f6361c651c..59b890e5c2ea 100644
> > --- a/sys/arm64/include/pcpu.h
> > +++ b/sys/arm64/include/pcpu.h
> > @@ -58,7 +58,14 @@ struct pcpu;
> >
> >  register struct pcpu *pcpup __asm ("x18");
> >
> > -#define get_pcpu() pcpup
> > +static inline struct pcpu *
> > +get_pcpu(void)
> > +{
> > +	struct pcpu *pcpu;
> > +
> > +	__asm __volatile("mov %0, x18" : "=&r"(pcpu));
> > +	return (pcpu);
> > +}
> >
> >  static inline struct thread *
> >  get_curthread(void)
>
> Indeed, I think this is probably the best solution.

Just for fun I tried the patch on a Pi3 running -current, updated a day or two prior.
The patch applied, compiled and seemed to run acceptably, but when I left a
-j2 -DWITH_META_MODE buildworld running, it crashed overnight, reporting

login: panic: rm_rlock: recursed on non-recursive rmlock sysctl lock @ /usr/src/sys/kern/kern_sysctl.c:193
cpuid = 0
time = 1646720264
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x174
panic() at panic+0x44
_rm_rlock_debug() at _rm_rlock_debug+0x214
sysctl_root_handler_locked() at sysctl_root_handler_locked+0x140
sysctl_root() at sysctl_root+0x1ac
userland_sysctl() at userland_sysctl+0x140
sys___sysctl() at sys___sysctl+0x68
do_el0_sync() at do_el0_sync+0x520
handle_el0_sync() at handle_el0_sync+0x40
--- exception, esr 0x56000000
KDB: enter: panic
[ thread pid 869 tid 100091 ]
Stopped at kdb_enter+0x44: undefined f902011f

I tried typing "bt" at the debugger prompt but got no more output.

I've put the buildworld log file at
http://www.zefox.net/~fbsd/rpi3/crashes/20220307/

Hope this is of some use.

bob prohaska