Re: Kernel panic on armv7 when PF is enabled

From: Marek Zarychta <zarychtam_at_plan-b.pwste.edu.pl>
Date: Tue, 03 May 2022 14:09:35 UTC
W dniu 2.05.2022 o 11:02, Kristof Provost pisze:
> On 1 May 2022, at 5:13, qroxana wrote:
> 
>     After git bisecting the panic started since this commit.
> 
>     commit 78bc3d5e1712bc1649aa5574d2b8d153f9665113
> 
>     Author: Kristof Provost <
>     kp@FreeBSD.org
> 
>     Date: Mon Feb 14 20:09:54 2022 +0100
> 
>     vlan: allow net.link.vlan.mtag_pcp to be set per vnet
> 
>     The primary reason for this change is to facilitate testing.
> 
>     MFC after: 1 week
> 
>     sys/net/if_ethersubr.c | 9 +++++----
> 
>     sys/net/if_vlan.c | 5 +++--
> 
>     2 files changed, 8 insertions(+), 6 deletions(-)
> 
>     The armv7 board boots from a NFS root,
> 
>     it can boot without any problem if PF is disabled.
> 
>     Any helps?
> 
>     add host ::1: gateway lo0 fib 0: route already in table
>     add net fe80::: gateway ::1
>     add net ff02::: gateway ::1
>     add net ::ffff:0.0.0.0: gateway ::1
>     add net ::0.0.0.0: gateway ::1
>     Enabling pf.
>     Kernel page fault with the following non-sleepable locks held:
>     shared rm pf rulesets (pf rulesets) r = 0 (0xe3099430) locked @
>     /usr/src/sys/netpfil/pf/pf.c:6493
>     exclusive rw tcpinp (tcpinp) r = 0 (0xdb748d88) locked @
>     /usr/src/sys/netinet/tcp_usrreq.c:1008
>     stack backtrace:
>     #0 0xc0355cac at witness_debugger+0x7c
>     #1 0xc0356ef0 at witness_warn+0x3fc
>     #2 0xc05ec048 at abort_handler+0x1d8
>     #3 0xc05cb5ac at exception_exit+0
>     #4 0xe3083c10 at pf_syncookie_validate+0x60
>     #5 0xe30496a8 at pf_test+0x518
>     #6 0xe306d768 at pf_check_out+0x30
>     #7 0xc0415b44 at pfil_run_hooks+0xbc
>     #8 0xc0445cfc at ip_output+0xce8
>     #9 0xc045bc9c at tcp_default_output+0x20ac
>     #10 0xc0471eb4 at tcp_usr_send+0x1ac
>     #11 0xc0389464 at sosend_generic+0x490
>     #12 0xc0389790 at sosend+0x64
>     #13 0xc0502888 at clnt_vc_call+0x560
>     #14 0xc05009d8 at clnt_reconnect_call+0x170
>     #15 0xc01e7b14 at newnfs_request+0xb20
>     #16 0xc0230218 at nfscl_request+0x60
>     #17 0xc020d9bc at nfsrpc_getattr+0xb0
>     Fatal kernel mode data abort: 'Alignment Fault' on read
>     trapframe: 0xdf1f1c90
>     FSR=00000001, FAR=d7840264, spsr=40000013
>     r0 =6a228eda, r1 =dac0d785, r2 =d7840264, r3 =db5527c0
>     r4 =df1f1e00, r5 =dac0d75f, r6 =00000018, r7 =d9422c00
>     r8 =c093e5e4, r9 =00000001, r10=df1f1f5c, r11=df1f1d38
>     r12=e3098dd0, ssp=df1f1d20, slr=e3083bdc, pc =e3083c10
> 
> 
> The commit you point at is entirely unrelated to the code where the 
> panic occurred, so I’m pretty sure something went wrong in your bisect.

I was experiencing this panic also running FreeBSD 14.0-CURRENT #1 
main-n253028-2ec9a427c85: Tue Feb  8 17:49:25 CET 2022 on armv7, so it's 
unrelated to aforementioned commit which is dated 2022-02-14.

It's very interesting and weird bug, loading pf.ko, enabling PF, loading 
the rules work as expected, but processing the filtered traffic by PF 
triggers the panic.

> 
> The backtrace would suggest the issue occurs in the 
> pf_syncookie_validate() function, and likely in the line |if 
> (atomic_load_64(&V_pf_status.syncookies_inflight[cookie.flags.oddeven]) 
> == 0)|
> 
> The obvious way for that to panic would be to call it without the 
> curvnet context set, but pf_test() uses it earlier, so that’s going to 
> be fine.
> 
> Given that this is unique to armv7 I’d recommend talking to the armv7 
> maintainer about 64 bit atomic operations.
> 
> You can probably avoid the atomic load with this patch (and not enabling 
> syncookie support):
> 
> |diff --git a/sys/netpfil/pf/pf_syncookies.c 
> b/sys/netpfil/pf/pf_syncookies.c index 5230502be30c..c86d469d3cef 100644 
> --- a/sys/netpfil/pf/pf_syncookies.c +++ 
> b/sys/netpfil/pf/pf_syncookies.c @@ -313,6 +313,9 @@ 
> pf_syncookie_validate(struct pf_pdesc *pd) ack = 
> ntohl(pd->hdr.tcp.th_ack) - 1; cookie.cookie = (ack & 0xff) ^ (ack >> 
> 24); + if (V_pf_status.syncookies_mode == PF_SYNCOOKIES_NEVER) + return 
> (0); + /* we don't know oddeven before setting the cookie (union) */ if 
> (atomic_load_64(&V_pf_status.syncookies_inflight[cookie.flags.oddeven]) 
> == 0) |
> 
> That shouldn’t be required though.
> 
> Br,
> Kristof
> 


-- 
Marek Zarychta