Re: aarch64 (not armv7) kyua run on main [so: 14]: sys/net/if_lagg_test:status_stress got "Fatal data abort" panic [14.0-ALPHA1 snapshot panic submitted to bugzilla]

From: Mark Millard <marklmi_at_yahoo.com>
Date: Sat, 12 Aug 2023 06:28:34 UTC
On Aug 9, 2023, at 22:30, Mark Millard <marklmi@yahoo.com> wrote:

> The context is on a Windows Dev Kit 2023, using a bectl based boot/root disk:
> 
> # uname -apKU
> FreeBSD CA78C-WDK23-ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT aarch64 1400094 #9 main-n264643-0befc55cdf4b-dirty: Wed Aug  9 14:23:48 PDT 2023     root@CA78C-WDK23-ZFS:/usr/obj/BUILDs/main-CA78C-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA78C arm64 aarch64 1400094 1400094
> 
> I only do bugzilla submittals based on just my
> own builds as a means of last resort: I try to
> use official builds for such. The:
> 
> main-n264491-8a5c836b51ce: Thu Aug  3
> 
> snapshot did not panic on the RPi4B that I
> tried it with. We will see for the next
> snapshot at some point.
> 
> Both the non-debug and the debug kernels panic.
> I saw no evidence of the debug kernel reporting
> anything.
> 
> Note the:
> 
> 0xdeadc0dedeadc0de (2 examples)
> and:
> 0xfefefefefefefeff (1 example)
> 
> that may have some significance.
> 
> . . .
> sys/net/if_gif:basic  ->  passed  [0.195s]
> sys/net/if_lagg_test:create  ->  passed  [0.125s]
> sys/net/if_lagg_test:create_destroy_stress  ->  skipped: Skipping this test because it easily panics the machine  [0.022s]
> sys/net/if_lagg_test:lacp_linkstate_destroy_stress  ->  passed  [60.048s]
> sys/net/if_lagg_test:set_ether  ->  passed  [0.090s]
> sys/net/if_lagg_test:status_stress  ->  
> 
> <6>lagg0: link state changed to DOWN
> Fatal data abort:
>  x0: 0xffff000186c82858 (_DYNAMIC + 0x271e46b8)
>  x1: 0x0000000000000001
>  x2: 0xdeadc0dedeadc0de
>  x3: 0xffff0000005b68c0 (ifdead_ioctl + 0x0)
>  x4: 0xffffa000a8ba305e
>  x5: 0xffffa00023d932fa
>  x6: 0x000000006767616c
>  x7: 0x6e6d760070617401
>  x8: 0x000000000000030c
>  x9: 0x0000000000210005
> x10: 0x0000000000000800
> x11: 0xfefefefefefefeff
> x12: 0x0000000000000008
> x13: 0x0000000000000000
> x14: 0x0000000000010000
> x15: 0x0000000000000001
> x16: 0x0000000000010000
> x17: 0x0000000000000007
> x18: 0xffff000186c82520 (_DYNAMIC + 0x271e4380)
> <6>ue0: link state changed to DOWN
> x19: 0xffff000186c82858 (_DYNAMIC + 0x271e46b8)
> x20: 0xffffa000a8ba3000
> x21: 0xffffa000a8ba3058
> x22: 0x000000000000000c
> x23: 0x0000000000000005
> x24: 0x0000000000000000
> x25: 0xffff000000c7a000 (keysw + 0xb8)
> x26: 0x0000000000000000
> x27: 0xffff000000cf9000 (sdta_vfs_vop_vop_spare4_return1 + 0x18)
> x28: 0x0000000000000008
> x29: 0xffff000186c82540 (_DYNAMIC + 0x271e43a0)
>  sp: 0xffff000186c82520
>  lr: 0xffff0000006a0b50 (dump_iface + 0x2c0)
> elr: 0xffff0000006a124c (dump_sa + 0x1c)
> spsr: 0x0000000000400045
> far: 0xdeadc0dedeadc0df
> esr: 0x0000000096000004
> timeout stopping cpus
> panic: vm_fault failed: 0xffff0000006a124c error 1
> cpuid = 2
> time = 1691642123
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
> vpanic() at vpanic+0x13c
> panic() at panic+0x44
> data_abort() at data_abort+0x358
> handle_el1h_sync() at handle_el1h_sync+0x14
> --- exception, esr 0x96000004
> dump_sa() at dump_sa+0x1c
> dump_iface() at dump_iface+0x2bc
> dump_cb() at dump_cb+0x18
> if_foreach_sleep() at if_foreach_sleep+0x208
> rtnl_handle_getlink() at rtnl_handle_getlink+0xec
> rtnl_handle_message() at rtnl_handle_message+0x19c
> nl_taskqueue_handler() at nl_taskqueue_handler+0x5f4
> taskqueue_run_locked() at taskqueue_run_locked+0x1a4
> taskqueue_thread_loop() at taskqueue_thread_loop+0xc8
> fork_exit() at fork_exit+0x74
> fork_trampoline() at fork_trampoline+0x14
> 
> This was from:
> 
> # /usr/bin/kyua test -k /usr/tests/Kyuafile
> 
> But the earlier part of the run is not
> needed to get the panic. Booting, logging
> in as root, and doing:
> 
> # /usr/bin/kyua test -k /usr/tests/Kyuafile sys/net/if_lagg_test:status_stress
> 
> is sufficient to get the panic in my context.
> 
> 
> For reference, trying on an RPi4B with a somewhat older
> snapshot did not panic:
> 
> # uname -apKU
> you have mail
> FreeBSD generic 14.0-CURRENT FreeBSD 14.0-CURRENT aarch64 1400093 #0 main-n264491-8a5c836b51ce: Thu Aug  3 12:10:50 UTC 2023     root@releng1.nyi.freebsd.org:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC arm64 aarch64 1400093 1400093
> 
> # /usr/bin/kyua test -k /usr/tests/Kyuafile sys/net/if_lagg_test:status_stress
> sys/net/if_lagg_test:status_stress  ->  passed  [60.371s]
> 
> Results file id is usr_tests.20230804-151402-553517
> Results saved to /root/.kyua/store/results.usr_tests.20230804-151402-553517.db
> 
> 1/1 passed (0 failed)

I replicated the panic via a 14.0-ALPHA1 snapshot dd'd to
USB media, which was then used to boot and operate a Windows
Dev Kit 2023. See:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=273081
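
For anyone reading the quoted register dump, here is a minimal sketch
(Python, not part of any FreeBSD tooling) that decodes the esr value
using the standard ESR_ELx field layout and compares far against x2.
The names decode_esr and is_repeated_32bit are my own for illustration,
and the tables cover only the few codes relevant here. The interpretation
is an inference from the dump, not something confirmed from the source:
0xdeadc0de is, as far as I know, the pattern that the kernel's UMA
debugging code writes over freed memory in INVARIANTS kernels, and
sa_family sits at offset 1 in FreeBSD's struct sockaddr.

```python
# Values copied from the quoted "Fatal data abort" register dump.
ESR = 0x0000000096000004
FAR = 0xDEADC0DEDEADC0DF
X2 = 0xDEADC0DEDEADC0DE

# Small subsets of the ESR_ELx encodings; only what this dump needs.
EC_NAMES = {
    0x24: "data abort from a lower exception level",
    0x25: "data abort taken without a change in exception level",
}
DFSC_NAMES = {
    0x04: "translation fault, level 0",
    0x05: "translation fault, level 1",
    0x06: "translation fault, level 2",
    0x07: "translation fault, level 3",
}


def decode_esr(esr):
    """Split an ESR_ELx value into the fields used for data aborts."""
    ec = (esr >> 26) & 0x3F      # exception class, bits [31:26]
    il = (esr >> 25) & 0x1       # instruction length bit, bit 25
    iss = esr & 0x1FFFFFF        # instruction specific syndrome, bits [24:0]
    dfsc = iss & 0x3F            # data fault status code, ISS bits [5:0]
    write = bool((iss >> 6) & 1)  # WnR: True for a write, False for a read
    return {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "other"),
        "il": il,
        "dfsc": dfsc,
        "dfsc_name": DFSC_NAMES.get(dfsc, "other"),
        "write": write,
    }


def is_repeated_32bit(value, pattern):
    """True if the 64-bit value is the 32-bit pattern repeated twice."""
    return value == ((pattern << 32) | pattern)


if __name__ == "__main__":
    info = decode_esr(ESR)
    print("EC   0x%02x: %s" % (info["ec"], info["ec_name"]))
    print("DFSC 0x%02x: %s" % (info["dfsc"], info["dfsc_name"]))
    print("access: %s" % ("write" if info["write"] else "read"))
    print("far - x2 = %d" % (FAR - X2))
    print("x2 is 0xdeadc0de repeated: %s" % is_repeated_32bit(X2, 0xDEADC0DE))
```

For this dump the sketch reports EC 0x25, a read access, and a level 0
translation fault, with far exactly 1 byte above x2. That would be
consistent with dump_sa() reading a one-byte field at offset 1 through
a pointer taken from already-freed memory, but I have not verified that
against the code.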


===
Mark Millard
marklmi at yahoo.com