Re: aarch64 main [so: 15] panic's in kyua's sys/net/if_lagg_test:status_stress [confirmed with snapshot kernel]
Date: Tue, 12 Sep 2023 03:11:18 UTC
On Sep 11, 2023, at 19:40, Mark Millard <marklmi@yahoo.com> wrote:
> On Sep 11, 2023, at 01:13, Mark Millard <marklmi@yahoo.com> wrote:
>
>> It will be some time before I can try this with
>> an official snapshot instead of a personal build.
>> The build is based on b6ce41118bb1 :
>>
>> # uname -apKU
>> FreeBSD CA78C-WDK23-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT aarch64 1500000 #17 main-n265279-b6ce41118bb1-dirty: Sun Sep 10 14:36:47 PDT 2023 root@CA78C-WDK23-ZFS:/usr/obj/BUILDs/main-CA78C-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA78C arm64 aarch64 1500000 1500000
>>
>> So it was a non-debug build, although I do not
>> strip symbols and such in my builds.
>>
>> . . .
>> sys/net/if_lagg_test:create -> passed [0.105s]
>> sys/net/if_lagg_test:create_destroy_stress -> skipped: Skipping this test because it easily panics the machine [0.019s]
>> sys/net/if_lagg_test:lacp_linkstate_destroy_stress -> passed [60.045s]
>> sys/net/if_lagg_test:set_ether -> passed [0.066s]
>> sys/net/if_lagg_test:status_stress ->
>>
>> The core.txt.5 is not great, unfortunately:
>>
>> panic: vm_fault failed: 0xffff0000006b96dc error 1
>>
>> GNU gdb (GDB) 13.1 [GDB v13.1 for FreeBSD]
>> . . .
>> Reading symbols from /boot/kernel/kernel...
>> Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...
>>
>> Unread portion of the kernel message buffer:
>> (dump_iface + 0x2c0)
>> elr: 0xffff0000006b96dc (dump_sa + 0x1c)
>> spsr: 0x0000000000400045
>> far: 0x44572d4338374144
>> esr: 0x0000000096000004
>> panic: vm_fault failed: 0xffff0000006b96dc error 1
>> cpuid = 2
>> time = 1694414226
>> KDB: stack backtrace:
>> db_trace_self() at db_trace_self
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
>> vpanic() at vpanic+0x1a0
>> panic() at panic+0x44
>> data_abort() at data_abort+0x304
>> handle_el1h_sync() at handle_el1h_sync+0x14
>> --- exception, esr 0x96000004
>> dump_sa() at dump_sa+0x1c
>> dump_iface() at dump_iface+0x2bc
>> dump_cb() at dump_cb+0x18
>> if_foreach_sleep() at if_foreach_sleep+0x244
>> rtnl_handle_getlink() at rtnl_handle_getlink+0xec
>> rtnl_handle_message() at rtnl_handle_message+0x19c
>> nl_taskqueue_handler() at nl_taskqueue_handler+0x674
>> taskqueue_run_locked() at taskqueue_run_locked+0x194
>> taskqueue_thread_loop() at taskqueue_thread_loop+0xcc
>> fork_exit() at fork_exit+0x88
>> fork_trampoline() at fork_trampoline+0x14
>> KDB: enter: panic
>>
>> get_curthread () at /usr/main-src/sys/arm64/include/pcpu.h:77
>> 77 __asm __volatile("ldr %0, [x18]" : "=&r"(td));
>> (kgdb) #0 get_curthread () at /usr/main-src/sys/arm64/include/pcpu.h:77
>> #1 doadump (textdump=0, textdump@entry=4003518992)
>> at /usr/main-src/sys/kern/kern_shutdown.c:405
>> #2 0xffff0000000f7704 in db_dump (dummy=<optimized out>, dummy2=<optimized out>, dummy3=<optimized out>, dummy4=<optimized out>)
>> at /usr/main-src/sys/ddb/db_command.c:591
>> #3 0xffff0000000f74e0 in db_command (last_cmdp=<optimized out>, cmd_table=<optimized out>, dopager=true)
>> at /usr/main-src/sys/ddb/db_command.c:504
>> #4 0xffff0000000f71b8 in db_command_loop ()
>> at /usr/main-src/sys/ddb/db_command.c:551
>> #5 0xffff0000000fad9c in db_trap (type=<optimized out>, code=<optimized out>)
>> at /usr/main-src/sys/ddb/db_main.c:268
>> #6 0xffff0000004f4ec4 in kdb_trap (type=60, code=0, tf=<optimized out>)
>> at /usr/main-src/sys/kern/subr_kdb.c:790
>> #7 <signal handler called>
>> #8 <signal handler called>
>> #9 <signal handler called>
>> #10 <signal handler called>
>> #11 <signal handler called>
>> #12 <signal handler called>
>> #13 <signal handler called>
>> #14 <signal handler called>
>> #15 <signal handler called>
>> #16 <signal handler called>
>> #17 <signal handler called>
>> #18 <signal handler called>
>> #19 <signal handler called>
>> #20 <signal handler called>
>> #21 <signal handler called>
>> #22 <signal handler called>
>> Backtrace stopped: Cannot access memory at address 0x10
>> (kgdb)
>>
>>
>> So some transcribing of a picture in order to
>> show register values that were reported:
>>
>> Fatal data abort:
>> x0: 0xffff0001eea0e7f0 (_DYNAMIC + 0x6d816648)
>> x1: 0x0000000000000001
>> x2: 0x44572d4338374143
>> x3: 0xffff0000005d3f90 (ifdead_ioctl + 0x0)
>> x4: 0xffffa00b7f0d185e
>> x5: 0xffffa0023fe4b992
>> x6: 0x000000006767616c
>> x7: 0x00706174016f7575
>> x8: 0x00000000000001a4
>> x9: 0x0000000000210005
>> x10: 0x0000000000000800
>> x11: 0xfefefefefefefeff
>> x12: 0x0000000000000008
>> x13: 0x0000000000000000
>> x14: 0x00000000000000ff
>> x15: 0x0000000000000700
>> x16: 0x0000000000000008
>> x17: 0x0000000000000007
>> x18: 0xffff0001eea0e500 (_DYNAMIC + 0x6d816358)
>> x19: 0xffff0001eea0e7f0 (_DYNAMIC + 0x6d816648)
>> x20: 0xffffa00b7f0d1800
>> x21: 0xffffa00b7f0d1858
>> x22: 0x000000000000000c
>> x23: 0x0000000000000005
>> x24: 0x0000000000000000
>> x25: 0xffff000000c68000 (sysctl___kern_features_netlink + 0x10)
>> x26: 0x0000000000000000
>> x27: 0xffff000000ce9000 (cap_linkat_source_rights + 0x8)
>> x28: 0xffff0000006bb0a0 (dump_cb + 0x0)
>> x29: 0xffff0001eea0e520 (_DYNAMIC + 0x6d816378)
>> sp: 0xffff0001eea0e500
>> lr: 0xffff0000006b8fe0 (dump_iface + 0x2c0)
>> elr: 0xffff0000006b96dc (dump_sa + 0x1c)
>> spsr: 0x0000000000400045
>> far: 0x44572d4338374144
>> esr: 0x0000000096000004
>> panic: vm_fault failed: 0xffff0000006b96dc error 1
>>
>> I expect that this is similar to reports I'd made
>> back in 14.0-CURRENT days. As I remember, snapshot
>> builds of the time also got the panic.
>>
>> I will note that an earlier 14.0-BETA1 snapshot
>> kernel test run did not panic at this point in the
>> sequence (or at any point). But I do not know how
>> repeatable the panics are in the various contexts.
>>
>> I'll note that I've tried to have the various ports
>> installed (poudriere built) that are listed at:
>>
>> https://github.com/freebsd/freebsd-ci/blob/master/scripts/build/build-test_image-head.sh#L69-L84
>>
>> (The ones that build for aarch64, anyway.)
>>
>> I had in /etc/kyua/kyua.conf :
>>
>> test_suites.FreeBSD.disks = '/dev/md0 /dev/md1 /dev/md2 /dev/md3 /dev/md4 /dev/md5'
>>
>> and used:
>>
>> # more ~/prekyua-aarch64-mdconfig.sh
>> #! /bin/sh
>> truncate -s 4g /var/tmp/for-md0.dat
>> truncate -s 4g /var/tmp/for-md1.dat
>> truncate -s 4g /var/tmp/for-md2.dat
>> truncate -s 4g /var/tmp/for-md3.dat
>> truncate -s 4g /var/tmp/for-md4.dat
>> truncate -s 4g /var/tmp/for-md5.dat
>> mdconfig -f /var/tmp/for-md0.dat -u md0
>> mdconfig -f /var/tmp/for-md1.dat -u md1
>> mdconfig -f /var/tmp/for-md2.dat -u md2
>> mdconfig -f /var/tmp/for-md3.dat -u md3
>> mdconfig -f /var/tmp/for-md4.dat -u md4
>> mdconfig -f /var/tmp/for-md5.dat -u md5
>>
>> I also did a:
>>
>> # kldload linux64
>>
>> before doing:
>>
>> # /usr/bin/kyua test -k /usr/tests/Kyuafile
>>
>> (Not true of linux64.ko in 14.0-CURRENT days.)
>
> # uname -apKU
> FreeBSD CA78C-WDK23-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT aarch64 1500000 #0 main-n265205-03a7c36ddbc0: Thu Sep 7 03:05:31 UTC 2023 root@releng3.nyi.freebsd.org:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC arm64 aarch64 1500000 1500000
>
> # /usr/bin/kyua test -k /usr/tests/Kyuafile sys/net/if_lagg_test:status_stress
> sys/net/if_lagg_test:status_stress ->
>
> got:
>
> panic: vm_fault failed: 0xffff0000006813b4 error 1
>
> GNU gdb (GDB) 13.1 [GDB v13.1 for FreeBSD]
> . . .
> Reading symbols from /boot/kernel/kernel...
> Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...
>
> Unread portion of the kernel message buffer:
> <6>ue0: 3 link states coalesced
> <6>ue0: link state changed to UP
> <6>lagg0: link state changed to DOWN
> <6>ue0: link state changed to DOWN
> Fatal data abort:
> x0: 0xffff00015df8d800 (infiniband_input.printedonce + 0x11eff68)
> x1: 0x0000000000000001
> x2: 0xdeadc0dedeadc0de
> x3: 0xffff000000593e34 (ifdead_ioctl + 0x0)
> x4: 0xffffa0004fb6285e
> x5: 0xffffa0004fc00192
> x6: 0x000000006767616c
> x7: 0x6e6d760070617401
> x8: 0x00000000000001a4
> x9: 0xffffa0004fc00000
> x10: 0x0000000000210005
> x11: 0x000000007ffffffe
> x12: 0x0000000000000008
> x13: 0x0000000000000000
> x14: 0x0000000000010000
> x15: 0x0000000000000001
> x16: 0x0000000000010000
> x17: 0x0000000000000007
> x18: 0xffff00015df8d500
> <6>ue0: link state changed to UP
> (infiniband_input.printedonce + 0x11efc68)
> x19: 0xffff00015df8d800 (infiniband_input.printedonce + 0x11eff68)
> x20: 0xffffa0004fb62800
> x21: 0xffffa0004fb62858
> x22: 0x000000000000000c
> x23: 0x0000000000000005
> x24: 0x0000000000000000
> x25: 0xffff000000c58000 (sysctl___net_netlink_debug + 0x40)
> x26: 0x0000000000000000
> x27: 0xffff000000cd9000 (sdt_vfs_vop_vop_spare5_return + 0x10)
> x28: 0xffff000000cd9000 (sdt_vfs_vop_vop_spare5_return + 0x10)
> x29: 0xffff00015df8d520 (infiniband_input.printedonce + 0x11efc88)
> sp: 0xffff00015df8d500
> lr: 0xffff000000680cbc (dump_iface + 0x2c0)
> elr: 0xffff0000006813b4 (dump_sa + 0x1c)
> spsr: 0x0000000000400045
> far: 0xdeadc0dedeadc0df
> esr: 0x0000000096000004
> panic: vm_fault failed: 0xffff0000006813b4 error 1
> cpuid = 3
> time = 1694485392
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
> vpanic() at vpanic+0x19c
> panic() at panic+0x44
> data_abort() at data_abort+0x35c
> handle_el1h_sync() at handle_el1h_sync+0x14
> --- exception, esr 0x96000004
> dump_sa() at dump_sa+0x1c
> dump_iface() at dump_iface+0x2bc
> dump_cb() at dump_cb+0x18
> if_foreach_sleep() at if_foreach_sleep+0x254
> rtnl_handle_getlink() at rtnl_handle_getlink+0xec
> rtnl_handle_message() at rtnl_handle_message+0x19c
> nl_taskqueue_handler() at nl_taskqueue_handler+0x5dc
> taskqueue_run_locked() at taskqueue_run_locked+0x17c
> taskqueue_thread_loop() at taskqueue_thread_loop+0xc8
> fork_exit() at fork_exit+0x74
> fork_trampoline() at fork_trampoline+0x14
> KDB: enter: panic
>
> get_curthread () at /usr/src/sys/arm64/include/pcpu.h:77
> 77 __asm __volatile("ldr %0, [x18]" : "=&r"(td));
> (kgdb) #0 get_curthread () at /usr/src/sys/arm64/include/pcpu.h:77
> #1 doadump (textdump=0, textdump@entry=1576585744)
> at /usr/src/sys/kern/kern_shutdown.c:405
> #2 0xffff0000000ec18c in db_dump (dummy=<optimized out>, dummy2=<optimized out>, dummy3=<optimized out>, dummy4=<optimized out>)
> at /usr/src/sys/ddb/db_command.c:591
> #3 0xffff0000000ebf88 in db_command (last_cmdp=<optimized out>, cmd_table=<optimized out>, dopager=true)
> at /usr/src/sys/ddb/db_command.c:504
> #4 0xffff0000000ebc80 in db_command_loop ()
> at /usr/src/sys/ddb/db_command.c:551
> #5 0xffff0000000ef440 in db_trap (type=<optimized out>, code=<optimized out>)
> at /usr/src/sys/ddb/db_main.c:268
> #6 0xffff0000004b4860 in kdb_trap (type=60, code=0, tf=<optimized out>)
> at /usr/src/sys/kern/subr_kdb.c:790
> #7 <signal handler called>
> #8 <signal handler called>
> #9 <signal handler called>
> #10 <signal handler called>
> #11 <signal handler called>
> #12 <signal handler called>
> #13 <signal handler called>
> #14 <signal handler called>
> #15 <signal handler called>
> #16 <signal handler called>
> #17 <signal handler called>
> #18 <signal handler called>
> #19 <signal handler called>
> #20 <signal handler called>
> #21 <signal handler called>
> #22 <signal handler called>
> #23 <signal handler called>
> Backtrace stopped: Cannot access memory at address 0x10
> (kgdb)
>
> (Again, kgdb's stack frames #7 and larger are not particularly
> useful.)
>
> Possibly interesting are the slightly different values:
>
> x2: 0xdeadc0dedeadc0de
> and:
> far: 0xdeadc0dedeadc0df
>

So, I again tried the 14.0-BETA1 snapshot:

# uname -apKU
FreeBSD generic 14.0-BETA1 FreeBSD 14.0-BETA1 aarch64 1400097 #0 releng/14.0-n265060-4e027ca1514f: Fri Sep 8 11:17:15 UTC 2023 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC arm64 aarch64 1400097 1400097

and again it did not panic:

# /usr/bin/kyua test -k /usr/tests/Kyuafile sys/net/if_lagg_test:status_stress
sys/net/if_lagg_test:status_stress -> passed [60.111s]

Results file id is usr_tests.20230909-084231-927014
Results saved to /root/.kyua/store/results.usr_tests.20230909-084231-927014.db

1/1 passed (0 failed)

The problem seems specific in some way to main
[so: 15 at this point].

Given that my personal non-debug builds of main [so: 15]
get a panic and the debug build in the snapshot does as
well, it likely is not a debug vs. non-debug issue.
(Although, I do not strip symbols or such in my builds.)

===
Mark Millard
marklmi at yahoo.com
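
A side note on the x2 and far values in the two reports above. The
following is only a rough Python sketch for looking at them, with
helper names (as_ascii, decode_esr) made up for illustration. It
prints a register as little-endian ASCII and splits esr into fields,
assuming the usual AArch64 ESR_ELx layout (EC in bits 31:26, IL in
bit 25, and for data aborts WnR in bit 6 and DFSC in bits 5:0). In
both reports far is x2 + 1. In the personal-build report x2 reads as
"CA78C-WD", the start of the hostname in the uname output. In the
snapshot report x2 is 0xdeadc0dedeadc0de, which looks like a
freed-memory fill pattern.

```python
# Register values copied from the two panic reports quoted above.
PERSONAL_BUILD = {"x2": 0x44572D4338374143, "far": 0x44572D4338374144}
SNAPSHOT_BUILD = {"x2": 0xDEADC0DEDEADC0DE, "far": 0xDEADC0DEDEADC0DF}
ESR = 0x96000004


def as_ascii(value: int) -> str:
    """Render a 64-bit register as 8 little-endian bytes, '.' if unprintable."""
    raw = value.to_bytes(8, "little")
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


def decode_esr(esr: int) -> dict:
    """Split an ESR value into EC, IL, WnR and DFSC (assumed data-abort layout)."""
    iss = esr & 0x1FFFFFF          # ISS is the low 25 bits
    return {
        "ec": (esr >> 26) & 0x3F,  # exception class
        "il": (esr >> 25) & 0x1,   # instruction length bit
        "wnr": (iss >> 6) & 0x1,   # 1 = write, 0 = read
        "dfsc": iss & 0x3F,        # data fault status code
    }


if __name__ == "__main__":
    for name, regs in (("personal", PERSONAL_BUILD), ("snapshot", SNAPSHOT_BUILD)):
        offset = regs["far"] - regs["x2"]
        print(name, repr(as_ascii(regs["x2"])), "far - x2 =", offset)
    print({field: hex(v) for field, v in decode_esr(ESR).items()})
```

One possible reading, offered as a guess and not as a diagnosis:
dump_sa was handed a pointer into memory that had already been freed
or reused, and faulted on a read at offset 1 from it. On FreeBSD,
offset 1 of struct sockaddr is sa_family, but whether that is the
load at dump_sa+0x1c is an assumption I have not checked against the
disassembly.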