kernel: fatal trap 12 on CURRENT, when using WireGuard
Date: Tue, 09 Jan 2024 20:23:54 UTC
I tried to update my 15.0-CURRENT box from n267335-499e84e16f56 to a very recent commit. The build and install went fine. After booting with the new base, I got a page fault with the following error:

Kernel page fault with the following non-sleepable locks held:
shared rm netlink lock (netlink lock) r = 0 (0xfffff8005fc8ca20) locked @ /usr/src/sys/netlink/netlink_domain.c:241
exclusive rw lle (lle) r = 0 (0xfffff801951dce90) locked @ /usr/src/sys/netinet/in.c:1716
stack backtrace:
#0 0xffffffff80bc6c45 at witness_debugger+0x65
#1 0xffffffff80bc7d89 at witness_warn+0x3e9
#2 0xffffffff81056b18 at trap_pfault+0x88
#3 0xffffffff81028708 at calltrap+0x8
#4 0xffffffff80dbd6a2 at nl_send_group+0x1d2
#5 0xffffffff80dc0e27 at _nlmsg_flush+0x37
#6 0xffffffff80dc4fdc at rtnl_lle_event+0x10c
#7 0xffffffff80d15e32 at arp_mark_lle_reachable+0xd2
#8 0xffffffff80d15b43 at arp_check_update_lle+0x293
#9 0xffffffff80d151c5 at arpintr+0xa65
#10 0xffffffff80caaaed at netisr_dispatch_src+0xad
#11 0xffffffff80c8d57a at ether_demux+0x17a
#12 0xffffffff80c8ec53 at ether_nh_input+0x403
#13 0xffffffff80caaaed at netisr_dispatch_src+0xad
#14 0xffffffff80c8d9c9 at ether_input+0xd9
#15 0xffffffff80ca66ac at iflib_rxeof+0xe4c
#16 0xffffffff80ca0b5a at _task_fn_rx+0x7a
#17 0xffffffff80ba0118 at gtaskqueue_run_locked+0xa8

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x30000
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80dc0a10
stack pointer = 0x28:0xfffffe006a3a8760
frame pointer = 0x28:0xfffffe006a3a8790
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (if_io_tqg_0)
rdi: fffffe006a3a8850 rsi: fffffe006a3a86f0 rdx: fffffe006a3a87b0
rcx: fffff80001f88740 r8: ffffffff83210090 r9: 0000000000000000
rax: 0000000000000000 rbx: 0000000000030000 rbp: fffffe006a3a8790
r10: 0000000000000001 r11: 0000000000000000 r12: fffff8005fc8ca00
r13: fffff8005fc8ca20 r14: fffffe006a3a8850 r15: 0000000000000000
trap number = 12
panic: page fault
cpuid = 0
time = 1704824328
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe006a3a8430
vpanic() at vpanic+0x131/frame 0xfffffe006a3a8560
panic() at panic+0x43/frame 0xfffffe006a3a85c0
trap_fatal() at trap_fatal+0x40f/frame 0xfffffe006a3a8620
trap_pfault() at trap_pfault+0xae/frame 0xfffffe006a3a8690
calltrap() at calltrap+0x8/frame 0xfffffe006a3a8690
--- trap 0xc, rip = 0xffffffff80dc0a10, rsp = 0xfffffe006a3a8760, rbp = 0xfffffe006a3a8790 ---
nl_send_one() at nl_send_one+0x20/frame 0xfffffe006a3a8790
nl_send_group() at nl_send_group+0x1d2/frame 0xfffffe006a3a8820
_nlmsg_flush() at _nlmsg_flush+0x37/frame 0xfffffe006a3a8840
rtnl_lle_event() at rtnl_lle_event+0x10c/frame 0xfffffe006a3a88e0
arp_mark_lle_reachable() at arp_mark_lle_reachable+0xd2/frame 0xfffffe006a3a8930
arp_check_update_lle() at arp_check_update_lle+0x293/frame 0xfffffe006a3a8a00
arpintr() at arpintr+0xa65/frame 0xfffffe006a3a8b60
netisr_dispatch_src() at netisr_dispatch_src+0xad/frame 0xfffffe006a3a8bc8
ether_demux() at ether_demux+0x17a/frame 0xfffffe006a4a8bf0
ether_nh_input() at ether_nh_input+0x403/frame 0xfffffe006a3a8c40
netisr_dispatch_src() at netisr_dispatch_src+0xad/frame 0xfffffe006a3a8ca0
ether_input() at ether_input+0xd9/frame 0xfffffe006a3a8d00
iflib_rxeof() at iflib_rxeof+0xe4c/frame 0xfffffe006a3a8e00
_task_fn_rx() at _task_fn_rx+0x7a/frame 0xfffffe006a3a8e40
gtaskqueue_run_locked() at gtaskqueue_run_locked+0xa8/frame 0xfffffe006a3a8ec0
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xd3/frame 0xfffffe006a3a8ef0
fork_exit() at fork_exit+0x82/frame 0xfffffe006a3a8f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe006a3a8f30
--- trap 0xf2b9f109, rip = 0x7afef8a176bef8a5, rsp = 0xddc963edd18963e9, rbp = 0x61f64fc36db64fc7
KDB: enter: panic
[ thread pid 0 tid 100067 ]
Stopped at kdb_enter+0x33: movq $0,0xe3a582(%rip)
db>

Since the current process 'if_io_tqg_0' and problems with netlink are mentioned, I looked more closely at my network connections. I discovered that this page fault only occurs when a connection is established with WireGuard (wg-quick up wg0). Without WireGuard, the error does not occur.

I was able to find out at which commit this behavior starts on my box:

- Up to commit main-n267347-660bd40a598a everything is fine.
- The two following commits n267348-67d9023f07a4 and n267349-0ad011ececb9 do not build on my box (module/netlink broken ...).
- From commit n267349-0ad011ececb9 (netlink) onwards this page fault occurs when WireGuard is started.

Any help is greatly appreciated.

CC'ed Gleb Smirnoff due to the affected commits.

Regards,
Rainer Hurling
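
For anyone reading the trap output above, a minimal Python sketch of how one might cross-check the register dump against the fault address. It only restates values already quoted in this message; the helper names (fault_address, registers, registers_matching_fault) are made up for illustration and are not part of any FreeBSD tool. It says nothing about the root cause, it just points out which register held the bad address.

```python
import re

# Excerpt copied from the trap output quoted in this message.
TRAP_TEXT = """
fault virtual address = 0x30000
instruction pointer = 0x20:0xffffffff80dc0a10
rdi: fffffe006a3a8850 rsi: fffffe006a3a86f0 rdx: fffffe006a3a87b0
rcx: fffff80001f88740 r8: ffffffff83210090 r9: 0000000000000000
rax: 0000000000000000 rbx: 0000000000030000 rbp: fffffe006a3a8790
r10: 0000000000000001 r11: 0000000000000000 r12: fffff8005fc8ca00
r13: fffff8005fc8ca20 r14: fffffe006a3a8850 r15: 0000000000000000
"""


def fault_address(text):
    """Return the faulting virtual address as an int, or None if absent."""
    m = re.search(r"fault virtual address\s*=\s*0x([0-9a-fA-F]+)", text)
    return int(m.group(1), 16) if m else None


def registers(text):
    """Map register name -> value for every 'name: 16-hex-digits' pair."""
    pairs = re.findall(r"\b(r[a-z0-9]{1,3}):\s*([0-9a-fA-F]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def registers_matching_fault(text):
    """List the registers whose value equals the faulting address."""
    addr = fault_address(text)
    return sorted(n for n, v in registers(text).items() if v == addr)


if __name__ == "__main__":
    print(hex(fault_address(TRAP_TEXT)), registers_matching_fault(TRAP_TEXT))
```

With the dump above, the only register equal to the fault address 0x30000 is rbx, which fits a read through a bogus pointer at nl_send_one+0x20.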