From: Rainer Hurling
Reply-To: Rainer Hurling
CC: Gleb Smirnoff
Date: Tue, 9 Jan 2024 21:23:54 +0100
Subject: kernel: fatal trap 12 on CURRENT, when using WireGuard
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current
Message-ID: <423b62fc-6687-4e56-b8e7-ecaebcadfd7f@gwdg.de>

I tried to update my 15.0-CURRENT box from n267335-499e84e16f56 to a very recent commit. The build and install went fine.
After booting with the new base, I got a page fault with the following error:

Kernel page fault with the following non-sleepable locks held:
shared rm netlink lock (netlink lock) r = 0 (0xfffff8005fc8ca20) locked @ /usr/src/sys/netlink/netlink_domain.c:241
exclusive rw lle (lle) r = 0 (0xfffff801951dce90) locked @ /usr/src/sys/netinet/in.c:1716
stack backtrace:
#0 0xffffffff80bc6c45 at witness_debugger+0x65
#1 0xffffffff80bc7d89 at witness_warn+0x3e9
#2 0xffffffff81056b18 at trap_pfault+0x88
#3 0xffffffff81028708 at calltrap+0x8
#4 0xffffffff80dbd6a2 at nl_send_group+0x1d2
#5 0xffffffff80dc0e27 at _nlmsg_flush+0x37
#6 0xffffffff80dc4fdc at rtnl_lle_event+0x10c
#7 0xffffffff80d15e32 at arp_mark_lle_reachable+0xd2
#8 0xffffffff80d15b43 at arp_check_update_lle+0x293
#9 0xffffffff80d151c5 at arpintr+0xa65
#10 0xffffffff80caaaed at netisr_dispatch_src+0xad
#11 0xffffffff80c8d57a at ether_demux+0x17a
#12 0xffffffff80c8ec53 at ether_nh_input+0x403
#13 0xffffffff80caaaed at netisr_dispatch_src+0xad
#14 0xffffffff80c8d9c9 at ether_input+0xd9
#15 0xffffffff80ca66ac at iflib_rxeof+0xe4c
#16 0xffffffff80ca0b5a at _task_fn_rx+0x7a
#17 0xffffffff80ba0118 at gtaskqueue_run_locked+0xa8

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x30000
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80dc0a10
stack pointer = 0x28:0xfffffe006a3a8760
frame pointer = 0x28:0xfffffe006a3a8790
code segment = base 0x0, limit 0xfffff, type 0x1b
 = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (if_io_tqg_0)
rdi: fffffe006a3a8850 rsi: fffffe006a3a86f0 rdx: fffffe006a3a87b0
rcx: fffff80001f88740 r8: ffffffff83210090 r9: 0000000000000000
rax: 0000000000000000 rbx: 0000000000030000 rbp: fffffe006a3a8790
r10: 0000000000000001 r11: 0000000000000000 r12: fffff8005fc8ca00
r13: fffff8005fc8ca20 r14: fffffe006a3a8850 r15: 0000000000000000
trap number = 12
panic: page fault
cpuid = 0
time = 1704824328
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe006a3a8430
vpanic() at vpanic+0x131/frame 0xfffffe006a3a8560
panic() at panic+0x43/frame 0xfffffe006a3a85c0
trap_fatal() at trap_fatal+0x40f/frame 0xfffffe006a3a8620
trap_pfault() at trap_pfault+0xae/frame 0xfffffe006a3a8690
calltrap() at calltrap+0x8/frame 0xfffffe006a3a8690
--- trap 0xc, rip = 0xffffffff80dc0a10, rsp = 0xfffffe006a3a8760, rbp = 0xfffffe006a3a8790 ---
nl_send_one() at nl_send_one+0x20/frame 0xfffffe006a3a8790
nl_send_group() at nl_send_group+0x1d2/frame 0xfffffe006a3a8820
_nlmsg_flush() at _nlmsg_flush+0x37/frame 0xfffffe006a3a8840
rtnl_lle_event() at rtnl_lle_event+0x10c/frame 0xfffffe006a3a88e0
arp_mark_lle_reachable() at arp_mark_lle_reachable+0xd2/frame 0xfffffe006a3a8930
arp_check_update_lle() at arp_check_update_lle+0x293/frame 0xfffffe006a3a8a00
arpintr() at arpintr+0xa65/frame 0xfffffe006a3a8b60
netisr_dispatch_src() at netisr_dispatch_src+0xad/frame 0xfffffe006a3a8bc8
ether_demux() at ether_demux+0x17a/frame 0xfffffe006a3a8bf0
ether_nh_input() at ether_nh_input+0x403/frame 0xfffffe006a3a8c40
netisr_dispatch_src() at netisr_dispatch_src+0xad/frame 0xfffffe006a3a8ca0
ether_input() at ether_input+0xd9/frame 0xfffffe006a3a8d00
iflib_rxeof() at iflib_rxeof+0xe4c/frame 0xfffffe006a3a8e00
_task_fn_rx() at _task_fn_rx+0x7a/frame 0xfffffe006a3a8e40
gtaskqueue_run_locked() at gtaskqueue_run_locked+0xa8/frame 0xfffffe006a3a8ec0
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xd3/frame 0xfffffe006a3a8ef0
fork_exit() at fork_exit+0x82/frame 0xfffffe006a3a8f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe006a3a8f30
--- trap 0xf2b9f109, rip = 0x7afef8a176bef8a5, rsp = 0xddc963edd18963e9, rbp = 0x61f64fc36db64fc7
KDB: enter: panic
[ thread pid 0 tid 100067 ]
Stopped at kdb_enter+0x33: movq $0,0xe3a582(%rip)
db>

Since the current process is 'if_io_tqg_0' and the backtrace points at netlink, I looked into my network connections. I discovered that this page fault only occurs when a connection is established with WireGuard (wg-quick up wg0). Without WireGuard, the error does not occur.

I was able to narrow down the commit at which this behavior starts on my box:

- Up to commit main-n267347-660bd40a598a everything is fine.
- The two following commits n267348-67d9023f07a4 and n267349-0ad011ececb9 do not build on my box (module/netlink broken ...).
- From commit n267349-0ad011ececb9 (netlink) onwards this page fault occurs when WireGuard is started.

Any help is greatly appreciated. CC'ed Gleb Smirnoff due to the affected commits.

Regards,
Rainer Hurling
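
For readers triaging similar reports: the frames listed directly after the first "--- trap" marker in a KDB backtrace are the code that was running when the fault hit, which is why the trace above points at netlink (nl_send_one). Below is a minimal sketch of pulling those frames out of a saved panic log. The helper names (parse_ddb_frames, frames_after_trap) are made up for illustration and are not part of any FreeBSD tool, and the regular expression only assumes the "func() at func+0xOFF/frame 0xADDR" line shape seen in this report.

```python
import re

# Assumed line shape, taken from the report above:
#   nl_send_one() at nl_send_one+0x20/frame 0xfffffe006a3a8790
FRAME_RE = re.compile(
    r"\bat (?P<func>\w+)\+(?P<off>0x[0-9a-fA-F]+)/frame (?P<frame>0x[0-9a-fA-F]+)"
)


def parse_ddb_frames(text):
    """Return (function, offset) tuples for every frame line found in text."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append((m.group("func"), int(m.group("off"), 16)))
    return frames


def frames_after_trap(text):
    """Return the frames that follow the first '--- trap' marker.

    The first of these is the function that was executing when the
    trap was taken; an empty list means no marker was found.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.strip().startswith("--- trap"):
            return parse_ddb_frames("\n".join(lines[i + 1:]))
    return []


# Short excerpt of the backtrace from this report, used as sample input.
SAMPLE = """\
trap_pfault() at trap_pfault+0xae/frame 0xfffffe006a3a8690
calltrap() at calltrap+0x8/frame 0xfffffe006a3a8690
--- trap 0xc, rip = 0xffffffff80dc0a10, rsp = 0xfffffe006a3a8760, rbp = 0xfffffe006a3a8790 ---
nl_send_one() at nl_send_one+0x20/frame 0xfffffe006a3a8790
nl_send_group() at nl_send_group+0x1d2/frame 0xfffffe006a3a8820
"""

if __name__ == "__main__":
    for func, off in frames_after_trap(SAMPLE):
        print(f"{func}+{off:#x}")
```

The symbol name is taken from the part after "at" rather than from the "func()" prefix, so a line is still parsed if the two spellings differ in a hand-copied trace.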