From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 270813] kernel crashes when ena driver is unloaded
Date: Thu, 13 Apr 2023 06:43:03 +0000
List-Archive: https://lists.freebsd.org/archives/freebsd-bugs

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=270813

            Bug ID: 270813
           Summary: kernel crashes when ena driver is unloaded
           Product: Base System
           Version: CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: akiyano@amazon.com

Reproduction steps:
-------------------
1. Create an AWS EC2 instance from FreeBSD 14.0-CURRENT-amd64-20230323 UEFI,
   ami-02dbe14b26d93d722 in us-east-1 (or any newer ami that starts with
   "FreeBSD 14.0-CURRENT-amd64-")
2. run kldunload if_ena.ko

Result:
-------
Crashes every time. 100% reproducible.

Core dump stack:
__curthread () at /root/freebsd-src/sys/amd64/include/pcpu_aux.h:59
59              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  __curthread () at /root/freebsd-src/sys/amd64/include/pcpu_aux.h:59
#1  doadump (textdump=textdump@entry=1)
    at /root/freebsd-src/sys/kern/kern_shutdown.c:407
#2  0xffffffff80bedc6c in kern_reboot (howto=260)
    at /root/freebsd-src/sys/kern/kern_shutdown.c:528
#3  0xffffffff80bee18f in vpanic (fmt=, ap=ap@entry=0xfffffe01fef62ae0)
    at /root/freebsd-src/sys/kern/kern_shutdown.c:972
#4  0xffffffff80bedf13 in panic (fmt=)
    at /root/freebsd-src/sys/kern/kern_shutdown.c:896
#5  0xffffffff810e2b39 in trap_fatal (frame=0xfffffe01fef62b70, eva=0)
    at /root/freebsd-src/sys/amd64/amd64/trap.c:954
#6
#7  dump_sa (nw=nw@entry=0xfffffe01fef62d08, attr=attr@entry=1,
    sa=0xdeadc0dedeadc0de)
    at /root/freebsd-src/sys/netlink/route/iface.c:210
#8  0xffffffff80e5659a in dump_iface (nw=nw@entry=0xfffffe01fef62d08,
    ifp=ifp@entry=0xfffff80109bbe800, hdr=hdr@entry=0xfffffe01fef62d48,
    if_flags_mask=if_flags_mask@entry=0)
    at /root/freebsd-src/sys/netlink/route/iface.c:279
#9  0xffffffff80e55e7b in rtnl_handle_ifevent (ifp=0xfffff80109bbe800,
    nlmsg_type=, if_flags_mask=0)
    at /root/freebsd-src/sys/netlink/route/iface.c:943
#10 0xffffffff80d1fc1d in
do_link_state_change (arg=0xfffff80109bbe800, pending=1)
    at /root/freebsd-src/sys/net/if.c:2205
#11 0xffffffff80c5233a in taskqueue_run_locked (
    queue=queue@entry=0xfffff80106ce7100)
    at /root/freebsd-src/sys/kern/subr_taskqueue.c:514
#12 0xffffffff80c5224d in taskqueue_run (queue=0xfffff80106ce7100)
    at /root/freebsd-src/sys/kern/subr_taskqueue.c:529
#13 0xffffffff80ba8126 in intr_event_execute_handlers (ie=0xfffff80106a9d300,
    p=) at /root/freebsd-src/sys/kern/kern_intr.c:1207
#14 ithread_execute_handlers (ie=0xfffff80106a9d300, p=)
    at /root/freebsd-src/sys/kern/kern_intr.c:1220
#15 ithread_loop (arg=arg@entry=0xfffff80106c951c0)
    at /root/freebsd-src/sys/kern/kern_intr.c:1308
#16 0xffffffff80ba45c0 in fork_exit (
    callout=0xffffffff80ba7eb0 , arg=0xfffff80106c951c0,
    frame=0xfffffe01fef62f40)
    at /root/freebsd-src/sys/kern/kern_fork.c:1102
#17
(kgdb)

Initial investigation results:
------------------------------
1. I printed ifp->if_addr->ifa_addr inside do_link_state_change and it is
   0xdeadc0dedeadc0de.
2. Initially I suspected a kernel issue, so I tried to find the kernel commit
   that caused it.

   The last non-crashing instance is with ami (us-east-1):
   FreeBSD 14.0-CURRENT-amd64-20230316 UEFI, ami-0d80d8baae9fea731
   uname -a shows kernel commit hash cee09bda03c8

   The first crashing instance is with ami (us-east-1):
   FreeBSD 14.0-CURRENT-amd64-20230323 UEFI, ami-02dbe14b26d93d722
   uname -a shows kernel commit hash b5d43972e394

   However, I saw that if the ami was a crashing ami, the issue reproduced no
   matter which kernel I built and installed from sources. And the other way
   around: if I used a non-crashing ami, the issue didn't reproduce no matter
   which kernel I built and installed from sources.

   So I figured it is a Userland issue. I went on to build and install
   Userland without the kernel until I found the commit that caused the issue.
   (command used: make buildworld -j`sysctl -n hw.ncpu` && make installworld
   -j`sysctl -n hw.ncpu` && reboot)

   This commit proved to be: https://reviews.freebsd.org/D39048
   (the commit before it doesn't crash, commits >= it crash).

Relevant discussions:
---------------------
Initially I commented in https://reviews.freebsd.org/D39048, which created an
email thread where the following was written:

Zhenlei Huang :
=================================
iface.c:210
That might be line 214. Also be aware that `sa == 0xdeadc0dedeadc0de`.

```
static bool
dump_iface(struct nl_writer *nw, struct ifnet *ifp, const struct nlmsghdr *hdr,
    int if_flags_mask)
{
	...
	if ((ifp->if_addr != NULL)) {
		dump_sa(nw, IFLA_ADDRESS, ifp->if_addr->ifa_addr);
	}
	...
}
```

There probably have concurrency between ifp destroying and interface status
event handling. `ifp` might be freed before this event handler
rtnl_handle_ifevent() . So only checking `ifp->if_addr != NULL` is not enough.
=================================

Fix thoughts:
-------------
My first thought was to alter the dump_iface code that @zlei pointed out and
to check whether "ifp->if_addr->ifa_addr != 0xdeadc0dedeadc0de".
But I didn't find any existing code that does that, or a #define for
0xdeadc0dedeadc0de that I could use, so I guess this is not the right way to
do it.

I would appreciate any suggestions you may have on how to tackle this.
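
Illustration of the poison-value idea from "Fix thoughts". This is a
user-space sketch only, not kernel code and not a proposed patch. It assumes
that 0xdeadc0dedeadc0de is a fill pattern written over freed memory by debug
kernels, which would explain why the NULL check in dump_iface() passes while
ifa_addr holds garbage. struct fake_ifaddr, struct fake_ifnet, the dying flag,
poison(), null_check_passes() and safe_to_dump() are all hypothetical names
made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumption: the pattern a debug kernel writes over freed memory. */
#define POISON_WORD UINT64_C(0xdeadc0dedeadc0de)

/* Hypothetical stand-ins; these are NOT the kernel's ifaddr/ifnet. */
struct fake_ifaddr {
	uint64_t ifa_addr;		/* stands in for the sockaddr pointer */
};

struct fake_ifnet {
	struct fake_ifaddr *if_addr;	/* may dangle after teardown */
	bool dying;			/* hypothetical lifetime flag */
};

/* Simulate a debug allocator trashing a released object. */
void
poison(void *p, size_t len)
{
	const uint64_t w = POISON_WORD;
	unsigned char *dst = p;
	size_t i;

	assert(len % sizeof(w) == 0);
	for (i = 0; i < len; i += sizeof(w))
		memcpy(dst + i, &w, sizeof(w));
}

/* Same test as in dump_iface(): true even when if_addr dangles. */
bool
null_check_passes(const struct fake_ifnet *ifp)
{
	return (ifp->if_addr != NULL);
}

/* Alternative: rely on state that lives outside the released object. */
bool
safe_to_dump(const struct fake_ifnet *ifp)
{
	return (!ifp->dying && ifp->if_addr != NULL);
}
```

If the assumption holds, comparing against the pattern would only catch the
problem on kernels that trash freed memory. Elsewhere the stale pointer would
hold old or reused data. A check such as safe_to_dump(), which consults
lifetime state kept outside the released object, does not depend on the
pattern, though the real fix may well look different.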