From nobody Fri Nov 19 01:50:05 2021 X-Original-To: dev-ci@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 1C1F51850D35 for ; Fri, 19 Nov 2021 01:50:11 +0000 (UTC) (envelope-from kp@FreeBSD.org) Received: from smtp.freebsd.org (smtp.freebsd.org [96.47.72.83]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "smtp.freebsd.org", Issuer "R3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4HwKNy745Tz4g9s; Fri, 19 Nov 2021 01:50:10 +0000 (UTC) (envelope-from kp@FreeBSD.org) Received: from venus.codepro.be (venus.codepro.be [5.9.86.228]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mx1.codepro.be", Issuer "R3" (verified OK)) (Authenticated sender: kp) by smtp.freebsd.org (Postfix) with ESMTPSA id A55332D73B; Fri, 19 Nov 2021 01:50:10 +0000 (UTC) (envelope-from kp@FreeBSD.org) Received: by venus.codepro.be (Postfix, authenticated sender kp) id D1B7D2AB64; Fri, 19 Nov 2021 02:50:07 +0100 (CET) From: "Kristof Provost" To: "John Baldwin" Cc: jenkins-admin@FreeBSD.org, dev-ci@FreeBSD.org Subject: Re: FreeBSD-main-amd64-test - Build #19839 - Failure Date: Thu, 18 Nov 2021 18:50:05 -0700 X-Mailer: MailMate (1.13.2r5673) Message-ID: In-Reply-To: References: <1727420938.1071.1637194678550@jenkins.ci.freebsd.org> <263150711.1077.1637200233260@jenkins.ci.freebsd.org> List-Id: Continuous Integration Build and Test Results List-Archive: https://lists.freebsd.org/archives/dev-ci List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-dev-ci@freebsd.org X-BeenThere: dev-ci@freebsd.org MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=_MailMate_532E3070-790D-4348-839A-FFE9CD0B8790_=" Content-Transfer-Encoding: 8bit X-Spam: Yes X-ThisMailContainsUnwantedMimeParts: N --=_MailMate_532E3070-790D-4348-839A-FFE9CD0B8790_= Content-Type: text/plain; charset="UTF-8"; format=flowed; markup=markdown Content-Transfer-Encoding: 8bit On 18 Nov 2021, at 17:21, John Baldwin wrote: > On 11/17/21 5:50 PM, jenkins-admin@FreeBSD.org wrote: >> FreeBSD-main-amd64-test - Build #19839 >> (4082b189d2ce00674439226c9d5a8bdcafd23d01) - Failure >> >> Build information: >> https://ci.FreeBSD.org/job/FreeBSD-main-amd64-test/19839/ >> Full change log: >> https://ci.FreeBSD.org/job/FreeBSD-main-amd64-test/19839/changes >> Full build log: >> https://ci.FreeBSD.org/job/FreeBSD-main-amd64-test/19839/console >> >> Status explanation: >> "Failure" - the build is suspected being broken by the following >> changes >> "Still Failing" - the build has not been fixed by the following >> changes and >> this is a notification to note that these changes >> have >> not been fully tested by the CI system >> >> Change summaries: >> (Those commits are likely but not certainly responsible) >> >> b928e924f74b0b8f882a9b735611421a93113640 by jhb: >> rtld-elf: Use _get_tp in __tls_get_addr for aarch64 and riscv64. >> >> a8d885296a9dc517e731723081c83d97d2aa598f by jhb: >> linux_linkat: Don't invert AT_* flags. >> >> 8b2ce7a3bbd0a754d31ff3943d918b4c84c831a3 by jhb: >> linux_name_to_handle_at: Support AT_EMPTY_PATH. >> >> 1962164584a91078418afcd7c979afef13df8c4d by jhb: >> imgact_elf: Use bool instead of boolean_t. >> >> 4082b189d2ce00674439226c9d5a8bdcafd23d01 by jhb: >> elf*_brand_inuse: Change return type to bool. > > I don't the panic is related to these commits. > >> The end of the build log: >> >> [...truncated 4.27 MB...] >> --- trap 0xc, rip = 0x2100c7156d9a, rsp = 0x4423f48, rbp = 0x4423f60 >> --- > > The useful parts of the panic were earlier in the log: > > 01:50:13 sys/netpfil/common/dummynet:pf_queue_v6 -> epair5a: > Ethernet address: 02:b5:98:ea:1d:0a > 01:50:14 epair5b: Ethernet address: 02:b5:98:ea:1d:0b > 01:50:14 epair5a: link state changed to UP > 01:50:14 epair5b: link state changed to UP > 01:50:14 epair5a: link state changed to DOWN > 01:50:27 epair5b: link state changed to DOWN > 01:50:27 passed [13.831s] > 01:50:27 > 01:50:27 > 01:50:27 Fatal trap 12: page fault while in kernel mode > 01:50:27 cpuid = 0; apic id = 00 > 01:50:27 fault virtual address = 0x10 > 01:50:27 fault code = supervisor read data, page not present > 01:50:27 instruction pointer = 0x20:0xffffffff80e3c60f > 01:50:27 stack pointer = 0x28:0xfffffe00a5d9ec30 > 01:50:27 frame pointer = 0x28:0xfffffe00a5d9ed00 > 01:50:27 code segment = base 0x0, limit 0xfffff, type 0x1b > 01:50:27 = DPL 0, pres 1, long 1, def32 0, gran 1 > 01:50:27 processor eflags = interrupt enabled, resume, IOPL = 0 > 01:50:27 current process = 0 (dummynet) > 01:50:27 trap number = 12 > 01:50:27 panic: page fault > 01:50:27 cpuid = 0 > 01:50:27 time = 1637200227 > 01:50:27 KDB: stack backtrace: > 01:50:27 db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame > 0xfffffe00a5d9e9f0 > 01:50:27 vpanic() at vpanic+0x17f/frame 0xfffffe00a5d9ea40 > 01:50:27 panic() at panic+0x43/frame 0xfffffe00a5d9eaa0 > 01:50:27 trap_fatal() at trap_fatal+0x385/frame 0xfffffe00a5d9eb00 > 01:50:27 trap_pfault() at trap_pfault+0xab/frame 0xfffffe00a5d9eb60 > 01:50:27 calltrap() at calltrap+0x8/frame 0xfffffe00a5d9eb60 > 01:50:27 --- trap 0xc, rip = 0xffffffff80e3c60f, rsp = > 0xfffffe00a5d9ec30, rbp = 0xfffffe00a5d9ed00 --- > 01:50:27 ip6_input() at ip6_input+0x4f/frame 0xfffffe00a5d9ed00 > 01:50:27 netisr_dispatch_src() at netisr_dispatch_src+0xb1/frame > 0xfffffe00a5d9ed60 > 01:50:27 dummynet_send() at dummynet_send+0x1dd/frame > 0xfffffe00a5d9eda0 > 01:50:27 dummynet_task() at dummynet_task+0x493/frame > 0xfffffe00a5d9ee40 > 01:50:27 taskqueue_run_locked() at taskqueue_run_locked+0xaa/frame > 0xfffffe00a5d9eec0 > 01:50:27 taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame > 0xfffffe00a5d9eef0 > 01:50:27 fork_exit() at fork_exit+0x80/frame 0xfffffe00a5d9ef30 > 01:50:27 fork_trampoline() at fork_trampoline+0xe/frame > 0xfffffe00a5d9ef30 > > So I suspect this is a race with teardown of the epair interfaces and > whatever > traffic the sys/netpfil/common/dummynet:pf_queue_v6 test was sending > when it > stopped. I've cc'd kp@ in case he has any ideas? > I think I’ve seen that before, while I was doing the pf/dummynet work. Dummynet will queue packets, delay them and eventually send them out. If between the enqueuing and dequeuing the packet the interface goes away we can end up with this panic. I have a patch to teach dummynet to walk its queues when an interface is removed and to drop any packets destined for that interface. I wound up not committing it because I was unable to reproduce the problem at the time. Can we reliably trigger this panic? I’d love to be able to add a test case for it, especially because the patch is non-trivial. Here’s that patch. Its commit message has exactly the same backtrace. Br, Kristof --=_MailMate_532E3070-790D-4348-839A-FFE9CD0B8790_= Content-Disposition: attachment; filename=0001-dummynet-purge-queued-packets-on-interface-destructi.patch Content-Type: text/plain Content-Transfer-Encoding: quoted-printable =46rom f9f51748a8b3f1eede241019c2fd62c7115174bb Mon Sep 17 00:00:00 2001 From: Kristof Provost Date: Wed, 7 Jul 2021 17:21:55 +0200 Subject: [PATCH] dummynet: purge queued packets on interface destruction When a struct ifnet goes away it's possible that dummynet will still have packets queued for it. We must purge those to ensure we do not attempt to send packets out a removed interface. This scenario is triggered by the dummynet:ipfw_queue_v6 test and looks like: Fatal trap 9: general protection fault while in kernel mode cpuid =3D 0; apic id =3D 00 instruction pointer =3D 0x20:0xffffffff80e33df6 stack pointer =3D 0x28:0xfffffe0148d1c170 frame pointer =3D 0x28:0xfffffe0148d1c240 code segment =3D base 0x0, limit 0xfffff, type 0x1b =3D DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags =3D interrupt enabled, resume, IOPL =3D 0 current process =3D 0 (dummynet) trap number =3D 9 panic: general protection fault cpuid =3D 0 time =3D 1625664777 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0148= d1be70 vpanic() at vpanic+0x187/frame 0xfffffe0148d1bed0 panic() at panic+0x43/frame 0xfffffe0148d1bf30 trap_fatal() at trap_fatal+0x387/frame 0xfffffe0148d1bf90 trap() at trap+0xa4/frame 0xfffffe0148d1c0a0 calltrap() at calltrap+0x8/frame 0xfffffe0148d1c0a0 --- trap 0x9, rip =3D 0xffffffff80e33df6, rsp =3D 0xfffffe0148d1c170, rb= p =3D 0xfffffe0148d1c240 --- ip6_input() at ip6_input+0x46/frame 0xfffffe0148d1c240 netisr_dispatch_src() at netisr_dispatch_src+0xb1/frame 0xfffffe0148d1c2= a0 dummynet_send() at dummynet_send+0x1dd/frame 0xfffffe0148d1c2e0 dummynet_task() at dummynet_task+0x49c/frame 0xfffffe0148d1c380 taskqueue_run_locked() at taskqueue_run_locked+0xaa/frame 0xfffffe0148d1= c400 taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfffffe0148= d1c430 fork_exit() at fork_exit+0x80/frame 0xfffffe0148d1c470 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0148d1c470 --- trap 0, rip =3D 0, rsp =3D 0, rbp =3D 0 --- KDB: enter: panic On interface removal scan through the dummynet packet queues and remove any packet destined for the removed interface. MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") --- sys/netpfil/ipfw/dn_sched_qfq.c | 3 +- sys/netpfil/ipfw/dn_sched_wf2q.c | 7 +- sys/netpfil/ipfw/ip_dummynet.c | 120 +++++++++++++++++++++++++++++++ 3 files changed, 128 insertions(+), 2 deletions(-) diff --git a/sys/netpfil/ipfw/dn_sched_qfq.c b/sys/netpfil/ipfw/dn_sched_= qfq.c index e02f27dfd85e..3a297ac322ae 100644 --- a/sys/netpfil/ipfw/dn_sched_qfq.c +++ b/sys/netpfil/ipfw/dn_sched_qfq.c @@ -568,7 +568,8 @@ qfq_dequeue(struct dn_sch_inst *si) m =3D dn_dequeue(&cl->_q); = if (!m) { - D("BUG/* non-workconserving leaf */"); + /* This can happen if the packet has been purged, because it's + * destined for a removed interface. */ return NULL; } NO(q->queued--;) diff --git a/sys/netpfil/ipfw/dn_sched_wf2q.c b/sys/netpfil/ipfw/dn_sched= _wf2q.c index 7285df0d1d7d..766c6385fb32 100644 --- a/sys/netpfil/ipfw/dn_sched_wf2q.c +++ b/sys/netpfil/ipfw/dn_sched_wf2q.c @@ -242,8 +242,13 @@ wf2qp_dequeue(struct dn_sch_inst *_si) /* ok we have at least one eligible pkt */ q =3D HEAP_TOP(sch)->object; alg_fq =3D (struct wf2qp_queue *)q; - m =3D dn_dequeue(q); + m=3D dn_dequeue(q); heap_extract(sch, NULL); /* Remove queue from heap. */ + if (m =3D=3D NULL) { + /* Packet may have been purged. */ + heap_insert(&si->idle_heap, alg_fq->F, q); + return (NULL); + } si->V +=3D (uint64_t)(m->m_pkthdr.len) * si->inv_wsum; alg_fq->S =3D alg_fq->F; /* Update start time. */ if (q->mq.head =3D=3D 0) { /* not backlogged any more. */ diff --git a/sys/netpfil/ipfw/ip_dummynet.c b/sys/netpfil/ipfw/ip_dummyne= t.c index d3242fd85817..5d51af960393 100644 --- a/sys/netpfil/ipfw/ip_dummynet.c +++ b/sys/netpfil/ipfw/ip_dummynet.c @@ -45,6 +45,7 @@ __FBSDID("$FreeBSD$"); = #include #include +#include #include #include #include @@ -64,6 +65,8 @@ __FBSDID("$FreeBSD$"); #include /* ip_output(), IP_FORWARDING */ #include #include +#include +#include #include = #include @@ -101,6 +104,8 @@ CK_LIST_HEAD(, dn_alg) schedlist; CK_LIST_HEAD(, dn_aqm) aqmlist; /* list of AQMs */ #endif = +static eventhandler_tag ip_dn_detach_cookie; + static void dummynet(void *arg) { @@ -2593,6 +2598,8 @@ static void ip_dn_vnet_destroy(void) { DN_BH_WLOCK(); + V_dn_cfg.init_done =3D 0; + dummynet_flush(); DN_BH_WUNLOCK(); = @@ -2603,6 +2610,114 @@ ip_dn_vnet_destroy(void) DN_LOCK_DESTROY(); } = +static int +ip_dn_mq_flush_ifnet(struct mq *mq, struct ifnet *ifp) +{ + struct mbuf *m, *tmp; + struct mbuf *mprev; + int removed =3D 0; + + m =3D mq->head; + mprev =3D NULL; + + while (m !=3D NULL) { + struct dn_pkt_tag *p =3D dn_tag_get(m); + if (p->ifp !=3D ifp && m->m_pkthdr.rcvif !=3D ifp) { + mprev =3D m; + m =3D m->m_nextpkt; + continue; + } + + if (mprev =3D=3D NULL) { + /* We're removing the first element in the mq. */ + mq->head =3D m->m_nextpkt; + } else { + MPASS(mprev->m_nextpkt =3D=3D m); + mprev->m_nextpkt =3D m->m_nextpkt; + } + tmp =3D m; + m =3D m->m_nextpkt; + + tmp->m_nextpkt =3D NULL; + m_freem(tmp); + + removed++; + } + + mq->tail =3D mprev; + + return (removed); +} + +static int +ip_dn_q_flush_ifnet(void *_q, void *_arg) +{ + struct dn_queue *q =3D _q; + struct ifnet *ifp =3D _arg; + int removed; + + removed =3D ip_dn_mq_flush_ifnet(&q->mq, ifp); + q->count -=3D removed; + + return (0); +} + +static int +ip_dn_fs_flush_ifnet(void *_fs, void *_arg) +{ + struct dn_fsk *fs =3D _fs; + + if (!fs->qht) + return (0); + + if (fs->fs.flags & DN_QHT_HASH) + dn_ht_scan(fs->qht, ip_dn_q_flush_ifnet, _arg); + else + ip_dn_q_flush_ifnet(fs->qht, _arg); + + return (0); +} + +static int +ip_dn_schi_flush_ifnet(void *_si, void *_arg) +{ + struct dn_sch_inst *si =3D _si; + struct ifnet *ifp =3D _arg; + + ip_dn_mq_flush_ifnet(&si->dline.mq, ifp); + + return (0); +} + +static int +ip_dn_sched_flush_ifnet(void *_s, void *_arg) +{ + struct dn_schk *s =3D _s; + + if (s->siht =3D=3D NULL) + return (0); + + if (s->sch.flags & DN_HAVE_MASK) + dn_ht_scan(s->siht, ip_dn_schi_flush_ifnet, _arg); + else if (s->siht) + ip_dn_schi_flush_ifnet(s->siht, _arg); + + return (0); +} + +static void +ip_dn_detach_ifnet_event(void *arg __unused, struct ifnet *ifp) +{ + if (! V_dn_cfg.init_done) + return; + + DN_BH_WLOCK(); + /* Purge all queued packets for this interface. */ + dn_ht_scan(V_dn_cfg.fshash, ip_dn_fs_flush_ifnet, ifp); + dn_ht_scan(V_dn_cfg.schedhash, ip_dn_sched_flush_ifnet, ifp); + DN_BH_WUNLOCK(); +} + static void ip_dn_init(void) { @@ -2617,6 +2732,9 @@ ip_dn_init(void) taskqueue_thread_enqueue, &dn_tq); taskqueue_start_threads(&dn_tq, 1, PI_NET, "dummynet"); = + ip_dn_detach_cookie =3D EVENTHANDLER_REGISTER(ifnet_departure_event, + ip_dn_detach_ifnet_event, NULL, EVENTHANDLER_PRI_ANY); + CK_LIST_INIT(&schedlist); callout_init(&dn_timeout, 1); dn_reschedule(); @@ -2635,6 +2753,8 @@ ip_dn_destroy(int last) ip_dn_io_ptr =3D NULL; } = + EVENTHANDLER_DEREGISTER(ifnet_departure_event, ip_dn_detach_cookie); + callout_drain(&dn_timeout); taskqueue_drain(dn_tq, &dn_task); taskqueue_free(dn_tq); -- = 2.33.1 --=_MailMate_532E3070-790D-4348-839A-FFE9CD0B8790_=--