From nobody Mon Jul 15 18:40:05 2024 X-Original-To: dev-commits-src-all@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4WN9xM0BB9z5RRkh; Mon, 15 Jul 2024 18:40:23 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-ed1-x52e.google.com (mail-ed1-x52e.google.com [IPv6:2a00:1450:4864:20::52e]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "WR4" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4WN9xL1fjyz4PCk; Mon, 15 Jul 2024 18:40:22 +0000 (UTC) (envelope-from mjguzik@gmail.com) Authentication-Results: mx1.freebsd.org; dkim=pass header.d=gmail.com header.s=20230601 header.b=QBcJW8t9; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (mx1.freebsd.org: domain of mjguzik@gmail.com designates 2a00:1450:4864:20::52e as permitted sender) smtp.mailfrom=mjguzik@gmail.com Received: by mail-ed1-x52e.google.com with SMTP id 4fb4d7f45d1cf-58ba3e38028so6093881a12.0; Mon, 15 Jul 2024 11:40:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1721068819; x=1721673619; darn=freebsd.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=G+6QyDVCDF7/HAcqlVRTaApi0lrEQ0HqhF3gG5u5Ez8=; b=QBcJW8t9tcLKrq4MCQUPdgVQ2lgVV5Hb+iwJ5B0gaWdM+oOndglsiZgnc8FOwGdlHn zKNsF3Fq3ukWs84QudyQ/KXuII03Cirk4M5GzmdWCM4PyRL6i+3BVTAC2bDfIz0CjDVV xRX8GRWsOv4XlVa4zhgnrpSVIR/AntcYqF6apb2S0bVxNugnZ3nKTOBr/vstLoXhwUGU vrBDm0MGCEKccE/ZKGRCBYiUvnob3/amTWBGdQ6chaNx7Q7mheFYTcZd4U6xiHV+gyJ9 2FewMQZi+gMyQfQRGYcLo/2ZYReSGzElOv1wi+zFQJuHV1FaGWW79gRwCuZ/lU9m83gz /ctA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1721068819; x=1721673619; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=G+6QyDVCDF7/HAcqlVRTaApi0lrEQ0HqhF3gG5u5Ez8=; b=kbn1AHCOcVDLk25HqUF/wDGg5ZmI0iHzYxO8R3m1fboPqqfjhOjOUyHOKrvwTCnBFE Gv0wB7ax99zUerXYnv0Ei8EFIGsWsR++qM7muEQoa4Veufz8fetBGP4FmqtzuShEPOJC sCjl1bbH1SjbcncISWDvIp99AeDxd0j8U+SLjQg1NcoIwR+sBV0NnWffk1P5UKPvJ4hH wiOvyaJ8Aljt21KVyoJjgdTJTdOrP2jqQvVxHhxtuwyA3GdfCaLEmAB1cUql+NW8kOHV jVDlThBf35f/2r0GicKpRvPp7mGQ80LnKi4rTfAoI46SfYq+GIS5UHNSkG4+uJDfYA9Q yaxQ== X-Forwarded-Encrypted: i=1; AJvYcCV6JM8ejYqYgMBsPIBGx2KtIYEgXVp/h2gNlgJk/10Z+gK5/rt99YnFRN16nzhy2+J2fuEJpD11nd+b8h1JMlr1mbMctEoYta3ndL8nNwv4WYYCGhKdGXduEHuVnQn1IVp3iY9Qp58LurlnNfxfnEg2+k0QABKPMdlYfac5LE5HmknA7V107YH7o9KUPQY= X-Gm-Message-State: AOJu0YyQ3fXomxwhv0JbHXjuEgh85YsPNYWsFz6zMNfRzHROqNc+Sq10 iEVshNKSHCG1aQVKG8+Z+EXUqpNmTpf6mpZVJkFU5HYDgnOIWI9E7WCzc6txE6mfiYTvn8Se+sT gCKxaoQa1qZ79WsOI6SVpg16RmP6prLoQ X-Google-Smtp-Source: AGHT+IESwKX3NnCWob+lxqX043NKTYDAH5Xpct/KYn5G84AcgL+z7ES9xeGIi3tDMRM/SWomNUTY4QbeeehrRtMhTpI= X-Received: by 2002:a17:906:1545:b0:a77:dd1c:6273 with SMTP id a640c23a62f3a-a79e69389a6mr31211866b.12.1721068818592; Mon, 15 Jul 2024 11:40:18 -0700 (PDT) List-Id: Commit messages for all branches of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all List-Help: List-Post: List-Subscribe: List-Unsubscribe: X-BeenThere: dev-commits-src-all@freebsd.org Sender: owner-dev-commits-src-all@FreeBSD.org MIME-Version: 1.0 References: <202407111107.46BB7uSb007077@gitrepo.freebsd.org> <81cfe7ad-cbee-4122-abef-e47ce2b34f05@FreeBSD.org> In-Reply-To: From: Mateusz Guzik Date: Mon, 15 Jul 2024 20:40:05 +0200 Message-ID: Subject: Re: git: 87ee63bac69d - main - locks: add a runtime check for missing turnstile To: John Baldwin Cc: Mateusz Guzik , src-committers@freebsd.org, dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spamd-Bar: --- X-Spamd-Result: default: False [-3.98 / 15.00]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; NEURAL_HAM_LONG(-1.00)[-1.000]; NEURAL_HAM_SHORT(-0.98)[-0.983]; DMARC_POLICY_ALLOW(-0.50)[gmail.com,none]; R_DKIM_ALLOW(-0.20)[gmail.com:s=20230601]; R_SPF_ALLOW(-0.20)[+ip6:2a00:1450:4000::/36]; MIME_GOOD(-0.10)[text/plain]; MISSING_XM_UA(0.00)[]; ASN(0.00)[asn:15169, ipnet:2a00:1450::/32, country:US]; RCVD_COUNT_ONE(0.00)[1]; MIME_TRACE(0.00)[0:+]; ARC_NA(0.00)[]; FREEMAIL_ENVFROM(0.00)[gmail.com]; TO_DN_SOME(0.00)[]; FREEMAIL_FROM(0.00)[gmail.com]; RCVD_IN_DNSWL_NONE(0.00)[2a00:1450:4864:20::52e:from]; RCPT_COUNT_FIVE(0.00)[5]; FROM_EQ_ENVFROM(0.00)[]; FROM_HAS_DN(0.00)[]; DWL_DNSWL_NONE(0.00)[gmail.com:dkim]; TO_MATCH_ENVRCPT_ALL(0.00)[]; MID_RHS_MATCH_FROMTLD(0.00)[]; RCVD_TLS_LAST(0.00)[]; MLMMJ_DEST(0.00)[dev-commits-src-all@freebsd.org,dev-commits-src-main@freebsd.org]; DKIM_TRACE(0.00)[gmail.com:+] X-Rspamd-Queue-Id: 4WN9xL1fjyz4PCk On Mon, Jul 15, 2024 at 8:33=E2=80=AFPM Mateusz Guzik w= rote: > > On Mon, Jul 15, 2024 at 8:21=E2=80=AFPM John Baldwin wr= ote: > > > > On 7/15/24 13:59, Mateusz Guzik wrote: > > > On Mon, Jul 15, 2024 at 6:22=E2=80=AFPM John Baldwin wrote: > > >> > > >> On 7/11/24 07:07, Mateusz Guzik wrote: > > >>> The branch main has been updated by mjg: > > >>> > > >>> URL: https://cgit.FreeBSD.org/src/commit/?id=3D87ee63bac69dc49291f5= 5590b8baa57cad6c7d85 > > >>> > > >>> commit 87ee63bac69dc49291f55590b8baa57cad6c7d85 > > >>> Author: Mateusz Guzik > > >>> AuthorDate: 2024-07-11 00:17:27 +0000 > > >>> Commit: Mateusz Guzik > > >>> CommitDate: 2024-07-11 11:06:52 +0000 > > >>> > > >>> locks: add a runtime check for missing turnstile > > >>> > > >>> There are sometimes bugs which result in the unlock fast path= failing, > > >>> which in turns causes a not-helpful crash report when derefer= encing a > > >>> NULL turnstile. Help debugging such cases by pointing out wha= t happened > > >>> along with some debug. > > >>> > > >>> Sponsored by: Rubicon Communications, LLC ("Netgate") > > >>> --- > > >>> sys/kern/kern_mutex.c | 4 +++- > > >>> sys/kern/kern_rwlock.c | 16 ++++++++++++---- > > >>> 2 files changed, 15 insertions(+), 5 deletions(-) > > >>> > > >>> diff --git a/sys/kern/kern_mutex.c b/sys/kern/kern_mutex.c > > >>> index 90361b23c09a..0fa624cc4bb1 100644 > > >>> --- a/sys/kern/kern_mutex.c > > >>> +++ b/sys/kern/kern_mutex.c > > >>> @@ -1053,7 +1053,9 @@ __mtx_unlock_sleep(volatile uintptr_t *c, uin= tptr_t v) > > >>> turnstile_chain_lock(&m->lock_object); > > >>> _mtx_release_lock_quick(m); > > >>> ts =3D turnstile_lookup(&m->lock_object); > > >>> - MPASS(ts !=3D NULL); > > >>> + if (__predict_false(ts =3D=3D NULL)) { > > >>> + panic("got NULL turnstile on mutex %p v %zx", m, v); > > >>> + } > > >> > > >> Hmm, this is just an expanded KASSERT() but always on rather than co= nditional on INVARIANTS? > > >> > > >> Do you have examples of the type of bugs that cause this? (Is it un= locking a freed mutex > > >> or the like?) We generally hide all these types of checks under INV= ARIANTS rather than > > >> shipping them in release kernels. > > >> > > > > > > Use-after-free, overflow, underflow, bitflip or what have you all can > > > fail the fast path. > > > > > > Once that happens and the kernel crashes with a null pointer deref, > > > here is a crash at netgate which prodded this: > > > calltrap() at calltrap+0x8/frame 0xfffffe0106720920 > > > --- trap 0xc, rip =3D 0xffffffff80d5ab70, rsp =3D 0xfffffe01067209f0,= rbp > > > =3D 0xfffffe0106720a00 --- > > > turnstile_broadcast() at turnstile_broadcast+0x40/frame 0xfffffe01067= 20a00 > > > __rw_wunlock_hard() at __rw_wunlock_hard+0x9e/frame 0xfffffe0106720a3= 0 > > > nd6_resolve_slow() at nd6_resolve_slow+0x2d7/frame 0xfffffe0106720aa0 > > > nd6_resolve() at nd6_resolve+0x125/frame 0xfffffe0106720b10 > > > ether_output() at ether_output+0x4e7/frame 0xfffffe0106720ba0 > > > ip_output_send() at ip_output_send+0xdc/frame 0xfffffe0106720be0 > > > ip_output() at ip_output+0x1295/frame 0xfffffe0106720ce0 > > > ip_forward() at ip_forward+0x3c2/frame 0xfffffe0106720d90 > > > ip_input() at ip_input+0x705/frame 0xfffffe0106720df0 > > > swi_net() at swi_net+0x138/frame 0xfffffe0106720e60 > > > ithread_loop() at ithread_loop+0x257/frame 0xfffffe0106720ef0 > > > fork_exit() at fork_exit+0x7f/frame 0xfffffe0106720f30 > > > fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0106720f30 > > > --- trap 0, rip =3D 0, rsp =3D 0, rbp =3D 0 --- > > > > > > Neither the register dump nor anything in the backtrace indicate what= happened. > > > > > > Since the kernel is going down anyway, one may as well get some debug= from it. > > > > If you don't mind the extra branches for sanity checks, why not just ru= n with > > INVARIANTS? That is, what makes these particular assertions different = from > > other assertions such that they should be on unconditionally? The last= line below > > applies to pretty much every other assertion in the tree. > > > > This adds a branch in the slowpath, a spot which should relatively > rarely execute compared to the fast path. On top of that the branch at > hand does not do any extra memory accesses or complex arithmetic. > > So no, I don't think I may as well run with INVARIANTS. How about this: if you strongly about this branch, feel free to revert the commit, I'm just going to keep the change at Netgate. But should you do it, make sure to not add avoidable branches in your stuff= . Cheers. --=20 Mateusz Guzik