From nobody Fri Sep 01 03:44:11 2023 X-Original-To: dev-commits-src-main@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4RcP674980z4s0XZ; Fri, 1 Sep 2023 03:44:15 +0000 (UTC) (envelope-from cy.schubert@cschubert.com) Received: from omta001.cacentral1.a.cloudfilter.net (omta001.cacentral1.a.cloudfilter.net [3.97.99.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "Client", Issuer "CA" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 4RcP665z8Xz4Q4p; Fri, 1 Sep 2023 03:44:14 +0000 (UTC) (envelope-from cy.schubert@cschubert.com) Authentication-Results: mx1.freebsd.org; dkim=none; spf=none (mx1.freebsd.org: domain of cy.schubert@cschubert.com has no SPF policy when checking 3.97.99.32) smtp.mailfrom=cy.schubert@cschubert.com; dmarc=none Received: from shw-obgw-4004a.ext.cloudfilter.net ([10.228.9.227]) by cmsmtp with ESMTP id bp28qLkS0LAoIbv4wqMbUp; Fri, 01 Sep 2023 03:44:14 +0000 Received: from spqr.komquats.com ([70.66.152.170]) by cmsmtp with ESMTPA id bv4uqgUfH3fOSbv4vqXPLO; Fri, 01 Sep 2023 03:44:14 +0000 X-Authority-Analysis: v=2.4 cv=J8G5USrS c=1 sm=1 tr=0 ts=64f15e0e a=y8EK/9tc/U6QY+pUhnbtgQ==:117 a=y8EK/9tc/U6QY+pUhnbtgQ==:17 a=xqWC_Br6kY4A:10 a=IkcTkHD0fZMA:10 a=UttIx32zK-AA:10 a=YxBL1-UpAAAA:8 a=6I5d2MoRAAAA:8 a=EkcXrb_YAAAA:8 a=VtD-V8Bl0kRGtFH_ge0A:9 a=QEXdDO2ut3YA:10 a=Ia-lj3WSrqcvXOmTRaiG:22 a=IjZwj45LgO3ly-622nXo:22 a=LK5xJRSDVpKd5WXXoEvA:22 Received: from slippy.cwsent.com (slippy [10.1.1.91]) by spqr.komquats.com (Postfix) with ESMTP id 3ECB15EA; Thu, 31 Aug 2023 20:44:12 -0700 (PDT) Received: from slippy (localhost [IPv6:::1]) by slippy.cwsent.com (Postfix) with ESMTP id ED07FD6; Thu, 31 Aug 2023 20:44:11 -0700 (PDT) Date: Thu, 31 Aug 2023 20:44:11 -0700 From: Cy Schubert To: Kyle Evans Cc: Alexander Motin , Gleb Smirnoff , Martin Matuska , src-committers@freebsd.org, dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org Subject: Re: git: 315ee00fa961 - main - zfs: merge openzfs/zfs@804414aad Message-ID: <20230831204411.02780107@slippy> In-Reply-To: <20230831203801.24fade6d@slippy> References: <202308270509.37R596B5048298@gitrepo.freebsd.org> <20230831203801.24fade6d@slippy> Organization: KOMQUATS X-Mailer: Claws Mail 3.19.0 (GTK+ 2.24.33; amd64-portbld-freebsd14.0) List-Id: Commit messages for the main branch of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-dev-commits-src-main@freebsd.org X-BeenThere: dev-commits-src-main@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-CMAE-Envelope: MS4xfPGihm+RcxbXTOjkGgi+4u0e7sqhs62sSEcOLZgIi2wcklCXGJ/HYk4AA9UVAlIoHeyrYIb8+DgMYKsed4fySeGoGA4Ip5g/12HEbgGLj9H5Y9otT67K bZJNjtHXq1/CVb48Si6och2ahTvin40Nh60UhCqj/x/hed+a5gRKBab/l16KFx8Q1ZyXVfc3RjYCTtaOkO0C/9fSfJzZ2JQ68RJRywhTF27MFE/Pi1x6npWd ZHr+d5Gc/qqIx7V3JRc3OvATgvjjCr4rc1Ye0Vl5uPfrDFtLvVn5j7WmW5H3jp4D0p0ApTL8UjikhAbbUdposEPkj147mUzrP/4QfBISRISsc3+Oaq7Aj3uZ Ze08zY82eEjORzckEM5WpgPfBqtkJLoGd7BaIEixp/rTdjJOhJQ1BmPVFukbvCiGOFRPRDK5 X-Spamd-Bar: - X-Spamd-Result: default: False [-1.70 / 15.00]; AUTH_NA(1.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; NEURAL_HAM_LONG(-1.00)[-1.000]; NEURAL_HAM_SHORT(-1.00)[-0.999]; MID_RHS_NOT_FQDN(0.50)[]; MIME_GOOD(-0.10)[text/plain]; RCVD_IN_DNSWL_LOW(-0.10)[3.97.99.32:from]; FROM_EQ_ENVFROM(0.00)[]; MLMMJ_DEST(0.00)[dev-commits-src-main@freebsd.org,dev-commits-src-all@freebsd.org]; RCPT_COUNT_SEVEN(0.00)[7]; ASN(0.00)[asn:16509, ipnet:3.96.0.0/15, country:US]; MIME_TRACE(0.00)[0:+]; R_SPF_NA(0.00)[no SPF record]; R_DKIM_NA(0.00)[]; HAS_ORG_HEADER(0.00)[]; RCVD_TLS_LAST(0.00)[]; FROM_HAS_DN(0.00)[]; ARC_NA(0.00)[]; RCVD_COUNT_THREE(0.00)[4]; TO_DN_SOME(0.00)[]; DMARC_NA(0.00)[cschubert.com]; TO_MATCH_ENVRCPT_ALL(0.00)[]; RCVD_VIA_SMTP_AUTH(0.00)[] X-Rspamd-Queue-Id: 4RcP665z8Xz4Q4p On Thu, 31 Aug 2023 20:38:01 -0700 Cy Schubert wrote: > On Thu, 31 Aug 2023 21:18:30 -0500 > Kyle Evans wrote: >=20 > > On 8/30/23 20:01, Alexander Motin wrote: =20 > > > On 30.08.2023 20:09, Gleb Smirnoff wrote: =20 > > >> =C2=A0=C2=A0 Hi Martin, > > >> > > >> we hit this panic with this update: > > >> > > >> panic: VERIFY(avl_is_empty(&lwb->lwb_vdev_tree)) failed > > >> > > >> cpuid =3D 8 > > >> time =3D 1693432004 > > >> KDB: stack backtrace: > > >> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame=20 > > >> 0xfffffe09519aa800 > > >> vpanic() at vpanic+0x132/frame 0xfffffe09519aa930 > > >> spl_panic() at spl_panic+0x3a/frame 0xfffffe09519aa990 > > >> zil_free_lwb() at zil_free_lwb+0x1de/frame 0xfffffe09519aa9c0 > > >> zil_sync() at zil_sync+0x3cb/frame 0xfffffe09519aaaa0 > > >> [Wed Aug 30 21:46:44 2023]dmu_objset_sync() at=20 > > >> dmu_objset_sync+0x51b/frame 0xfffffe09519aab70 > > >> dsl_pool_sync() at dsl_pool_sync+0x11d/frame 0xfffffe09519aabf0 > > >> spa_sync() at spa_sync+0xc68/frame 0xfffffe09519aae20 > > >> txg_sync_thread() at txg_sync_thread+0x2eb/frame 0xfffffe09519aaef0 > > >> fork_exit() at fork_exit+0x82/frame 0xfffffe09519aaf30 > > >> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe09519aaf30 > > >> --- trap 0x26d90a2f, rip =3D 0xbba54c81b25fd8d4, rsp =3D=20 > > >> 0xff60c2f6f9874bcd, rbp =3D 0x37940e3e9793bd40 -- > > >> > > >> I have two cores dumped for this panic. > > >> > > >> Our previous version is based on=20 > > >> 9c0a2d522f3a18fc795a7dd6266d2be2aff4b084, which > > >> I guess has the most recent ZFS merge openzfs/zfs@009d3288d. > > >> =20 > > > Hi Gleb, > > > > > > It is the first time I see a panic like this. I'll think about it > > > tomorrow. But I'd appreciate any information on what is your worklo= ad > > > and what are you doing related to ZIL (O_SYNC, fsync(), sync=3Dalway= s, > > > etc) to trigger it? What is your pool configuration? > > > =20 > >=20 > > Hello! > >=20 > > Probably not related, but I seem to have hit a use-after-free in ZIL=20 > > bits; is this something you're interested in? Backtrace is: > >=20 > > #17 zio_nowait (zio=3D0xdeadc0dedeadc0de) at=20 > > /usr/src.kbsd/sys/contrib/openzfs/module/zfs/zio.c:2367 > > #18 0xffff00000160720c in zil_lwb_write_issue=20 > > (zilog=3Dzilog@entry=3D0xffffa00021631800, lwb=3D0xffffa00933d34c40) > > at /usr/src.kbsd/sys/contrib/openzfs/module/zfs/zil.c:1935 > > --Type for more, q to quit, c to continue without paging-- > > #19 0xffff0000016031c0 in zil_commit_writer (zilog=3D0xffffa00021631800= ,=20 > > zcw=3D0xffffa000fb01e140) at=20 > > /usr/src.kbsd/sys/contrib/openzfs/module/zfs/zil.c:2985 > > #20 zil_commit_impl (zilog=3D0xffffa00021631800, foid=3D= ) at=20 > > /usr/src.kbsd/sys/contrib/openzfs/module/zfs/zil.c:3489 > > #21 0xffff000001602f44 in zil_commit (zilog=3D,=20 > > foid=3D) at=20 > > /usr/src.kbsd/sys/contrib/openzfs/module/zfs/zil.c:3452 > > #22 0xffff0000015fdac8 in zfs_fsync (zp=3D0xffffa007e27b23a0,=20 > > syncflag=3D, cr=3D) at=20 > > /usr/src.kbsd/sys/contrib/openzfs/module/zfs/zfs_vnops.c:75 > > #23 0xffff00000056a864 in VOP_FSYNC (vp=3D0xffffa001d2003000, waitfor= =3D1,=20 > > td=3D0xffff00015aa94620) at ./vnode_if.h:763 > > #24 kern_fsync (td=3D0xffff00015aa94620, fd=3D,=20 > > fullsync=3Dtrue) at /usr/src.kbsd/sys/kern/vfs_syscalls.c:3556 > > #25 0xffff000000800630 in syscallenter (td=3D0xffff00015aa94620) at=20 > > /usr/src.kbsd/sys/arm64/arm64/../../kern/subr_syscall.c:187 > > #26 svc_handler (td=3D0xffff00015aa94620, frame=3D) at=20 > > /usr/src.kbsd/sys/arm64/arm64/trap.c:198 > > #27 do_el0_sync (td=3D0xffff00015aa94620, frame=3D) at=20 > > /usr/src.kbsd/sys/arm64/arm64/trap.c:625 > >=20 > >=20 > > If I pop up to the zil_lwb_write_issue frame: > >=20 > > (kgdb) print *lwb > > $4 =3D {lwb_zilog =3D 0xdeadc0dedeadc0de, lwb_blk =3D {blk_dva =3D {{dv= a_word =3D=20 > > {16045693110842147038, 16045693110842147038}}, {dva_word =3D=20 > > {16045693110842147038, 16045693110842147038}}, {dva_word =3D=20 > > {16045693110842147038, > > 16045693110842147038}}}, blk_prop =3D 16045693110842147038,= =20 > > blk_pad =3D {16045693110842147038, 16045693110842147038}, blk_phys_birt= h =3D=20 > > 16045693110842147038, blk_birth =3D 16045693110842147038, > > blk_fill =3D 16045693110842147038, blk_cksum =3D {zc_word =3D=20 > > {16045693110842147038, 16045693110842147038, 16045693110842147038,=20 > > 16045693110842147038}}}, lwb_slim =3D 3735929054, lwb_slog =3D 37359290= 54,=20 > > lwb_error =3D -559038242, > > lwb_nmax =3D -559038242, lwb_nused =3D -559038242, lwb_nfilled =3D=20 > > -559038242, lwb_sz =3D -559038242, lwb_state =3D 3735929054, lwb_buf = =3D=20 > > 0xdeadc0dedeadc0de > 0xdeadc0dedeadc0de>, =20 > > lwb_child_zio =3D 0xdeadc0dedeadc0de, lwb_write_zio =3D=20 > > 0xdeadc0dedeadc0de, lwb_root_zio =3D 0xdeadc0dedeadc0de,=20 > > lwb_issued_timestamp =3D -2401050962867404578, lwb_issued_txg =3D=20 > > 16045693110842147038, > > lwb_alloc_txg =3D 16045693110842147038, lwb_max_txg =3D=20 > > 16045693110842147038, lwb_node =3D {list_next =3D 0xdeadc0dedeadc0de,=20 > > list_prev =3D 0xdeadc0dedeadc0de}, lwb_issue_node =3D {list_next =3D=20 > > 0xdeadc0dedeadc0de, > > list_prev =3D 0xdeadc0dedeadc0de}, lwb_itxs =3D {list_size =3D=20 > > 16045693110842147038, list_offset =3D 16045693110842147038, list_head = =3D=20 > > {list_next =3D 0xdeadc0dedeadc0de, list_prev =3D 0xdeadc0dedeadc0de}},= =20 > > lwb_waiters =3D { > > list_size =3D 16045693110842147038, list_offset =3D=20 > > 16045693110842147038, list_head =3D {list_next =3D 0xdeadc0dedeadc0de,= =20 > > list_prev =3D 0xdeadc0dedeadc0de}}, lwb_vdev_tree =3D {avl_root =3D=20 > > 0xdeadc0dedeadc0de, > > avl_compar =3D 0xdeadc0dedeadc0de, avl_offset =3D 1604569311084214= 7038,=20 > > avl_numnodes =3D 16045693110842147038}, lwb_vdev_lock =3D {lock_object = =3D { > > lo_name =3D 0xdeadc0dedeadc0de > address 0xdeadc0dedeadc0de>, lo_flags =3D 3735929054, lo_data =3D=20 > > 3735929054, lo_witness =3D 0xdeadc0dedeadc0de}, sx_lock =3D=20 > > 16045693110842147038}} > >=20 > > It seems to have clearly been stomped on by uma trashing. Encountered=20 > > while running a pkgbase build, I think while it was in the packaging=20 > > phase. I note in particular in that frame: > >=20 > > (kgdb) p/x lwb->lwb_issued_timestamp > > $4 =3D 0xdeadc0dedeadc0de > >=20 > > So I guess it was freed sometime during one of the previous two=20 > > zio_nowait() calls. > >=20 > > Thanks, > >=20 > > Kyle Evans > > =20 >=20 > I think the four panics on my two downstairs machines two days ago, the > panic today on the one of the same two machines today, and the email > data corruption my laptop has been experiencing, and the backtraces I've > seen Gleb and you post give me the uneasy sense that there may be some > kind of general memory corruption due to bad or corrupted addresses > being dereferenced. >=20 Though on second thought these could be separate and unrelated problems. The fact that the spin loop that forced me to manually panic my one machine reminds me of a similar situation on a mainframe where page tables were being overwritten by kernel code being developed at the time which also resulted in a spin loop over a corrupted lock. --=20 Cheers, Cy Schubert FreeBSD UNIX: Web: https://FreeBSD.org NTP: Web: https://nwtime.org e^(i*pi)+1=3D0