Re: CURRENT: Panic VERIFY(!zil_replaying(zilog, tx)) failed (and crashing)
Date: Tue, 11 Apr 2023 14:47:13 UTC
In message <20230411142831.DB8245FA@slippy.cwsent.com>, Cy Schubert writes:
> In message <434B83DB-F6BB-436F-8AA5-385730D20BB1@dawidek.net>,
> Paweł Jakub Dawidek writes:
> >
> > > On Apr 11, 2023, at 11:31, Cy Schubert <Cy.Schubert@cschubert.com> wrote:
> > >
> > > In message <20230409161436.5412fa6e@thor.intern.walstatt.dynvpn.de>,
> > > FreeBSD User writes:
> > >> On Sun, 9 Apr 2023 14:37:03 +0200
> > >> Mateusz Guzik <mjguzik@gmail.com> wrote:
> > >>
> > >>>> On 4/9/23, FreeBSD User <freebsd@walstatt-de.de> wrote:
> > >>>>> Today, after upgrading to FreeBSD 14.0-CURRENT #8
> > >>>>> main-n262052-0d4038e3012b: Sun Apr 9 12:01:02 CEST 2023 amd64,
> > >>>>> AND upgrading ZPOOLs via
> > >>>>>
> > >>>>> zpool upgrade POOLNAME
> > >>>>>
> > >>>>> some boxes keep crashing when starting compiler runs (the trigger is
> > >>>>> different on boxes).
> > >>>>>
> > >>>>> ZFS module is statically compiled into the kernel (if this is of
> > >>>>> importance)
> > >>>>>
> > >>>>> Last known good was:
> > >>>>>
> > >>>>> [...]
> > >>>>> Apr 9 07:10:04 <0.2> thor kernel: FreeBSD 14.0-CURRENT #7
> > >>>>> main-n262051-75379ea2e461: Sun Apr 9 00:12:57 CEST 2023
> > >>>>> Apr 9 07:10:04 <0.2> thor kernel:
> > >>>>> root@thor:/usr/obj/usr/src/amd64.amd64/sys/THOR amd64
> > >>>>> Apr 9 07:10:04 <0.2> thor kernel: FreeBSD clang version 15.0.7
> > >>>>> (https://github.com/llvm/llvm-project.git
> > >>>>> llvmorg-15.0.7-0-g8dfdcc7b7bf6)
> > >>>>> Apr 9 07:10:04 <0.2> thor kernel: VT(efifb): resolution 2560x1440
> > >>>>> Apr 9 07:10:04 <0.2> thor kernel: module zfsctrl already present!
> > >>>>> [...]
> > >>>>>
> > >>>>> The file /var/crash/info.X
> > >>>>>
> > >>>>> contains:
> > >>>>>
> > >>>>> [...]
> > >>>>>
> > >>>>> root@thor:/var/crash # more info.2
> > >>>>> Dump header from device: /dev/gpt/swap
> > >>>>>   Architecture: amd64
> > >>>>>   Architecture Version: 2
> > >>>>>   Dump Length: 1095192576
> > >>>>>   Blocksize: 512
> > >>>>>   Compression: none
> > >>>>>   Dumptime: 2023-04-09 11:43:41 +0000
> > >>>>>   Hostname: thor.local
> > >>>>>   Magic: FreeBSD Kernel Dump
> > >>>>>   Version String: FreeBSD 14.0-CURRENT #8 main-n262052-0d4038e3012b:
> > >>>>>     Sun Apr 9 12:01:02 CEST 2023
> > >>>>>     root@thor:/usr/obj/usr/src/amd64.amd64/sys/THOR
> > >>>>>   Panic String: VERIFY(!zil_replaying(zilog, tx)) failed
> > >>>>>
> > >>>>>   Dump Parity: 2961465682
> > >>>>>   Bounds: 2
> > >>>>>   Dump Status: good
> > >>>>>
> > >>>>> Until reconfigured for more debug stuff I do not have more to present.
> > >>>>>
> > >>>>> I remember now, really scared, that there was a HEADSUP in the list
> > >>>>> regarding some serious ZFS problems - I didn't find it right now.
> > >>>>>
> > >>>>> Thanks in advance,
> > >>>>>
> > >>>
> > >>> That's fallout from the new block cloning feature, adding the author
> > >>>
> > >>
> > >> Thanks.
> > >>
> > >> As of this moment, all systems with the newest kernel and the new ZFS
> > >> option enabled, crash - the reason is mostly in different ZFS datasets.
> > >> I guess there is no way back once this faulty option is enabled?
> > >
> > > I've run a test on a scratch pool here, first without block_cloning
> > > enabled, then with. There was no corruption when block_cloning was
> > > disabled. There was corruption when block_cloning was enabled.
> > >
> > > I don't know of any way to revert back nor is there any way to fix or
> > > recover the corrupted blocks.
> >
> > Is the corruption still present after EXDEV fixes?
>
> Yes and no.
>
> Yes, there is corruption when block_cloning is enabled.
>
> There is no corruption when block_cloning is disabled.

I should add some detail to this. The corruption previously seen with
block_cloning disabled was fixed by:

- eb1feadc201a
- e2d997d1cbb9
- d012836fb616 (specifically this commit)
- 20be1b4fc4b7

When block_cloning is enabled, the pool is corrupted. This has not been
fixed.


-- 
Cheers,
Cy Schubert <Cy.Schubert@cschubert.com>
FreeBSD UNIX:  <cy@FreeBSD.org>   Web:  https://FreeBSD.org
NTP:           <cy@nwtime.org>    Web:  https://nwtime.org

			e^(i*pi)+1=0