Re: CURRENT: Panic VERIFY(!zil_replaying(zilog, tx)) failed (and crashing)

From: Nuno Teixeira <eduardo_at_freebsd.org>
Date: Wed, 12 Apr 2023 12:57:02 UTC
Hello all,

On CURRENT at 3fdb40d1befe, after `zfs upgrade XXX`, I see the same problem
when running the compiler:

- poudriere: crash without dump
- make buildworld (/usr/src): shutdown -p (I will try to get a photo)

Is there a way to disable block cloning?

Cy Schubert <Cy.Schubert@cschubert.com> wrote on Tuesday, 11/04/2023
at 15:47:

> In message <20230411142831.DB8245FA@slippy.cwsent.com>, Cy Schubert
> writes:
> > In message <434B83DB-F6BB-436F-8AA5-385730D20BB1@dawidek.net>,
> > Paweł Jakub Dawidek writes:
> > >
> > >
> > > > On Apr 11, 2023, at 11:31, Cy Schubert <Cy.Schubert@cschubert.com> wrote:
> > > >
> > > > In message <20230409161436.5412fa6e@thor.intern.walstatt.dynvpn.de>,
> > > > FreeBSD User writes:
> > > >> On Sun, 9 Apr 2023 14:37:03 +0200
> > > >> Mateusz Guzik <mjguzik@gmail.com> wrote:
> > > >>
> > > >>>> On 4/9/23, FreeBSD User <freebsd@walstatt-de.de> wrote:
> > > >>>>> Today, after upgrading to FreeBSD 14.0-CURRENT #8
> > > >>>>> main-n262052-0d4038e3012b: Sun Apr  9 12:01:02 CEST 2023  amd64,
> > > >>>>> AND upgrading ZPOOLs via
> > > >>>>>
> > > >>>>> zpool upgrade POOLNAME
> > > >>>>>
> > > >>>>> some boxes keep crashing when starting compiler runs (the trigger is
> > > >>>>> different on boxes).
> > > >>>>>
> > > >>>>> ZFS module is statically compiled into the kernel (if this is of
> > > >>>>> importance)
> > > >>>>>
> > > >>>>> Last known good was:
> > > >>>>>
> > > >>>>> [...]
> > > >>>>> Apr  9 07:10:04 <0.2> thor kernel: FreeBSD 14.0-CURRENT #7 main-n262051-75379ea2e461: Sun Apr 9 00:12:57 CEST 2023
> > > >>>>> Apr  9 07:10:04 <0.2> thor kernel: root@thor:/usr/obj/usr/src/amd64.amd64/sys/THOR amd64
> > > >>>>> Apr  9 07:10:04 <0.2> thor kernel: FreeBSD clang version 15.0.7 (https://github.com/llvm/llvm-project.git llvmorg-15.0.7-0-g8dfdcc7b7bf6)
> > > >>>>> Apr  9 07:10:04 <0.2> thor kernel: VT(efifb): resolution 2560x1440
> > > >>>>> Apr  9 07:10:04 <0.2> thor kernel: module zfsctrl already present!
> > > >>>>> [...]
> > > >>>>>
> > > >>>>> The file /var/crash/info.X
> > > >>>>>
> > > >>>>> contains:
> > > >>>>>
> > > >>>>> [...]
> > > >>>>>
> > > >>>>> root@thor:/var/crash # more info.2
> > > >>>>> Dump header from device: /dev/gpt/swap
> > > >>>>>  Architecture: amd64
> > > >>>>>  Architecture Version: 2
> > > >>>>>  Dump Length: 1095192576
> > > >>>>>  Blocksize: 512
> > > >>>>>  Compression: none
> > > >>>>>  Dumptime: 2023-04-09 11:43:41 +0000
> > > >>>>>  Hostname: thor.local
> > > >>>>>  Magic: FreeBSD Kernel Dump
> > > >>>>>  Version String: FreeBSD 14.0-CURRENT #8 main-n262052-0d4038e3012b: Sun Apr 9 12:01:02 CEST 2023
> > > >>>>>    root@thor:/usr/obj/usr/src/amd64.amd64/sys/THOR
> > > >>>>>  Panic String: VERIFY(!zil_replaying(zilog, tx)) failed
> > > >>>>>
> > > >>>>>  Dump Parity: 2961465682
> > > >>>>>  Bounds: 2
> > > >>>>>  Dump Status: good
> > > >>>>>
> > > >>>>> Until the system is reconfigured for more debugging, I do not have
> > > >>>>> more to present.
> > > >>>>>
> > > >>>>> I now remember, really scared, that there was a HEADSUP on the list
> > > >>>>> regarding some serious ZFS problems - I couldn't find it just now.
> > > >>>>>
> > > >>>>> Thanks in advance,
> > > >>>>>
> > > >>>
> > > >>> That's fallout from the new block cloning feature, adding the author
> > > >>>
> > > >>
> > > >> Thanks.
> > > >>
> > > >> As of this moment, all systems with the newest kernel and the new ZFS
> > > >> option enabled crash - the reason is mostly in different ZFS datasets.
> > > >> I guess there is no way back once this faulty option is enabled?
> > > >=20
> > > > I've run a test on a scratch pool here, first without
> block_cloning=20
> > > > enabled, then with. There was no corruption when block_cloning was=20
> > > > disabled. There was corruption when block_cloning was enabled.
> > > >=20
> > > > I don't know of any way to revert back nor is there any way to fix
> or=20
> > > > recover the corrupted blocks.
> > >
> > > Is the corruption still present after EXDEV fixes?
> >
> > Yes and no.
> >
> > Yes, there is corruption when block_cloning is enabled.
> >
> > There is no corruption when block_cloning is disabled.
>
> I should add some detail to this.
>
> The corruption experienced when block cloning is disabled was fixed by:
>
> - eb1feadc201a
> - e2d997d1cbb9
> - d012836fb616 (specifically this commit)
> - 20be1b4fc4b7
>
> When block_cloning is enabled, the pool is corrupted. This has not been
> fixed.
>
>
> --
> Cheers,
> Cy Schubert <Cy.Schubert@cschubert.com>
> FreeBSD UNIX:  <cy@FreeBSD.org>   Web:  https://FreeBSD.org
> NTP:           <cy@nwtime.org>    Web:  https://nwtime.org
>
>                         e^(i*pi)+1=0
>

-- 
Nuno Teixeira
FreeBSD Committer (ports)