Date: Sat, 04 Nov 2023 18:35:32 +0100
From: Alexander Leidinger
To: mm@freebsd.org
Cc: John F Carr, freebsd-fs@freebsd.org
Subject: Re: ZFS txg rollback: expected timeframe?
List-Archive: https://lists.freebsd.org/archives/freebsd-fs

On 2023-10-31 15:27, Alexander Leidinger wrote:

> On Tue, Oct 31, 2023 at 1:15 PM John F Carr <jfc@mit.edu> wrote:
>
>>> On Oct 31, 2023, at 06:16, Alexander Leidinger <alexleidingerde@gmail.com> wrote:
>>>
>>> Issue: an overheating CPU may have corrupted a zpool (4 * 4TB in a raidz2 setup) in such a way that a normal import of the pool panics the machine with "VERIFY3(l->blk_birth == r->blk_birth) failed (101867360 == 101867222)".
>>
>> I disabled that assertion because it gives a false alarm with some combination of deduplication, cloning, and snapshotting on one of my systems.
>>
>> See
>>
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261538
>> https://github.com/openzfs/zfs/issues/11480
>
> I don't have deduplication on this pool. There are clones and snapshots, and there could be recent ones if poudriere created some. Is it still a false alarm in this case? If so, are you saying that a kernel with this patch applied should let me import the pool without a rollback?
>
> The GitHub issue is from 2022, so I doubt that it is the same issue we are seeing. I rather expect a problem in the copy_file_range(2) related code for ZFS, which was re-enabled 20 days ago (maybe it is valid to remove this assert, or maybe the block cloning part needs some tweak). CC Martin for the block cloning part.

In the end I was at least able to import the pool read-only with the patch that disables this VERIFY3 panic. After a final incremental backup I re-created the pool (with vfs.zfs.bclone_enabled=0) and restored all datasets. Next come some checks (with the VERIFY3 assertion enabled again) and then a full backup.
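
For anyone who wants to retrace the sequence above, here is a minimal dry-run sketch. It only builds and prints command lines and does not execute anything. The pool name "tank" and the disks da0 to da3 are hypothetical placeholders, the export followed by "zpool create -f" is just one possible way to re-create the pool, and the scrub stands in for the unspecified "checks". The backup and restore steps are left as comments because the tool used for them is not named here.

```python
"""Dry-run sketch of the recovery sequence; it only builds command lines."""
import shlex


def recovery_plan(pool, disks):
    """Return the commands for the steps described in the mail, in order.

    Assumptions: `pool` and `disks` are placeholders supplied by the caller,
    and the backup/restore steps are omitted (tool not stated in the mail).
    """
    return [
        # 1. Import read-only so nothing is written to the damaged pool
        #    (this needed a kernel with the VERIFY3 assertion disabled).
        ["zpool", "import", "-o", "readonly=on", pool],
        # (final incremental backup happens here)
        # 2. Turn block cloning off before the new pool is used.
        ["sysctl", "vfs.zfs.bclone_enabled=0"],
        # 3. Release the old pool and re-create it as raidz2 on the same
        #    disks; -f is assumed to be needed because the disks still
        #    carry labels of the exported pool.
        ["zpool", "export", pool],
        ["zpool", "create", "-f", pool, "raidz2", *disks],
        # (restore of all datasets happens here)
        # 4. Check the restored pool; a scrub is an assumed choice of check.
        ["zpool", "scrub", pool],
        ["zpool", "status", "-v", pool],
    ]


if __name__ == "__main__":
    for argv in recovery_plan("tank", ["da0", "da1", "da2", "da3"]):
        print(shlex.join(argv))
```

Keeping the plan as data makes it easy to review the order of the steps before typing any of them on a real system. Note that a sysctl set this way does not survive a reboot unless it is also added to /etc/sysctl.conf.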

There is still the question of what caused it. John's report above mentions issues when dedup is enabled. Dedup was not in use on this pool until the default of bclone_enabled was changed to 1; as far as I understand the description in the OpenZFS block cloning ticket, block cloning is a kind of dedup internal to ZFS. I therefore have some reservations about enabling it again.

Martin, maybe it's a good idea to disable block cloning again until someone with the corresponding OpenZFS knowledge has investigated this...

Bye,
Alexander.

--