From nobody Tue Oct 31 12:10:07 2023
From: Rich <rincebrain@gmail.com>
Date: Tue, 31 Oct 2023 08:10:07 -0400
Subject: Re: ZFS txg rollback: expected timeframe?
To: Alexander Leidinger <alexleidingerde@gmail.com>
Cc: freebsd-fs@freebsd.org
List-Archive: https://lists.freebsd.org/archives/freebsd-fs

-T is documented.
It's going to take a while by default because it tries to walk every
block in the pool and verify its checksum before the import succeeds.
There are tunables for skipping that, if you don't mind finding out
they're sad later: spa_load_verify_metadata and spa_load_verify_data.

On Tue, Oct 31, 2023 at 7:36 AM Alexander Leidinger
<alexleidingerde@gmail.com> wrote:

> Hi,
>
> yes, the answer to $Subject is hard. I know.
>
> Issue: an overheating CPU may have corrupted a zpool (4 * 4 TB in a
> raidz2 setup) in a way that a normal import of the pool panics the
> machine with "VERIFY3(l->blk_birth == r->blk_birth) failed
> (101867360 == 101867222)".
>
> There are backups, but a zpool import with "-N -F -T xxx" should
> work too and remove the need to restore from a full backup (via USB)
> plus incrementals (from S3/tarsnap).
>
> During the crash a poudriere run of maybe 15 ports was active
> (including qt6-<web-something>); ccache is in use for this. The rest
> (in amounts of data) is just small stuff.
>
> What is the expected runtime on 5400 rpm spinning rust (WD Red)? So
> far all the disks have been at 100% busy (gstat) for about 26 hours.
>
> On a related note:
> Is there a reason why "-T" is not documented?
> After a while I get "Panic String: deadlres_td_sleep_q: possible
> deadlock detected for 0xfffffe023ae831e0 (l2arc_feed_thread),
> blocked for 1802817 ticks" during such an import, and I had to set
> debug.deadlkres.blktime_threshold: 1215752191
> debug.deadlkres.slptime_threshold: 1215752191
> Setting vfs.zfs.deadman.enabled=0 didn't help (it's still set).
> Is there something more wrong with my pool than expected, or is it
> some kind of bug that such an import triggers this panic?
> The SSDs with L2ARC and ZIL don't show up in gstat at all, and I
> don't expect them to on an import that rolls back to a previous txg,
> so I was surprised to see such a panic.
>
> Bye,
> Alexander.
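For concreteness, the whole recipe might look like this on FreeBSD.
The sysctl spellings below are my guess at how the OpenZFS parameter
names map (check "sysctl -a | grep load_verify" on your system first),
and the pool name and txg are placeholders, not from your setup:

    # Skip the full data/metadata traversal during pool load; the cost
    # is that corrupt blocks are only discovered when something later
    # reads them.
    sysctl vfs.zfs.spa.load_verify_metadata=0
    sysctl vfs.zfs.spa.load_verify_data=0

    # Rewind to a known-good txg without mounting datasets (-N).
    # Importing read-only first is a common precaution (not something
    # you mentioned trying): a read-write rewind permanently discards
    # everything after the target txg, while a read-only rewind lets
    # you inspect the result before committing to it.
    zpool import -o readonly=on -N -F -T 1234567 tank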
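And for the archive, the deadlock-resolver workaround from your mail
as commands (values as you posted them; they just push the thresholds
out far enough that a long-running import can't trip them):

    # Keep the kernel's deadlock resolver from panicking on threads
    # that have been blocked for a very long but legitimate time:
    sysctl debug.deadlkres.blktime_threshold=1215752191
    sysctl debug.deadlkres.slptime_threshold=1215752191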