[Bug 282622] zfs snapshot corruption when using encryption

From: <bugzilla-noreply_at_freebsd.org>
Date: Fri, 08 Nov 2024 09:54:21 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=282622

            Bug ID: 282622
           Summary: zfs snapshot corruption when using encryption
           Product: Base System
           Version: 14.0-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: girgen@FreeBSD.org

Hi!

We see sporadic corruption in snapshots (not really in files, it seems) since
we started using encryption on a previously well-behaved system.

$ sudo zpool status -v tank
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub in progress since Mon Nov  4 12:49:06 2024
        126T / 128T scanned at 392M/s, 126T / 128T issued at 392M/s
        0B repaired, 99.07% done, 00:53:05 to go
config:

        NAME                          STATE     READ WRITE CKSUM
        tank                          ONLINE       0     0     0
          raidz2-0                    ONLINE       0     0     0
            gpt/ZA17TQRZ0000R7412YQE  ONLINE       0     0     0
            da7                       ONLINE       0     0     0
            gpt/ZA17YK2F0000R7443JLQ  ONLINE       0     0     0
            gpt/ZA17YLK20000R744Z3EQ  ONLINE       0     0     0
            da2                       ONLINE       0     0     0
            gpt/ZA17YMZM0000R741ZUG7  ONLINE       0     0     0
            gpt/ZA17YN7N0000R7426S1K  ONLINE       0     0     0
          raidz2-1                    ONLINE       0     0     0
            gpt/Z4D4GSYQ0000R642L8LN  ONLINE       0     0     0
            gpt/S4D198220000K706NYDF  ONLINE       0     0     0
            gpt/S4D1E37R0000E715QM7B  ONLINE       0     0     0
            gpt/Z4D4J36Y0000R63167HP  ONLINE       0     0     0
            gpt/S4D198GF0000K706NYRP  ONLINE       0     0     0
            gpt/Z4D4J3DX0000R633K23E  ONLINE       0     0     0
          raidz3-3                    ONLINE       0     0     0
            gpt/9RKBX1NL              ONLINE       0     0     0
            gpt/9RKBZ0KL              ONLINE       0     0     0
            gpt/9RKBBPYL              ONLINE       0     0     0
            gpt/9RKBM1DC              ONLINE       0     0     0
            gpt/9RKAW5MC              ONLINE       0     0     0
            gpt/9RKD3LDC              ONLINE       0     0     0
            gpt/9RKD2H3C              ONLINE       0     0     0
            gpt/9RK7XB1C              ONLINE       0     0     0
        logs    
          mirror-2                    ONLINE       0     0     0
            gpt/ZIL1                  ONLINE       0     0     0
            gpt/ZIL2                  ONLINE       0     0     0
        cache
          gpt/L2ARC1                  ONLINE       0     0     0
          gpt/L2ARC2                  ONLINE       0     0     0
        spares
          gpt/ZA17YMCH0000R74264N0    AVAIL   

errors: Permanent errors have been detected in the following files:



No files are listed, though, and the scrub has been working on the last
percent for a few days now.

We know which snapshot is problematic since we ship them all off site using
`zfs send`. Removing the culprit snapshot fixes the problem, but it keeps
popping up. We're planning to upgrade to 14.1 this weekend, but I see no closed
PRs about this problem, so I'm posting this before the upgrade. Does anyone
know if anything in the encryption code has been improved?
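
For anyone who wants to locate such a snapshot without shipping it off site,
here is a minimal sketch, not the exact procedure used above. It assumes the
encryption keys are loaded and that a full `zfs send` of a damaged snapshot
exits non-zero. The function name `find_bad_snapshots` and the `ZFS_CMD`
variable are made up for illustration. Note that a full send reads every
block of every snapshot, so on a pool of this size it is slow.

```shell
#!/bin/sh
# Hypothetical helper: print every snapshot under a dataset whose full
# `zfs send` stream cannot be generated (for example because of a read error).
# ZFS_CMD is an illustrative override so the loop can be dry-run with a stub.
ZFS_CMD="${ZFS_CMD:-zfs}"

find_bad_snapshots() {
    dataset="$1"
    # -H: no header, -o name: names only, -r: include child datasets
    "$ZFS_CMD" list -H -t snapshot -o name -r "$dataset" |
    while IFS= read -r snap; do
        # Discard the stream; only the exit status matters here.
        if ! "$ZFS_CMD" send "$snap" < /dev/null > /dev/null 2>&1; then
            printf '%s\n' "$snap"
        fi
    done
}
```

Usage would be something like `find_bad_snapshots tank`, run as root or with
the appropriate `zfs allow` permissions.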

There is a description on the OpenZFS GitHub,
https://github.com/openzfs/zfs/issues/12014 , that seems to be spot on. The
issue is still open.

Any ideas? What more can I supply in terms of data?
