Re: zfs snapshot corruption when using encryption

From: void <void_at_f-m.fm>
Date: Fri, 08 Nov 2024 19:20:58 UTC
On Fri, Nov 08, 2024 at 03:29:22PM +0100, Palle Girgensohn wrote:

>I cannot run `zfs send -I fs@previous_snap fs@problematic_snap`, I get
>`warning: cannot send 's@problematic_snap': Input/output error`
>Removing the snapshot fixes the problem.

What an odd problem. Like, it can write but not read. What zfs version?
On 13.3 I'm using:
zfs-2.1.14-FreeBSD_gd99134be8
zfs-kmod-2.1.14-FreeBSD_gd99134be8

on 14-stable:
zfs-2.2.6-FreeBSD_g33174af15
zfs-kmod-2.2.6-FreeBSD_g33174af15

on 15-current:
zfs-2.3.99-31-FreeBSD_gb2f6de7b5
zfs-kmod-2.3.99-31-FreeBSD_gb2f6de7b5

Is the encryption you're using the GELI-based whole-disk one (which has
been around, iirc, for a few years) or the relatively recent zfs
encryption that works per-filesystem? This is why I'm asking about the
zfs version. I thought the latter was quite new, and I've never heard
of it working on 14.0.
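
To tell the two apart, something like the following might do as a rough
sketch. The dataset name tank/data is only a placeholder, and
classify_encryption is a helper made up for this example, not part of
zfs. It relies on the "encryption" property reading "off" when native
encryption is not in use.

```sh
#!/bin/sh
# Placeholder dataset name; pass your own as the first argument.
DATASET="${1:-tank/data}"

# Hypothetical helper: interpret the value of the "encryption" property.
classify_encryption() {
    case "$1" in
        off|-|"") echo "no native zfs encryption" ;;
        *)        echo "native zfs encryption ($1)" ;;
    esac
}

if command -v zfs >/dev/null 2>&1; then
    # Prints the userland and kernel module versions.
    zfs version
    if value=$(zfs get -H -o value encryption "$DATASET" 2>/dev/null); then
        echo "$DATASET: $(classify_encryption "$value")"
    else
        echo "$DATASET: could not read the encryption property"
    fi
fi

# GELI providers, if any are attached, are listed here (FreeBSD only).
if command -v geli >/dev/null 2>&1; then
    geli status
fi
```

If geli status lists the providers under the pool, it's the whole-disk
kind; if the property shows a cipher such as aes-256-gcm, it's the
per-filesystem kind.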

The way I'd go about trying to address the issue would be to start from
the lowest layer and work up. On a raidz2 array (the card in JBOD mode),
I can identify all the drives as da0-7, so I can use smartctl -x to query
them all directly, looking in particular for:

Reallocated_Sector_Ct
Reported_Uncorrect
Current_Pending_Sector
Offline_Uncorrectable
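
As a rough sketch of that check, assuming the drives really are da0
through da7 and are ATA devices that report these attribute names (SAS
drives report errors differently). smart_summary is just a helper name
made up for this example.

```sh
#!/bin/sh
# Keep only the four attributes of interest from smartctl output on stdin.
smart_summary() {
    grep -E 'Reallocated_Sector_Ct|Reported_Uncorrect|Current_Pending_Sector|Offline_Uncorrectable'
}

# da0..da7 matches my layout; adjust the device names to yours.
# Needs root, and smartctl from sysutils/smartmontools.
if command -v smartctl >/dev/null 2>&1; then
    for n in 0 1 2 3 4 5 6 7; do
        echo "== /dev/da$n =="
        smartctl -x "/dev/da$n" | smart_summary
    done
fi
```

Non-zero raw values on any of those lines would be where I'd start
looking.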

If your disk arrangement is connected via something like HP Smart Array
or similar, you'll need to look at the man page for smartctl for the exact
syntax to query the disks behind the card.