ZFS errors on the array but not the disk.
Alan Somers
asomers at freebsd.org
Fri Oct 24 15:33:25 UTC 2014
On Thu, Oct 23, 2014 at 11:37 PM, Zaphod Beeblebrox <zbeeble at gmail.com> wrote:
> What does it mean when checksum errors appear on the array (and the vdev)
> but not on any of the disks? See the paste below. One would think there
> isn't some ephemeral data stored anywhere other than on the disks, yet
> "cksum" errors show up only on the vdev and array lines. Help?
>
> [2:17:316]root@virtual:/vr2/torrent/in> zpool status
> pool: vr2
> state: ONLINE
> status: One or more devices is currently being resilvered. The pool will
> continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
> scan: resilver in progress since Thu Oct 23 23:11:29 2014
> 1.53T scanned out of 22.6T at 62.4M/s, 98h23m to go
> 119G resilvered, 6.79% done
> config:
>
> NAME               STATE     READ WRITE CKSUM
> vr2                ONLINE       0     0    36
>   raidz1-0         ONLINE       0     0    72
>     label/vr2-d0   ONLINE       0     0     0
>     label/vr2-d1   ONLINE       0     0     0
>     gpt/vr2-d2c    ONLINE       0     0     0  block size: 512B configured, 4096B native  (resilvering)
>     gpt/vr2-d3b    ONLINE       0     0     0  block size: 512B configured, 4096B native
>     gpt/vr2-d4a    ONLINE       0     0     0  block size: 512B configured, 4096B native
>     ada14          ONLINE       0     0     0
>     label/vr2-d6   ONLINE       0     0     0
>     label/vr2-d7c  ONLINE       0     0     0
>     label/vr2-d8   ONLINE       0     0     0
>   raidz1-1         ONLINE       0     0     0
>     gpt/vr2-e0     ONLINE       0     0     0  block size: 512B configured, 4096B native
>     gpt/vr2-e1     ONLINE       0     0     0  block size: 512B configured, 4096B native
>     gpt/vr2-e2     ONLINE       0     0     0  block size: 512B configured, 4096B native
>     gpt/vr2-e3     ONLINE       0     0     0
>     gpt/vr2-e4     ONLINE       0     0     0  block size: 512B configured, 4096B native
>     gpt/vr2-e5     ONLINE       0     0     0  block size: 512B configured, 4096B native
>     gpt/vr2-e6     ONLINE       0     0     0  block size: 512B configured, 4096B native
>     gpt/vr2-e7     ONLINE       0     0     0  block size: 512B configured, 4096B native
>
> errors: 43 data errors, use '-v' for a list
The checksum errors will appear on the raidz vdev instead of a leaf if
vdev_raidz.c can't determine which leaf vdev was responsible. This
could happen if two or more leaf vdevs return bad data for the same
block, which would also lead to unrecoverable data errors. I see that
you have some unrecoverable data errors, so maybe that's what happened
to you.
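
If you're curious what that attribution logic looks like, here is a toy
C model of it. This is a sketch, not the actual vdev_raidz.c code: one
XOR parity column and a stand-in checksum, and all the names in it are
made up. The shape is the same, though: assume each child in turn
returned bad data, rebuild that child's column from parity, and check
whether the result passes the block checksum. If no single assumption
makes the checksum pass, the error can't be pinned on one leaf, so the
raidz vdev is charged instead. ("zpool status -v vr2" will name the
files whose blocks hit that case.)

/* raidz_blame.c: toy model of raidz1 checksum-error attribution.
 * Not the real vdev_raidz.c logic; single XOR parity and a stand-in
 * checksum, just to show why two bad children defeat attribution. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NDATA  3                /* data children */
#define NCHILD (NDATA + 1)      /* plus one XOR parity child */
#define COLSZ  4                /* bytes each child contributes */

/* Stand-in for ZFS's real block checksum (fletcher4, sha256, ...). */
static uint32_t cksum(const uint8_t *d, size_t n)
{
    uint32_t c = 2166136261u;
    while (n--)
        c = (c ^ *d++) * 16777619u;
    return (c);
}

/* Copy the data columns, rebuilding column "bad" from parity. */
static void reconstruct(uint8_t col[NCHILD][COLSZ], int bad,
    uint8_t out[NDATA][COLSZ])
{
    memcpy(out, col, NDATA * COLSZ);
    if (bad < NDATA) {
        memcpy(out[bad], col[NDATA], COLSZ);
        for (int i = 0; i < NDATA; i++)
            if (i != bad)
                for (int b = 0; b < COLSZ; b++)
                    out[bad][b] ^= col[i][b];
    }
}

/* Return the one child whose replacement fixes the checksum, or -1
 * when no single child does (the raidz vdev gets the blame).  Only
 * called after the block's own checksum has already failed. */
static int attribute(uint8_t col[NCHILD][COLSZ], uint32_t good)
{
    uint8_t trial[NDATA][COLSZ];

    for (int bad = 0; bad < NCHILD; bad++) {
        reconstruct(col, bad, trial);
        if (cksum((uint8_t *)trial, sizeof (trial)) == good)
            return (bad);
    }
    return (-1);
}

int main(void)
{
    uint8_t col[NCHILD][COLSZ] = {
        { 1, 2, 3, 4 }, { 5, 6, 7, 8 }, { 9, 10, 11, 12 }, { 0 }
    };

    /* The parity child is the XOR of the data children. */
    for (int i = 0; i < NDATA; i++)
        for (int b = 0; b < COLSZ; b++)
            col[NDATA][b] ^= col[i][b];

    uint32_t good = cksum((uint8_t *)col, NDATA * COLSZ);

    col[1][0] ^= 0xff;          /* one child returns bad data */
    printf("one bad child:    blame child %d\n", attribute(col, good));

    col[2][0] ^= 0xff;          /* a second child goes bad too */
    printf("two bad children: %d (charge the raidz vdev)\n",
        attribute(col, good));
    return (0);
}

With one corrupted column the model names the guilty child; with a
second corrupted column attribute() returns -1, which is the situation
your CKSUM counts suggest.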
Subtle design bugs in ZFS can also lead to vdev_raidz.c being unable
to determine which child was responsible for a checksum error.
However, I've only seen that happen when a raidz vdev has a mirror
child. That can only happen if the child is a spare or replacing
vdev. Did you activate any spares, or did you manually replace a
vdev?
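
For reference, a spare or replacing vdev shows up in "zpool status" as
an extra mirror-like node under the raidz. Illustrative output (the
device names here are made up, not taken from your pool):

    raidz1-0           ONLINE
      label/vr2-d0     ONLINE
      replacing-1      ONLINE
        old-device     ONLINE
        gpt/vr2-d2c    ONLINE  (resilvering)

An activated hot spare looks the same except the interior node is
named "spare-N". If one of those was present while the errors were
accumulating, that's the case I mean.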
-Alan