a strange and terrible saga of the cursed iSCSI ZFS SAN
Fabian Keil
freebsd-listen at fabiankeil.de
Sat Aug 5 17:52:36 UTC 2017
"Eugene M. Zheganin" <emz at norma.perm.ru> wrote:
> On 05.08.2017 22:08, Eugene M. Zheganin wrote:
> >
> > pool: userdata
> > state: ONLINE
> > status: One or more devices has experienced an error resulting in data
> > corruption. Applications may be affected.
> > action: Restore the file in question if possible. Otherwise restore the
> > entire pool from backup.
> > see: http://illumos.org/msg/ZFS-8000-8A
> > scan: none requested
> > config:
> >
> >         NAME               STATE     READ WRITE CKSUM
> >         userdata           ONLINE       0     0  216K
> >           mirror-0         ONLINE       0     0  432K
> >             gpt/userdata0  ONLINE       0     0  432K
> >             gpt/userdata1  ONLINE       0     0  432K
> That would be funny if it weren't so sad, but while writing this message
> the pool started to look like the output below (I just ran zpool status
> twice in a row to compare with what it was before):
>
> [root at san1:~]# zpool status userdata
> pool: userdata
> state: ONLINE
> status: One or more devices has experienced an error resulting in data
> corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
> entire pool from backup.
> see: http://illumos.org/msg/ZFS-8000-8A
> scan: none requested
> config:
>
>         NAME               STATE     READ WRITE CKSUM
>         userdata           ONLINE       0     0  728K
>           mirror-0         ONLINE       0     0 1,42M
>             gpt/userdata0  ONLINE       0     0 1,42M
>             gpt/userdata1  ONLINE       0     0 1,42M
>
> errors: 4 data errors, use '-v' for a list
> [root at san1:~]# zpool status userdata
> pool: userdata
> state: ONLINE
> status: One or more devices has experienced an error resulting in data
> corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
> entire pool from backup.
> see: http://illumos.org/msg/ZFS-8000-8A
> scan: none requested
> config:
>
>         NAME               STATE     READ WRITE CKSUM
>         userdata           ONLINE       0     0  730K
>           mirror-0         ONLINE       0     0 1,43M
>             gpt/userdata0  ONLINE       0     0 1,43M
>             gpt/userdata1  ONLINE       0     0 1,43M
>
> errors: 4 data errors, use '-v' for a list
>
> So, you see, the error rate grows at the speed of light. And I'm not sure
> the data access rate is really that enormous; it looks like the counters
> are increasing on their own.
> So maybe someone has an idea of what this really means.
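The status output already points at the next diagnostic step: 'zpool status -v'
lists the objects behind the reported data errors, and re-running the command
shows whether the CKSUM counters keep climbing while the pool is otherwise
idle. A minimal sketch (pool name taken from your output, interval arbitrary):

  # List the objects behind the "4 data errors" reported above.
  zpool status -v userdata

  # Watch the CKSUM counters; if they grow without any client I/O, the
  # errors are being generated internally, e.g. by a stalled background
  # destroy that keeps retrying unreadable metadata (see the comment
  # quoted below).
  while :; do
      zpool status userdata | grep -E 'userdata|mirror|gpt/'
      sleep 10
  done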
Quoting a comment from sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c:
/*
* If destroy encounters an EIO while reading metadata (e.g. indirect
* blocks), space referenced by the missing metadata can not be freed.
* Normally this causes the background destroy to become "stalled", as
* it is unable to make forward progress. While in this stalled state,
* all remaining space to free from the error-encountering filesystem is
* "temporarily leaked". Set this flag to cause it to ignore the EIO,
* permanently leak the space from indirect blocks that can not be read,
* and continue to free everything else that it can.
*
* The default, "stalling" behavior is useful if the storage partially
* fails (i.e. some but not all i/os fail), and then later recovers. In
* this case, we will be able to continue pool operations while it is
* partially failed, and when it recovers, we can continue to free the
* space, with no leaks. However, note that this case is actually
* fairly rare.
*
* Typically pools either (a) fail completely (but perhaps temporarily,
* e.g. a top-level vdev going offline), or (b) have localized,
* permanent errors (e.g. disk returns the wrong data due to bit flip or
* firmware bug). In case (a), this setting does not matter because the
* pool will be suspended and the sync thread will not be able to make
* forward progress regardless. In case (b), because the error is
* permanent, the best we can do is leak the minimum amount of space,
* which is what setting this flag will do. Therefore, it is reasonable
* for this flag to normally be set, but we chose the more conservative
* approach of not setting it, so that there is no possibility of
* leaking space in the "partial temporary" failure case.
*/
In FreeBSD the "flag" currently isn't easily reachable due to the lack
of a powerful kernel debugger (like mdb in the Solaris offspring), but
it can be made reachable as a sysctl with the patch from:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218954
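For reference, the flag the comment refers to is zfs_free_leak_on_eio in
spa_misc.c. A minimal sketch of how it could be toggled once the patch is
applied; the sysctl name vfs.zfs.free_leak_on_eio is only an assumption
here, the authoritative name is whatever the patch in the PR above adds:

  # Assumed sysctl name - check the patch in the PR above for the real one.
  sysctl vfs.zfs.free_leak_on_eio        # inspect the current value (0 = stall on EIO)
  sysctl vfs.zfs.free_leak_on_eio=1      # leak space from unreadable indirect blocks
                                         # so the background destroy can make progress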
Fabian