a strange and terrible saga of the cursed iSCSI ZFS SAN
Peter
pmc@citylink.dinoex.sub.org
Sun Aug 6 01:13:23 UTC 2017
Eugene M. Zheganin wrote:
> Hi,
>
> On 05.08.2017 22:08, Eugene M. Zheganin wrote:
>>
>> pool: userdata
>> state: ONLINE
>> status: One or more devices has experienced an error resulting in data
>> corruption. Applications may be affected.
>> action: Restore the file in question if possible. Otherwise restore the
>> entire pool from backup.
>> see: http://illumos.org/msg/ZFS-8000-8A
>> scan: none requested
>> config:
>>
>> NAME               STATE  READ WRITE CKSUM
>> userdata           ONLINE    0     0  216K
>>   mirror-0         ONLINE    0     0  432K
>>     gpt/userdata0  ONLINE    0     0  432K
>>     gpt/userdata1  ONLINE    0     0  432K
> This would be funny if it weren't so sad, but while I was writing this
> message the pool started to look like the output below (I simply ran
> zpool status twice in a row and compared it to what it was before):
>
> [root@san1:~]# zpool status userdata
> pool: userdata
> state: ONLINE
> status: One or more devices has experienced an error resulting in data
> corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
> entire pool from backup.
> see: http://illumos.org/msg/ZFS-8000-8A
> scan: none requested
> config:
>
> NAME               STATE  READ WRITE CKSUM
> userdata           ONLINE    0     0  728K
>   mirror-0         ONLINE    0     0 1,42M
>     gpt/userdata0  ONLINE    0     0 1,42M
>     gpt/userdata1  ONLINE    0     0 1,42M
>
> errors: 4 data errors, use '-v' for a list
> [root@san1:~]# zpool status userdata
> pool: userdata
> state: ONLINE
> status: One or more devices has experienced an error resulting in data
> corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
> entire pool from backup.
> see: http://illumos.org/msg/ZFS-8000-8A
> scan: none requested
> config:
>
> NAME               STATE  READ WRITE CKSUM
> userdata           ONLINE    0     0  730K
>   mirror-0         ONLINE    0     0 1,43M
>     gpt/userdata0  ONLINE    0     0 1,43M
>     gpt/userdata1  ONLINE    0     0 1,43M
>
> errors: 4 data errors, use '-v' for a list
>
> So, as you can see, the error counters are climbing at an enormous
> rate. I doubt the actual data access rate is anywhere near that high;
> it looks as if the counters are increasing on their own.
> Maybe someone has an idea what this really means.
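One way to see whether the counters really climb without client I/O is to take the CKSUM column from two status listings a few seconds apart and compare. A minimal sketch of the parsing, using the two listings quoted above as canned sample input (the helper name `get_cksum` is made up; in practice you would feed it live `zpool status userdata` output):

```shell
# Hypothetical helper: print column 5 (CKSUM) of the line that names
# the pool itself in `zpool status` output.
get_cksum() {
  awk -v pool="$1" '$1 == pool { print $5 }'
}

# Pool lines from the two consecutive listings quoted in the message:
snap1='userdata ONLINE 0 0 728K'
snap2='userdata ONLINE 0 0 730K'

echo "first sample:  $(echo "$snap1" | get_cksum userdata)"
echo "second sample: $(echo "$snap2" | get_cksum userdata)"
```

If the second sample is larger while the iSCSI initiators are idle, something on the SAN host itself is re-reading the bad blocks.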
It is remarkable that you always have the same error count on both sides
of the mirror.
From what I have seen, such a picture appears when an unrecoverable
error (i.e. one that is present on both sides of the mirror) is read
again and again.
File number 0x1 is probably some important metadata, and since it is
not readable it cannot be put into the ARC, so the read is retried over
and over.
An error that appears on only one side shows up only once, because it
is then auto-corrected. In that case the figures show some erratic
deviations instead of climbing in lockstep.
Therefore it is worthwhile to remove the erroneous data soon, because
as long as it exists the figures tell you nothing useful (such as how
many errors are actually appearing anew).
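In concrete terms, the cleanup could look like the following sketch. The pool name is taken from the thread; whether the damaged object is an ordinary file you can restore/delete, or pool metadata (which would instead call for rebuilding the pool from backup), has to be checked from the `-v` listing first:

```shell
# List exactly which objects are damaged (files show up by path,
# metadata as entries like <metadata>:<0x...>):
zpool status -v userdata

# If the entries are plain files: restore each one from backup, or
# delete it if it is expendable. Then reset the error counters:
zpool clear userdata

# Finally scrub, so that afterwards the counters reflect only errors
# that are genuinely appearing anew:
zpool scrub userdata
```

After the scrub completes, a steadily growing CKSUM count would point at new corruption rather than re-reads of the old unrecoverable blocks.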
More information about the freebsd-stable mailing list