ZFS...

Walter Cramer wfc at mintsol.com
Wed May 8 16:32:01 UTC 2019


On Wed, 8 May 2019, Paul Mather wrote:

> On May 8, 2019, at 9:59 AM, Michelle Sullivan <michelle at sorbs.net> wrote:
>
>> Paul Mather wrote:
>>>> due to lack of space.  Interestingly have had another drive die in the 
>>>> array - and it doesn't just have one or two sectors down it has a *lot* - 
>>>> which was not noticed by the original machine - I moved the drive to a 
>>>> byte copier which is where it's reporting 100's of sectors damaged... 
>>>> could this be compounded by zfs/mfi driver/hba not picking up errors like 
>>>> it should?
>>> 
>>> 
>>> Did you have regular pool scrubs enabled?  It would have picked up silent 
>>> data corruption like this.  It does for me.
>> Yes, every month (once a month because (1) the data doesn't change much 
>> (new data is added, old is not touched), and (2) it took 2 weeks to 
>> complete).
>
>
> Do you also run sysutils/smartmontools to monitor S.M.A.R.T. attributes? 
> Although imperfect, it can sometimes signal trouble brewing with a drive 
> (e.g., increasing Reallocated_Sector_Ct and Current_Pending_Sector counts) 
> that can lead to proactive remediation before catastrophe strikes.
>
> Unless you have been gathering periodic drive metrics, you have no way of 
> knowing whether these hundreds of bad sectors have happened suddenly or 
> slowly over a period of time.
>

+1

Use `smartctl` from a cron script to do regular (say, weekly) *long* 
self-tests of hard drives, and also log (say, daily) all the SMART 
information from each drive.  Then if a drive fails, you can at least 
check the logs for whether SMART noticed symptoms, and (if so) for other 
drives with symptoms.  Or enhance this with a slightly longer script, 
which watches the logs for symptoms, and alerts you.
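
A minimal sketch of such a watcher, in Python (my choice here; a shell 
script would do just as well).  It assumes ATA-style `smartctl -A` output, 
where attribute rows begin with a numeric ID and the raw value is the tenth 
column.  The two watched attributes are the ones Paul mentioned; treating 
any non-zero raw value as a symptom is my assumption, not a rule.

```python
#!/usr/bin/env python3
"""Sketch: flag worrying SMART attributes in `smartctl -A` output."""
import subprocess
import sys

# Attributes named earlier in this thread. Alerting on any non-zero raw
# value is an assumption; tune it for your drives.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector")


def parse_attributes(text):
    """Return {attribute_name: raw_value} from ATA-style `smartctl -A` output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit():
            raw = fields[9]
            if raw.isdigit():  # skip raw values that are not plain integers
                attrs[fields[1]] = int(raw)
    return attrs


def find_symptoms(attrs, watched=WATCHED):
    """Return the watched attributes whose raw value is non-zero."""
    return {name: attrs[name] for name in watched if attrs.get(name, 0) > 0}


def check_drive(device):
    """Run smartctl on one device and return its symptoms (may be empty)."""
    # smartctl's exit status is a bitmask, so a non-zero status is not
    # treated as a fatal error here.
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True, check=False)
    return find_symptoms(parse_attributes(result.stdout))


if __name__ == "__main__":
    # Device names come from the command line; print only when something
    # looks wrong, so a quiet run produces no output.
    for dev in sys.argv[1:]:
        symptoms = check_drive(dev)
        if symptoms:
            print(f"{dev}: {symptoms}")
```

Run it daily from cron with the drives as arguments, e.g. 
`smart_watch.py /dev/ada0 /dev/ada1` (script and device names are 
hypothetical).  By default cron mails any output to the job's owner, so a 
healthy run stays silent.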

(My experience is that SMART's *long* self-test checks the entire disk for 
read errors, without either downside of `zpool scrub` - it does a fast, 
sequential read of the HD, including free space.  That makes it a nice 
test for failing disk hardware, but not a replacement for `zpool scrub`.)

> Cheers,
>
> Paul.
> _______________________________________________
> freebsd-stable at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"

