ZFS w/failing drives - any equivalent of Solaris FMA?
Oliver Fromme
olli at lurza.secnetix.de
Fri Sep 12 15:44:30 UTC 2008
Karl Pielorz wrote:
> Recently, a ZFS pool on my FreeBSD box started showing lots of errors on
> one drive in a mirrored pair.
>
> The pool consists of around 14 drives (as 7 mirrored pairs), hung off of a
> couple of SuperMicro 8 port SATA controllers (1 drive of each pair is on
> each controller).
>
> One of the drives started picking up a lot of errors (by the end of things
> it was returning errors pretty much for any reads/writes issued) - and
> taking ages to complete the I/Os.
>
> However, ZFS kept trying to use the drive - e.g. as I attached another
> drive to the remaining 'good' drive in the mirrored pair, ZFS was still
> trying to read data off the failed drive (and remaining good one) in order
> to complete its resilver to the newly attached drive.
>
> Having posted on the OpenSolaris ZFS list - it appears, under Solaris
> there's an 'FMA Engine' which communicates drive failures and the like to
> ZFS - advising ZFS when a drive should be marked as 'failed'.
>
> Is there anything similar to this on FreeBSD yet? - i.e. Does/can anything
> on the system tell ZFS "This drive's experiencing failures" rather than ZFS
> just seeing lots of timed out I/O 'errors'? (as appears to be the case).
>
> In the end, the failing drive was timing out literally every I/O - I did
> recover the situation by detaching it from the pool (which hung the machine
> - probably caused by ZFS having to update the meta-data on all drives,
> including the failed one). A reboot brought the pool back, minus the
> 'failed' drive, so enough of the 'detach' must have completed.
Did you try "atacontrol detach" to remove the disk from
the bus? I haven't tried that with ZFS, but gmirror
automatically detects when a disk has gone away, and
doesn't try to do anything with it anymore. It certainly
should not hang the machine. After all, what's the
purpose of a RAID when you have to reboot upon drive
failure? ;-)
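
Roughly what I have in mind, as an untested sketch. The pool name
"tank" and the channel "ata6" are made-up placeholders, so look up
the real ones with "atacontrol list" and "zpool status" first. Note
that "atacontrol detach" takes a channel, not a single disk, so it
removes every device on that channel. The script only prints the
commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: pull a failing disk off the ATA bus and see how ZFS reacts.
# POOL and CHANNEL are hypothetical placeholders, not values from this thread.
POOL="${POOL:-tank}"
CHANNEL="${CHANNEL:-ata6}"

# With DRY_RUN=1 (the default) commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

remove_failing_disk() {
    run atacontrol list               # show channels and the disks attached to them
    run atacontrol detach "$CHANNEL"  # detach the whole channel from the bus
    run zpool status "$POOL"          # check what ZFS now reports for the mirror
}

remove_failing_disk
```

To run it for real: DRY_RUN=0 POOL=yourpool CHANNEL=ataN sh script.sh
Whether ZFS copes with the disk vanishing as gracefully as gmirror
does is exactly the part I have not verified.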
Best regards
Oliver
--
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart
FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd
"C++ is over-complicated nonsense. And Bjorn Shoestrap's book
a danger to public health. I tried reading it once, I was in
recovery for months."
-- Cliff Sarginson