ZFS w/failing drives - any equivalent of Solaris FMA?
Jeremy Chadwick
koitsu at FreeBSD.org
Fri Sep 12 16:32:10 UTC 2008
On Fri, Sep 12, 2008 at 12:04:27PM -0400, Zaphod Beeblebrox wrote:
> On Fri, Sep 12, 2008 at 11:44 AM, Oliver Fromme <olli at lurza.secnetix.de>wrote:
> > Did you try "atacontrol detach" to remove the disk from
> > the bus? I haven't tried that with ZFS, but gmirror
> > automatically detects when a disk has gone away, and
> > doesn't try to do anything with it anymore. It certainly
> > should not hang the machine. After all, what's the
> > purpose of a RAID when you have to reboot upon drive
> > failure? ;-)
>
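For anyone following along: the detach Oliver describes looks like
this, assuming the dying disk hangs off channel ata2 ("atacontrol list"
will show you the channel assignments):

    # identify which channel the disk lives on
    atacontrol list
    # drop the channel so the kernel stops touching the dead disk
    atacontrol detach ata2
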
> To be fair, many "home" users run RAID without the expectation of being able
> to hot swap the drives. While RAID can provide high availability, it can
> also provide simple data security.
RAID only ensures a very, very tiny part of "data security", and how
much depends greatly on which RAID implementation you use. No RAID
implementation I know of protects against silent data corruption
("bit-rot"), and many RAID controllers and RAID drivers have bugs that
induce corruption -- to date that includes very old ATA Highpoint
chips, nVidia/nForce chips, and JMicron and Silicon Image chips, all of
which are used on consumer boards.
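ZFS is the notable exception here, since it checksums every block and
can repair from redundancy. A minimal sketch, assuming a redundant pool
named "tank":

    # read and verify every block in the pool
    zpool scrub tank
    # the CKSUM column shows any silently-corrupted blocks found
    zpool status -v tank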
A big problem is also that end-users *still* think RAID is a replacement
for doing backups. :-(
> To your point... I suppose you have to reboot at some point after the drive
> failure, but my experience has been that the reboot has been under my
> control some time after the failure (usually when I have the replacement
> drive).
For home use, sure. Since most home/consumer systems do not include
hot-swappable drive bays, rebooting is required -- though more and more
consumer motherboards are offering AHCI, which is the only reliable way
you'll get that capability with SATA.
In my case, with servers in a co-lo, that's not acceptable. Our systems
contain SATA backplanes that support hot-swapping, and it works the way
it should (yank the disk, replace it with a new one) on Linux -- there
is no need for a bunch of hoopla like on FreeBSD. On FreeBSD, that
hoopla also carries the risk of inducing a kernel panic. That risk does
not sit well with me, but thankfully I've only been in that situation
(replacing a bad disk using hot-swapping) once -- and it did work.
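For the curious, the FreeBSD-side "hoopla" amounts to roughly the
following -- the channel (ata2), device (ad4) and pool (tank) names
here are assumptions, and the details vary by controller:

    # detach the channel holding the failed disk
    atacontrol detach ata2
    # ...physically swap the drive...
    # reattach and let ata(4) probe the new disk
    atacontrol attach ata2
    # tell ZFS to rebuild onto the replacement
    zpool replace tank ad4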
At home, I have a pseudo-NAS system running FreeBSD. The case is a
Supermicro mid-tower with a SATA backplane that supports hot-swapping.
I use ZFS on this system, with 3 disks in the pool and one (non-ZFS)
disk for boot/OS. But because I'm using ata(4), the same caveats apply
-- see above.
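Something along those lines (the raidz choice and device names here
are illustrative, not necessarily my exact layout) would be created
with:

    # three-disk raidz pool; the boot/OS disk stays out of it
    zpool create tank raidz ad4 ad6 ad8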
Individuals on -stable and other lists using ZFS have posted their
experiences with disk failures. To date I believe I've seen one report
where it worked flawlessly, while the others reported strange issues
with resilvering or, in a couple of cases, lost their zpools
permanently.
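For those who haven't been through it, a replacement plus resilver
looks roughly like this -- the pool and device names are made up:

    # swap the faulted ad4 for the new ad6
    zpool replace tank ad4 ad6
    # watch the resilver progress
    zpool status tank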
Of course, it's very rare in this day and age for people to mail a
mailing list reporting *successes* with something -- people usually only
mail if something *fails*. :-)
That said, pjd@'s dedication to getting ZFS working reliably on FreeBSD
is outstanding. It's a great filesystem replacement, and even the Linux
folks are a bit jealous over how simple and painless it is. I can
understand their jealousy -- I've looked at the LVM docs... never again.
> About the only real improvement I'd like to see in this setup is the ability
> to spin down idle drives. That would be an ideal setup for the home RAID
> array.
There is a FreeBSD port which handles this, although such a feature
should ideally be part of the ata(4) subsystem (as should TCQ/NCQ and a
slew of other things -- some of which are being worked on).
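If memory serves, the port in question is sysutils/ataidle -- that
name, and the flag below, are from memory, so check the man page:

    # ask ad4 to enter standby (spin down) after 30 minutes idle
    ataidle -S 30 /dev/ad4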
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |