drive failure during rebuild causes page fault

Doug White dwhite at gumbysoft.com
Mon Dec 13 10:28:53 PST 2004


On Sun, 12 Dec 2004, Joe Rhett wrote:

> On Sun, Dec 12, 2004 at 09:59:16PM -0800, Doug White wrote:
> > That's a nice shotgun you have there.
>
> Yessir.  And that's what testing is designed to uncover.  The question is
> why this works, and how do we prevent it?

I'm sure Soren appreciates you donating your feet to the cause :)

Why it works: the system assumes the administrator is competent enough
not to yank a disk that is the target of a rebuild.

> Is there a proper way to handle these sorts of events?  If so, where is it
> documented?
>
> And FYI, just pulling the drives causes the same failure, which means that
> RAID1 buys you nothing because your system will crash anyway.

This is why I don't trust ATA RAID for fault tolerance -- it'll save your
data, but the system will tank.  Since the disk state is maintained by
the OS and not abstracted by a separate processor, if a disk dies in a
particularly bad way the system may not be able to cope.

-- 
Doug White                    |  FreeBSD: The Power to Serve
dwhite at gumbysoft.com          |  www.FreeBSD.org


More information about the freebsd-stable mailing list