ZFS v28 on -STABLE not using hot spare

Tue Jan 3 15:17:18 UTC 2012

Matt Burke wrote:
> Over the holidays one of the disks on a server has failed, but despite
> configuring a hot spare, ZFS hasn't used it for some reason. Can anyone
> shed some light on what I might have mis-configured to break the hot-spare
> functionality?
>
>
> [root@x ~]# uname -a
> FreeBSD x 8.2-STABLE FreeBSD 8.2-STABLE #4: Mon Dec  5 12:43:58 GMT 2011
>    root@x:/usr/obj/usr/src/sys/x  amd64
>
>
> [root@x ~]# more /usr/src/sys/amd64/conf/x
> include         GENERIC
> ident           x
>
> options         GEOM_STRIPE
> options         ROUTETABLES=4
>
>
> [root@x ~]# zpool status -v
>    pool: data
>   state: DEGRADED
> status: One or more devices are faulted in response to persistent errors.
> 	Sufficient replicas exist for the pool to continue functioning in a
> 	degraded state.
> action: Replace the faulted device, or use 'zpool clear' to mark the device
> 	repaired.
>   scan: none requested
> config:
>
> 	NAME         STATE     READ WRITE CKSUM
> 	data         DEGRADED     0     0     0
> 	  mirror-0   ONLINE       0     0     0
> 	    mfid0    ONLINE       0     0     0
> 	    mfid14   ONLINE       0     0     0
> 	  mirror-1   ONLINE       0     0     0
> 	    mfid1    ONLINE       0     0     0
> 	    mfid15   ONLINE       0     0     0
> 	  mirror-2   DEGRADED     0     0     0
> 	    mfid2    ONLINE       0     0     0
> 	    mfid16   FAULTED      0   931     0  too many errors
> 	  mirror-3   ONLINE       0     0     0
> 	    mfid3    ONLINE       0     0     0
> 	    mfid17   ONLINE       0     0     0
> 	  mirror-4   ONLINE       0     0     0
> 	    mfid4    ONLINE       0     0     0
> 	    mfid18   ONLINE       0     0     0
> 	  mirror-5   ONLINE       0     0     0
> 	    mfid5    ONLINE       0     0     0
> 	    mfid19   ONLINE       0     0     0
> 	  mirror-6   ONLINE       0     0     0
> 	    mfid6    ONLINE       0     0     0
> 	    mfid20   ONLINE       0     0     0
> 	  mirror-7   ONLINE       0     0     0
> 	    mfid7    ONLINE       0     0     0
> 	    mfid21   ONLINE       0     0     0
> 	  mirror-8   ONLINE       0     0     0
> 	    mfid8    ONLINE       0     0     0
> 	    mfid22   ONLINE       0     0     0
> 	  mirror-9   ONLINE       0     0     0
> 	    mfid9    ONLINE       0     0     0
> 	    mfid23   ONLINE       0     0     0
> 	  mirror-10  ONLINE       0     0     0
> 	    mfid10   ONLINE       0     0     0
> 	    mfid24   ONLINE       0     0     0
> 	logs
> 	  mirror-11  ONLINE       0     0     0
> 	    mfid13   ONLINE       0     0     0
> 	    mfid26   ONLINE       0     0     0
> 	cache
> 	  mfid12     ONLINE       0     0     0
> 	  mfid25     ONLINE       0     0     0
> 	spares
> 	  mfid11     AVAIL
>
> errors: No known data errors
>
> The logs show loads of mfi1 and mfid16 errors for a few minutes, and then
> (presumably when ZFS dropped the disk) nothing relevant after that. ZFS
> hasn't logged anything, not even that it's failed a disk.
>
> I've manually done a 'zpool replace data mfid16 mfid11' which has brought
> the spare in without problems, but I'm eager to learn what I did (or didn't
> do?) to cause the spare to not be used automatically.
>
> Thanks in advance,
>
>
ZFS on FreeBSD does not have hot spares.
The spares are cold: human intervention is needed to bring one into the
pool in place of a failed disk.
There are several threads about this on the net.

I would opt for a warning, because a lot of users get a false sense of
security when they configure spares.
zpool should not accept a spare without warning the user that it is a
cold spare and not a hot one.

It looks like there is some work planned on a ZFS daemon that should
overcome this problem on FreeBSD:

http://svnweb.freebsd.org/base?view=revision&revision=222836

On Solaris there is also a daemon running that does the actual replace.

It should not be too hard to write a script that runs every minute (or at
whatever interval you want) and checks whether a pool is degraded, then
whether autoreplace is set for the pool, then whether a spare is
available, and if so does the actual replace.
Unfortunately I cannot code :(
Maybe someone has a script lying around?
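
For what it is worth, the check described above could be sketched roughly
like this. It is untested on a real failure and rests on assumptions: that
'zpool status' prints the device table and the "spares" section the way
the listing quoted above does, that only FAULTED and UNAVAIL count as
failed, and that autoreplace=on is treated as the opt-in switch. The
function names and the "run" argument are made up for illustration.

```shell
#!/bin/sh
# Hypothetical cold-spare watchdog, meant to be run from cron.
# Assumes `zpool status` output shaped like the listing quoted above.

# Read `zpool status` text on stdin; print the first FAULTED or UNAVAIL device.
find_failed() {
    awk '$1 == "config:" { in_cfg = 1; next }
         in_cfg && ($2 == "FAULTED" || $2 == "UNAVAIL") { print $1; exit }'
}

# Read `zpool status` text on stdin; print the first spare still marked AVAIL.
find_spare() {
    awk '$1 == "spares" { in_spares = 1; next }
         in_spares && $2 == "AVAIL" { print $1; exit }'
}

# Swap a spare in for a failed device in one pool, if all conditions hold.
check_pool() {
    pool=$1
    [ "$(zpool list -H -o health "$pool")" = "DEGRADED" ] || return 0
    # Second line of `zpool get` output is: NAME PROPERTY VALUE SOURCE
    [ "$(zpool get autoreplace "$pool" | awk 'NR == 2 { print $3 }')" = "on" ] || return 0
    status=$(zpool status "$pool")
    # A spare already in use means a replace was done earlier; leave it alone.
    case "$status" in *INUSE*) return 0 ;; esac
    failed=$(printf '%s\n' "$status" | find_failed)
    spare=$(printf '%s\n' "$status" | find_spare)
    [ -n "$failed" ] && [ -n "$spare" ] || return 0
    logger -t zfs-spare "pool $pool: replacing $failed with spare $spare"
    zpool replace "$pool" "$failed" "$spare"
}

# Run the check against every imported pool.
check_all_pools() {
    for pool_name in $(zpool list -H -o name); do
        check_pool "$pool_name"
    done
}

# Only act when invoked as "<script> run", so the functions can be sourced safely.
case "${1:-}" in run) check_all_pools ;; esac
```

Saved under a name of your choosing and called with "run" from a crontab
entry at the interval you want, it would do the same 'zpool replace' that
was done by hand above. It deliberately does nothing once any spare shows
as INUSE, so a second failure still needs a human.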

regards
Johan Hendriks