ZFS raidz recovery
Jeremy Chadwick
freebsd at jdc.parodius.com
Sat Nov 27 15:30:25 UTC 2010
On Sat, Nov 27, 2010 at 03:22:49PM +0200, Gareth de Vaux wrote:
> Hi all, I'm trying to simulate a disk failure and replacement in
> a raidz array, and failing at it myself. What am I doing wrong? Here's
> a transcript with interspersed commentary:
>
> root@file:~# zpool status
>   pool: raid
>  state: ONLINE
>  scrub: scrub completed after 0h0m with 0 errors on Sat Nov 27 13:20:06 2010
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         raid        ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             ad12    ONLINE       0     0     0
>             ad13    ONLINE       0     0     0
>             ad4     ONLINE       0     0     0
>             ad6     ONLINE       0     0     0
>
> errors: No known data errors
> root@file:~# zpool offline raid ad12
>
> reboot
> dd if=/dev/zero of=/dev/ad12 ..
>
> root@file:~# zpool replace raid ad12
> cannot replace ad12 with ad12: ad12 is busy
> root@file:~# zpool replace -f raid ad12
> cannot replace ad12 with ad12: ad12 is busy
>
> The handbook suggests 'replace' but I guess this is only
> if the disk is physically replaced and gets a new identifier?
> Trying with 'online':
>
> root@file:~# zpool online raid ad12
> root@file:~# zpool status
>   pool: raid
>  state: ONLINE
>  scrub: resilver completed after 0h0m with 0 errors on Sat Nov 27 13:29:14 2010
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         raid        ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             ad12    ONLINE       0     0     0  15.5K resilvered
>             ad13    ONLINE       0     0     0
>             ad4     ONLINE       0     0     0
>             ad6     ONLINE       0     0     0
>
> errors: No known data errors
>
> Output remains as such, is this normal?
>
> root@file:~# zpool scrub raid
> root@file:~# zpool status
>   pool: raid
>  state: ONLINE
> status: One or more devices has experienced an unrecoverable error. An
>         attempt was made to correct the error. Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-9P
>  scrub: scrub completed after 0h0m with 0 errors on Sat Nov 27 13:30:37 2010
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         raid        ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             ad12    ONLINE       0     0 2.11K  87.7M repaired
>             ad13    ONLINE       0     0     0
>             ad4     ONLINE       0     0     0
>             ad6     ONLINE       0     0     0
>
> errors: No known data errors
> root@file:~# zpool scrub raid
> root@file:~# zpool status
>   pool: raid
>  state: ONLINE
> status: One or more devices has experienced an unrecoverable error. An
>         attempt was made to correct the error. Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-9P
>  scrub: scrub completed after 0h0m with 0 errors on Sat Nov 27 13:30:55 2010
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         raid        ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             ad12    ONLINE       0     0 2.11K
>             ad13    ONLINE       0     0     0
>             ad4     ONLINE       0     0     0
>             ad6     ONLINE       0     0     0
>
> errors: No known data errors
>
> These are checksum errors? So the disk hasn't been integrated
> properly?
>
> root@file:~# zpool clear raid ad12
> root@file:~# zpool status
>   pool: raid
>  state: ONLINE
>  scrub: scrub completed after 0h0m with 0 errors on Sat Nov 27 13:39:09 2010
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         raid        ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             ad12    ONLINE       0     0     0
>             ad13    ONLINE       0     0     0
>             ad4     ONLINE       0     0     0
>             ad6     ONLINE       0     0     0
>
> errors: No known data errors
> root@file:~# zpool status -x
> all pools are healthy
>
> To make sure this is the case, I fail a different disk:
>
> root@file:~# zpool offline raid ad6
> root@file:~# zpool status
>   pool: raid
>  state: DEGRADED
> status: One or more devices has been taken offline by the administrator.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Online the device using 'zpool online' or replace the device with
>         'zpool replace'.
>  scrub: scrub completed after 0h0m with 0 errors on Sat Nov 27 13:40:52 2010
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         raid        DEGRADED     0     0     0
>           raidz1    DEGRADED     0     0     0
>             ad12    ONLINE       0     0     0
>             ad13    ONLINE       0     0     0
>             ad4     ONLINE       0     0     0
>             ad6     OFFLINE      0     0     0
>
> errors: No known data errors
>
> On reboot the status changes:
>
> root@file:~# zpool status
>   pool: raid
>  state: FAULTED
> status: The pool metadata is corrupted and the pool cannot be opened.
> action: Destroy and re-create the pool from a backup source.
>    see: http://www.sun.com/msg/ZFS-8000-72
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         raid        FAULTED      0     0     1  corrupted data
>           raidz1    DEGRADED     0     0     6
>             ad12    OFFLINE      0     0     0
>             ad13    ONLINE       0     0     0
>             ad4     ONLINE       0     0     0
>             ad6     ONLINE       0     0     1
>
>
> The same happens if I recreate the array and try again.
Please post your uname -a output; it matters greatly.
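
The version details can be gathered in one pass. What follows is only a minimal sketch: the function name collect_report is hypothetical, the pool name "raid" is taken from the transcript above, and it assumes "zpool get version" and "zfs get version" are available on the release in question. The ZFS queries are skipped if the tools are not installed.

```shell
#!/bin/sh
# collect_report is a hypothetical helper name, not part of any FreeBSD tool.
# The default pool name "raid" is the one used in the transcript.
collect_report() {
    pool="${1:-raid}"

    # Kernel and release details.
    echo "== uname -a =="
    uname -a

    # Skip the ZFS queries where the tools are missing (e.g. a non-ZFS host).
    if command -v zpool >/dev/null 2>&1; then
        echo "== zpool get version $pool =="
        zpool get version "$pool" 2>&1 || echo "(zpool query failed)"
        echo "== zfs get version $pool =="
        zfs get version "$pool" 2>&1 || echo "(zfs query failed)"
    else
        echo "== zpool not found; ZFS details skipped =="
    fi
}

collect_report "$@"
```

Run it as root and paste the whole output into the reply; pass a different pool name as the first argument if needed.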
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |