Re: ZFS replace a mirrored disk

From: Christos Chatzaras <chris_at_cretaforce.gr>
Date: Wed, 11 May 2022 22:18:03 UTC
> First please define "without success", what doesn't work?
> 
> please paste output of:
> 
> $> gpart show nvd1
> 
> also, is it a UEFI system or classical BIOS with GPT? What FreeBSD
> version?
> 
> zpool replace zroot nvd0 is invalid, you should use:
> 
> $> zpool replace zroot nvd1 nvd0 (but it uses the entire disk, which is
> probably incorrect too)



It's legacy BIOS with GPT.

What I want to do is "simulate" a disk failure and rebuild the RAID-1.

First I run these commands from the main OS:

------------------------

$> gpart show
=>        40  7501476448  nvd0  GPT  (3.5T)
          40        1024     1  freebsd-boot  (512K)
        1064         984        - free -  (492K)
        2048    33554432     2  freebsd-swap  (16G)
    33556480  7467919360     3  freebsd-zfs  (3.5T)
  7501475840         648        - free -  (324K)

=>        40  7501476448  nvd1  GPT  (3.5T)
          40        1024     1  freebsd-boot  (512K)
        1064         984        - free -  (492K)
        2048    33554432     2  freebsd-swap  (16G)
    33556480  7467919360     3  freebsd-zfs  (3.5T)
  7501475840         648        - free -  (324K)


$> zpool status
  pool: zroot
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            nvd0p3  ONLINE       0     0     0
            nvd1p3  ONLINE       0     0     0

errors: No known data errors


------------------------------


Then I boot with mfsBSD and run this command to "simulate" a disk failure:

$> gpart destroy -F nvd0
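
By the way, "gpart destroy -F" only removes the partition table; the contents of the old partitions (the ZFS labels inside nvd0p3 and the gmirror metadata in the last sector of nvd0p2) stay on disk. I assume a more faithful simulation of a dead disk would clear those first, something like this (untested):

$> zpool labelclear -f nvd0p3
$> gmirror clear nvd0p2
$> gpart destroy -F nvd0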


------------------------------


Then I boot back into the main OS and run these commands:

$> zpool status
  pool: zroot
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       DEGRADED     0     0     0
          mirror-0  DEGRADED     0     0     0
            nvd0    UNAVAIL      0     0     0  invalid label
            nvd1p3  ONLINE       0     0     0

errors: No known data errors


$> gmirror status
       Name    Status  Components
mirror/swap  DEGRADED  nvd1p2 (ACTIVE)


------------------------------


Then I back up and restore the partition table:

$> gpart backup nvd1 | gpart restore -F nvd0

$> gpart show
=>        40  7501476448  nvd1  GPT  (3.5T)
          40        1024     1  freebsd-boot  (512K)
        1064         984        - free -  (492K)
        2048    33554432     2  freebsd-swap  (16G)
    33556480  7467919360     3  freebsd-zfs  (3.5T)
  7501475840         648        - free -  (324K)

=>        40  7501476448  nvd0  GPT  (3.5T)
          40        1024     1  freebsd-boot  (512K)
        1064         984        - free -  (492K)
        2048    33554432     2  freebsd-swap  (16G)
    33556480  7467919360     3  freebsd-zfs  (3.5T)
  7501475840         648        - free -  (324K)


------------------------------


Without running "gmirror forget swap" and "gmirror insert swap /dev/nvd0p2", I see that the swap is already mirrored:

$> gmirror status
       Name    Status  Components
mirror/swap  COMPLETE  nvd1p2 (ACTIVE)
                       nvd0p2 (ACTIVE)

So my first question is whether the swap is mirrored automatically because nvd0 is the same disk (not replaced with a new one).
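
My guess is that this is the same effect as above: the gmirror metadata in the last sector of nvd0p2 survived, so GEOM tasted the component and auto-connected it as soon as the partition table was restored. With a truly new disk I assume I would have to do it manually, roughly (untested):

$> gmirror forget swap
$> gmirror insert swap /dev/nvd0p2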


-------------------------------


Then I rewrite the boot code:

$> gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 nvd0 

--------------------------------


Then I want to re-add this disk to the zpool, but these commands fail:


$> zpool replace zroot nvd0
invalid vdev specification
use '-f' to override the following errors:
/dev/nvd0 is part of active pool 'zroot'


$> zpool replace -f zroot nvd0
invalid vdev specification
the following errors must be manually repaired:
/dev/nvd0 is part of active pool 'zroot'
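
I suspect the replace is refused because the old ZFS label inside nvd0p3 survived the "gpart destroy", so the disk still looks like an active member of zroot. With a genuinely blank replacement disk there would be no label to trip over; with a stale one, my understanding is it could be cleared first, something like (untested):

$> zpool labelclear -f /dev/nvd0p3
$> zpool replace zroot nvd0 nvd0p3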


-----------------------------------

These commands don't work either:

$> zpool replace zroot nvd1 nvd0
invalid vdev specification
use '-f' to override the following errors:
/dev/nvd0 is part of active pool 'zroot'

$> zpool replace -f zroot nvd1 nvd0
invalid vdev specification
the following errors must be manually repaired:
/dev/nvd0 is part of active pool 'zroot'


-----------------------------------


Instead, these commands do work:

$> zpool offline zroot nvd0

$> zpool status
  pool: zroot
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       DEGRADED     0     0     0
          mirror-0  DEGRADED     0     0     0
            nvd0    OFFLINE      0     0     0
            nvd1p3  ONLINE       0     0     0

errors: No known data errors


$> zpool online zroot nvd0


$> zpool status
  pool: zroot
 state: ONLINE
  scan: resilvered 5.55M in 00:00:00 with 0 errors on Thu May 12 00:22:13 2022
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            nvd0p3  ONLINE       0     0     0
            nvd1p3  ONLINE       0     0     0

errors: No known data errors


------------------------------------


My second question is whether, because nvd0 is the same disk (not replaced with a new one), I had to use "zpool offline zroot nvd0" followed by "zpool online zroot nvd0" instead of "zpool replace zroot nvd0".
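
For comparison, my understanding is that with a genuinely new disk the whole procedure would be something like this (untested):

$> gpart backup nvd1 | gpart restore -F nvd0
$> gmirror forget swap
$> gmirror insert swap /dev/nvd0p2
$> gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 nvd0
$> zpool replace zroot nvd0 nvd0p3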

I also notice that if I skip "zpool offline zroot nvd0" and "zpool online zroot nvd0" and instead just reboot the server, then ZFS brings nvd0p3 back online automatically:

$> zpool status
  pool: zroot
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: resilvered 3.50M in 00:00:00 with 0 errors on Thu May 12 01:04:09 2022
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            nvd0p3  ONLINE       0     0     2
            nvd1p3  ONLINE       0     0     0

errors: No known data errors


$> zpool clear zroot

$> zpool status
  pool: zroot
 state: ONLINE
  scan: resilvered 3.50M in 00:00:00 with 0 errors on Thu May 12 01:04:09 2022
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            nvd0p3  ONLINE       0     0     0
            nvd1p3  ONLINE       0     0     0

errors: No known data errors