Re: Unable to replace drive in raidz1

From: Alan Somers <asomers_at_freebsd.org>
Date: Fri, 06 Sep 2024 15:32:35 UTC
On Fri, Sep 6, 2024 at 8:30 AM Chris Ross <cross+freebsd@distal.com> wrote:
>
> Oh, sorry.  I failed to indicate the versions.  I was running 13.2 on
> amd64, and while dealing with this problem I updated to 14.1.  I'm now
> on 14.1 and still seeing the same behavior, but it started under 13.2.
>
> > On Sep 6, 2024, at 10:24, Chris Ross <cross+freebsd@distal.com> wrote:
> >
> > Hello.  I have searched the interwebs a bit and seen reports of this
> > and similar problems, but I haven't found a solution.
> >
> > I have a pool with three 3-disk raidz1 vdevs.  I want to replace the
> > disks in the first vdev with larger ones.  I've done this before, but
> > may've done something wrong here.
> >
> > I believe I used "zpool remove tank da3", but my command history
> > doesn't have it; I've used many commands since I started.  I might've
> > "zpool offline"d the device.  I'm sorry I don't remember the original
> > command.
> >
> > Then I replaced the disk and rebooted.  This of course renumbered the
> > disks.  :-(  But, having found the new/replacement disk (da10), I try
> > "zpool replace tank da3 da10", which always produces:
> >
> > cannot replace da3 with da10: already in replacing/spare config; wait for completion or use 'zpool detach'
> >
> > Now, I've learned I can't use "zpool detach" because that doesn't
> > work on raidz.  And I can't tell what it _thinks_ is happening.  I
> > even did a scrub of the pool and let it finish, but I'm still seeing
> > the same error.
> >
> > I have now:
> >
> > —8<—8<—8<—
> >>>>>>> zpool status -v tank
> >  pool: tank
> > state: DEGRADED
> > status: One or more devices are faulted in response to persistent errors.
> >       Sufficient replicas exist for the pool to continue functioning in a
> >       degraded state.
> > action: Replace the faulted device, or use 'zpool clear' to mark the device
> >       repaired.
> >  scan: scrub repaired 0B in 17:14:03 with 0 errors on Fri Sep  6 09:08:34 2024
> > config:
> >
> >       NAME                      STATE     READ WRITE CKSUM
> >       tank                      DEGRADED     0     0     0
> >         raidz1-0                DEGRADED     0     0     0
> >           da3                   FAULTED      0     0     0  external device fault
> >           da1                   ONLINE       0     0     0
> >           da2                   ONLINE       0     0     0
> >         raidz1-1                ONLINE       0     0     0
> >           diskid/DISK-K1GMBN9D  ONLINE       0     0     0
> >           diskid/DISK-K1GMEDMD  ONLINE       0     0     0
> >           diskid/DISK-K1GMAX1D  ONLINE       0     0     0
> >         raidz1-2                ONLINE       0     0     0
> >           diskid/DISK-3WJDHJ2J  ONLINE       0     0     0
> >           diskid/DISK-3WK3G1KJ  ONLINE       0     0     0
> >           diskid/DISK-3WJ7ZMMJ  ONLINE       0     0     0
> >
> > errors: No known data errors
> > —8<—8<—8<—
> >
> > I'll note that before the switcheroo, the second and third vdevs
> > listed "da4 da5 da6" and "da7 da8 da9".  The reshuffling of device
> > names caused the listing above, which again I've seen before, and am
> > fine with.
> >
> > (Oh, you can see I ran a "zpool offline -f" on it most recently.  But
> > that was just one of the things I've tried that haven't helped.)
> >
> > Please let me know if anyone knows how I've gotten into this state
> > and what I need to do to correct it.  What does "in replacing/spare
> > config" mean?
> >
> >                      - Chris

"zpool replace" is indeed the correct command.  There's no need to run
"zpool offline" first, and "zpool remove" is wrong.  Since "zpool
replace" is still failing, are you sure that da10 is still the correct
device name after all disks got renumbered?  If you're sure, then you
might run "zdb -l /dev/da10" to see what ZFS thinks is on that disk.
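
Something like the following might help track it down.  This is just a
rough sketch, untested here; note that "zpool labelclear" destroys any
ZFS metadata on da10, so only run it if "zdb -l" really does show a
stale label on that disk:

    # Map daN names to physical disks; -v includes the serial number
    camcontrol devlist
    diskinfo -v /dev/da10

    # Inspect whatever ZFS label is on the new disk
    zdb -l /dev/da10

    # Only if zdb shows a stale label from an earlier attempt:
    # clear it (destructive to da10!), then retry the replace
    zpool labelclear -f /dev/da10
    zpool replace tank da3 da10

If the label on da10 still references an old pool or a half-finished
replace, that could be where the "already in replacing/spare config"
message comes from.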

-Alan