Re: Unable to replace drive in raidz1

From: Alan Somers <asomers_at_freebsd.org>
Date: Fri, 06 Sep 2024 17:59:26 UTC
On Fri, Sep 6, 2024 at 11:50 AM Chris Ross <cross+freebsd@distal.com> wrote:
>
>
>
> > On Sep 6, 2024, at 13:02, Alan Somers <asomers@freebsd.org> wrote:
> >
> > This looks like you got into a split-brain situation where the disks
> > have inconsistent labels.  Most disks think that da10 is not a member
> > of the pool, but da10 thinks that it is.  Perhaps you added it as a
> > spare, then physically removed it, and then did a "zpool remove" to
> > remove the spare from the configuration?
>
> I did configure it as a spare, and remove it as a spare, but I
> haven’t moved any disks physically since the one time I
> switched it in.  And this problem started before I ever tried
> adding da10 into the pool as a spare.
>
> >  If you're very very very
> > sure that there is no data on da10 that you care about, you can do
> > "zpool labelclear -f /dev/da10"
>
>
> I am sure, and I didn’t even need the -f.  But, no change.
>
> % sudo zpool labelclear /dev/da10
> Password:
>
> % sudo zdb -l /dev/da10
> failed to unpack label 0
> failed to unpack label 1
> failed to unpack label 2
> failed to unpack label 3
>
> % sudo zpool replace tank da3 da10
> cannot replace da3 with da10: already in replacing/spare config; wait for completion or use 'zpool detach'
>
>
>   :-(
>
>      - Chris

If there is no label on da10, and "zpool status" doesn't show any
spares, then I don't know what the problem is.  It's possible that
/sbin/zpool is printing an incorrect error message; it's fairly
notorious for that.  You could try to debug it.  Other wild guesses
include:
* Maybe da3 is the disk with the out-of-date label.  You could try
  physically removing it before doing "zpool replace".
* Since exported pools can't have active spares, you could try
  exporting the pool and then reimporting it.
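
The export/reimport idea could be sketched roughly as follows.  This
assumes the pool is "tank" and the disks are da3 and da10, as in the
commands quoted above.  The run and retry_replace helpers and the
DRY_RUN switch are made up for illustration.  By default the script
only prints the commands; set DRY_RUN=0 and run it as root to actually
execute them.

```sh
#!/bin/sh
# Sketch only: names come from the thread; helpers are hypothetical.
POOL=${POOL:-tank}
OLD=${OLD:-da3}
NEW=${NEW:-da10}
DRY_RUN=${DRY_RUN:-1}   # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

retry_replace() {
    # Check that no spares or in-progress replacements are listed.
    run zpool status "$POOL"
    # Check that the new disk really carries no ZFS label.
    run zdb -l "/dev/$NEW"
    # Export and reimport, then retry the replace; stop at the first failure.
    run zpool export "$POOL" &&
        run zpool import "$POOL" &&
        run zpool replace "$POOL" "$OLD" "$NEW"
}

retry_replace
```

Exporting takes the pool offline until the import completes, so this is
only worth trying when nothing else needs the pool at that moment.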

-Alan