Re: ZFS: Rescue FAULTED Pool

From: Allan Jude <allanjude_at_freebsd.org>
Date: Thu, 30 Jan 2025 21:13:56 UTC
On 1/30/2025 6:35 AM, A FreeBSD User wrote:
> On Wed, 29 Jan 2025 03:45:25 -0800,
> David Wolfskill <david@catwhisker.org> wrote:
> 
> Hello, thanks for responding.
> 
>> On Wed, Jan 29, 2025 at 11:27:01AM +0100, FreeBSD User wrote:
>>> Hello,
>>>
>>> a ZFS pool (RAIDZ1) has faulted. The pool is not importable
>>> anymore, neither with `zpool import -F` nor with `-f`.
>>> Although this pool is on an experimental system (no backup
>>> available), it contains some data that would take a while to
>>> reconstruct, so I'd like to ask whether there is a way to
>>> "de-fault" such a pool.
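>>>
>>> (That is, attempts along the lines of the following, with the
>>> pool name as a placeholder:
>>>
>>>   zpool import -f <poolname>
>>>   zpool import -F <poolname>
>>>
>>> neither of which succeeds.)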
>>
>> Well, 'zpool clear ...' "Clears device errors in a pool." (from
>> "man zpool").
>>
>> It is, however, not magic -- it doesn't actually fix anything.
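>>
>> (For reference, the basic form -- pool and device names are
>> placeholders -- is:
>>
>>   zpool clear <poolname> [device]
>>
>> which clears the error counters for the whole pool, or just for
>> the named device; it does not repair any data by itself.)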
> 
> For the record: I tried EVERY method a web search turns up for
> common "administrators", but hoped people here are able to
> manipulate the deeper structures via zdb ...
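>
> The sort of zdb poking I had in mind -- a sketch, with placeholder
> device and pool names -- is inspecting the on-disk labels and the
> config of the non-imported pool directly:
>
>   zdb -l /dev/da0p1    # dump the vdev labels of one member disk
>   zdb -eC <poolname>   # print the config of a non-imported pool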
> 
>>
>> (I had an issue with a zpool which had a single SSD device as a ZIL; the
>> ZIL device failed after it had accepted some data to be written to the
>> pool, but before the data could be read and transferred to the spinning
>> disks.  ZFS was quite unhappy about that.  I was eventually able to copy
>> the data elsewhere, destroy the old zpool, recreate it *without* that
>> single point of failure, then copy the data back.  And I learned to
>> never create a zpool with a *single* device as a separate ZIL.)
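>>
>> (These days I would only add a log device as a mirror, along the
>> lines of the following, with placeholder device names:
>>
>>   zpool add <poolname> log mirror ada4 ada5
>>
>> so that a single SSD failure can no longer take pending writes
>> with it.)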
> 
> Well, in this case I do not use dedicated ZIL drives. I have had
> some experience with "single" ZIL drive setups, but a dedicated ZIL
> is mostly useful where you have a graveyard full of
> inertia-suffering, mass-spinning HDDs; if I'm right, an SSD-based
> ZIL would be of no use on an all-SSD pool like this one. So I
> omitted those.
> 
>>
>>> The pool consists of 7 drives in a RAIDZ1; one of the SSDs
>>> faulted, but I pulled the wrong one, so the pool went into a
>>> suspended state.
>>
>> Can you put the drive you pulled back in?
> 
> Every single SSD originally plugged in is now back in place, even the faulted one (which
> doesn't report any faults at the moment).
> 
> Although the pool isn't "importable", zdb reports its existence,
> alongside zroot (which resides on a dedicated drive).
> 
>>
>>> The host is running the latest XigmaNAS BETA, which is
>>> effectively FreeBSD 14.1-p2, just for the record.
>>>
>>> I do not want to give up, since I hope there might be a crude but
>>> effective way to restore the pool, even at the cost of some data
>>> loss ...
>>>
>>> Thanks in advance,
>>>
>>> Oliver
>>> ....
>>
>> Good luck!
>>
>> Peace,
>> david
> 
> 
> Well, this is a hard and painful lesson to learn, if there is no
> chance to get the pool back.
> 
> A warning (though this is probably old news in the realm of
> professionals): I used a bunch of cheap spot-market SATA SSDs, a
> brand called "Intenso" that is common here in good old Germany.
> Some of those SSDs have a working activity LED when used with a
> Fujitsu SAS HBA controller -- but those died very quickly after
> suffering some bus errors. Another batch of those SSDs does not
> have a working LED (no blinking on access), but lasted a bit
> longer. The problem with those: I cannot easily locate the failing
> device by exercising it, e.g. by pushing massive amounts of data
> through it via dd, if that is possible at all.
> I also ordered alternative SSDs from a more expensive brand -- but
> bad karma ...
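>
> (Where the LEDs do work, my approach is to generate I/O on one
> device at a time and watch the lights; on a live pool member,
> reading is safer than writing -- the device name is a placeholder:
>
>   dd if=/dev/da3 of=/dev/null bs=1m
>
> while `gstat` in a second terminal shows which device is actually
> busy.)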
> 
> Oliver
> 
> 

The most useful thing to share right now would be the output of
`zpool import` (with no pool name) on the rebooted system.

That will show where the issues are, and suggest how they might be solved.
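
If that listing shows the pool with enough of the seven members
present, the least risky next step is usually a read-only recovery
import -- a sketch, with the pool name as a placeholder:

  zpool import -o readonly=on -f -F -R /mnt <poolname>

-F asks ZFS to rewind to the last importable transaction group, and
readonly=on avoids writing anything further while you copy the data
off.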

-- 
Allan Jude