Re: Zpool status -- why does a suboptimal pool show as "ONLINE"?

From: Frank Leonhardt <freebsd-doc_at_fjl.co.uk>
Date: Tue, 17 Sep 2024 11:16:20 UTC
On 2024-09-12 14:29, Dave Cottlehuber wrote:
> On Thu, 12 Sep 2024, at 13:05, Dan Mahoney (Ports) wrote:
>> Hey there all,
>> 
>> I have a nagios check that assumes that if I have a suboptimal zfs
>> zpool, that the word “DEGRADED” will appear in the output.  One disk of
>> a two-disk mirror seems to have faulted, but the pool still shows as
>> “ONLINE”.  I know I’ve seen the word “DEGRADED” in the past.  What’s
>> different?
>> 
>>   pool: zroot
>>  state: ONLINE
>> status: One or more devices are faulted in response to persistent errors.
>>         Sufficient replicas exist for the pool to continue functioning in a
>>         degraded state.
>> action: Replace the faulted device, or use 'zpool clear' to mark the device
>>         repaired.
>> config:
>> 
>>         NAME        STATE     READ WRITE CKSUM
>>         zroot       ONLINE       0     0     0
>>           mirror-0  ONLINE       0     0     0
>>             ada0p3  FAULTED      4   372     0  too many errors
>>             ada1p3  ONLINE       0     0     0
>> 
>> errors: No known data errors
>> 
>> 14.1, if it matters, the disks are two innolite SATADOM’s.
> 
> Hi Dan
> 
> I agree that I would expect the mirror-0 at least to report DEGRADED
> or similar. Hopefully one of the zfs people clarifies the logic here.
> 
> Practically, what I do is run:
> 
>     zpool status | grep -v 'with 0 errors' | sha256
> 
> and check that this hash remains the same over time. It's obviously
> different for each pool. Could that help for nagios?

I agree. A faulted drive always used to appear as "FAULTED", and the
vdev and pool should both have been tagged "DEGRADED" (cascading
upwards).

A faulted drive isn't necessarily taken offline, although "too many
errors" suggests it should be.

If this isn't a bug I'd like to know the reason why.
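
In the meantime, a check that looks at the state of every device, rather
than relying on the pool-level state, would probably have caught this
case. What follows is only a minimal sketch, not a tested plugin: the
function name check_zpool_text is made up for illustration, it assumes the
usual "NAME STATE READ WRITE CKSUM" column layout shown in the quoted
output, and the list of problem states (DEGRADED, FAULTED, OFFLINE,
UNAVAIL, REMOVED) is my assumption of what a monitor should treat as
critical.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative, not part of ZFS or Nagios).
# Reads `zpool status` text on stdin and returns a Nagios-style code:
#   0 = OK, 2 = CRITICAL.
check_zpool_text() {
    awk '
        # In the config table, column 1 is the pool/vdev/device name and
        # column 2 is its state. Header lines such as "state:" end in a
        # colon and are skipped.
        $1 !~ /:$/ && $2 ~ /^(DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ {
            printf "CRITICAL: %s is %s\n", $1, $2
            bad = 1
        }
        END {
            if (!bad) print "OK: no device reports a problem state"
            exit (bad ? 2 : 0)
        }
    '
}

# Intended use on a live system:
#   zpool status | check_zpool_text
```

Fed the output Dan posted, this should flag ada0p3 even though the pool
and mirror-0 both read ONLINE, because it does not depend on the word
"DEGRADED" appearing anywhere.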

Regards, Frank.