From nobody Tue Sep 17 11:16:20 2024
X-Original-To: questions@mlmmj.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
	by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4X7K3c3dWZz595SK
	for <questions@mlmmj.nyi.freebsd.org>; Tue, 17 Sep 2024 11:16:28 +0000 (UTC)
	(envelope-from freebsd-doc@fjl.co.uk)
Received: from bs2.fjl.org.uk (bs2.fjl.org.uk [84.45.41.208])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256
	 client-signature RSA-PSS (2048 bits) client-digest SHA256)
	(Client CN "bs2.fjl.org.uk", Issuer "bs2.fjl.org.uk" (not verified))
	by mx1.freebsd.org (Postfix) with ESMTPS id 4X7K3b3zV3z4FTY
	for <questions@freebsd.org>; Tue, 17 Sep 2024 11:16:27 +0000 (UTC)
	(envelope-from freebsd-doc@fjl.co.uk)
Authentication-Results: mx1.freebsd.org;
	dkim=none;
	dmarc=none;
	spf=pass (mx1.freebsd.org: domain of freebsd-doc@fjl.co.uk designates 84.45.41.208 as permitted sender) smtp.mailfrom=freebsd-doc@fjl.co.uk
Received: from roundcube.fjl.uk ([192.168.0.2])
	by bs2.fjl.org.uk (8.16.1/8.16.1) with ESMTP id 48HBGKoF018540
	for <questions@freebsd.org>; Tue, 17 Sep 2024 11:16:20 GMT
	(envelope-from freebsd-doc@fjl.co.uk)
List-Id: User questions <freebsd-questions.freebsd.org>
List-Archive: https://lists.freebsd.org/archives/freebsd-questions
List-Help: <mailto:questions+help@freebsd.org>
List-Post: <mailto:questions@freebsd.org>
List-Subscribe: <mailto:questions+subscribe@freebsd.org>
List-Unsubscribe: <mailto:questions+unsubscribe@freebsd.org>
X-BeenThere: freebsd-questions@freebsd.org
Sender: owner-freebsd-questions@FreeBSD.org
MIME-Version: 1.0
Date: Tue, 17 Sep 2024 12:16:20 +0100
From: Frank Leonhardt <freebsd-doc@fjl.co.uk>
To: questions <questions@freebsd.org>
Subject: Re: Zpool status -- why does a suboptimal pool show as "ONLINE"?
In-Reply-To: <312af967-e5bf-4e83-b48b-7c2841719373@app.fastmail.com>
References: <378D100E-FFE1-4DA7-9C52-219863A50A24@gushi.org>
 <312af967-e5bf-4e83-b48b-7c2841719373@app.fastmail.com>
Message-ID: <0290d22f5be2eb0b324254b663076924@fjl.co.uk>
X-Sender: freebsd-doc@fjl.co.uk
Content-Type: text/plain; charset=UTF-8;
 format=flowed
Content-Transfer-Encoding: 8bit
X-Spamd-Bar: -
X-Spamd-Result: default: False [-1.86 / 15.00];
	SUBJECT_ENDS_QUESTION(1.00)[];
	NEURAL_HAM_MEDIUM(-0.99)[-0.989];
	NEURAL_HAM_SHORT(-0.92)[-0.922];
	NEURAL_HAM_LONG(-0.75)[-0.745];
	R_SPF_ALLOW(-0.20)[+ip4:84.45.41.208];
	MIME_GOOD(-0.10)[text/plain];
	ONCE_RECEIVED(0.10)[];
	FROM_HAS_DN(0.00)[];
	RCPT_COUNT_ONE(0.00)[1];
	RCVD_COUNT_ONE(0.00)[1];
	ASN(0.00)[asn:25577, ipnet:84.45.0.0/17, country:GB];
	MISSING_XM_UA(0.00)[];
	MIME_TRACE(0.00)[0:+];
	RCVD_TLS_LAST(0.00)[];
	MID_RHS_MATCH_FROM(0.00)[];
	MLMMJ_DEST(0.00)[questions@freebsd.org];
	TO_MATCH_ENVRCPT_ALL(0.00)[];
	FROM_EQ_ENVFROM(0.00)[];
	R_DKIM_NA(0.00)[];
	PREVIOUSLY_DELIVERED(0.00)[questions@freebsd.org];
	DMARC_NA(0.00)[fjl.co.uk];
	TO_DN_ALL(0.00)[];
	ARC_NA(0.00)[]
X-Rspamd-Queue-Id: 4X7K3b3zV3z4FTY

On 2024-09-12 14:29, Dave Cottlehuber wrote:
> On Thu, 12 Sep 2024, at 13:05, Dan Mahoney (Ports) wrote:
>> Hey there all,
>> 
>> I have a nagios check that assumes that if I have a suboptimal zfs
>> zpool, that the word “DEGRADED” will appear in the output.  One disk of
>> a two-disk mirror seems to have faulted, but the pool still shows as
>> “ONLINE”.  I know I’ve seen the word “DEGRADED” in the past.  What’s
>> different?
>> 
>>   pool: zroot
>>  state: ONLINE
>> status: One or more devices are faulted in response to persistent errors.
>>         Sufficient replicas exist for the pool to continue functioning in a
>>         degraded state.
>> action: Replace the faulted device, or use 'zpool clear' to mark the device
>>         repaired.
>> config:
>> 
>>         NAME        STATE     READ WRITE CKSUM
>>         zroot       ONLINE       0     0     0
>>           mirror-0  ONLINE       0     0     0
>>             ada0p3  FAULTED      4   372     0  too many errors
>>             ada1p3  ONLINE       0     0     0
>> 
>> errors: No known data errors
>> 
>> 14.1, if it matters, the disks are two innolite SATADOM’s.
> 
> Hi Dan
> 
> I agree that I would expect the mirror-0 at least to report DEGRADED
> or similar. Hopefully one of the zfs people clarifies the logic here.
> 
> Practically, what I do is run:
> 
>     zpool status | grep -v 'with 0 errors' | sha256
> 
> and check that this hash remains the same over time. It's obviously
> different for each pool. Could that help for nagios?

I agree. A faulted drive always used to appear as "FAULTED", and the 
vdev and pool should both have been tagged "DEGRADED" (cascading 
upwards).

A faulted drive isn't necessarily taken offline, although "too many 
errors" suggests it should be.

If this isn't a bug I'd like to know the reason why.
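
In the meantime, the Nagios check could look at the STATE column of
every device in the config table rather than relying on the word
"DEGRADED" appearing somewhere. A rough sketch in Python, assuming the
table layout shown in Dan's output above; the function names and the
list of bad states are my own choices for illustration, not anything
taken from ZFS itself:

```python
# States treated as "not healthy" here. This set is an assumption made
# for the sketch; adjust it to whatever your monitoring should flag.
BAD_STATES = {"DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}


def unhealthy_devices(status_text):
    """Return (name, state) for each config row whose STATE is bad."""
    found = []
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        # The config table starts after the "NAME STATE ..." header row.
        if fields[:2] == ["NAME", "STATE"]:
            in_config = True
            continue
        # The "errors:" line closes the table for the current pool.
        if line.startswith("errors:"):
            in_config = False
        if in_config and len(fields) >= 2 and fields[1] in BAD_STATES:
            found.append((fields[0], fields[1]))
    return found


def nagios_result(status_text):
    """Map the parsed status to a Nagios-style (exit_code, message)."""
    bad = unhealthy_devices(status_text)
    if bad:
        detail = ", ".join(f"{name} is {state}" for name, state in bad)
        return 2, "CRITICAL: " + detail
    return 0, "OK: no unhealthy devices found"
```

Fed the text of "zpool status", nagios_result() gives an exit code of 2
as soon as any row is in one of the listed states, even when the pool
and mirror rows both still say ONLINE.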

Regards, Frank.