From nobody Sat Feb 01 08:57:15 2025
Date: Sat, 1 Feb 2025 09:57:15 +0100
From: A FreeBSD User
To: Allan Jude
Cc: freebsd-current@freebsd.org
Subject: Re: ZFS: Rescue FAULTED Pool
Message-ID: <20250201095656.1bdfbe5f@thor.sb211.local>
In-Reply-To: <980401eb-f8f6-44c7-8ee1-5ff0c9e1c35c@freebsd.org>
References: <20250129112701.0c4a3236@freyja> <20250130123354.2d767c7c@thor.sb211.local> <980401eb-f8f6-44c7-8ee1-5ff0c9e1c35c@freebsd.org>
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current
On Thu, 30 Jan 2025 16:13:56 -0500, Allan Jude wrote:

> On 1/30/2025 6:35 AM, A FreeBSD User wrote:
> > On Wed, 29 Jan 2025 03:45:25 -0800, David Wolfskill wrote:
> >
> > Hello, thanks for responding.
> >
> >> On Wed, Jan 29, 2025 at 11:27:01AM +0100, FreeBSD User wrote:
> >>> Hello,
> >>>
> >>> a ZFS pool (RAIDZ1) has been faulted. The pool is not importable
> >>> anymore, neither with import -F nor -f.
> >>> Although this pool is on an experimental system (no backup available),
> >>> it contains some data; reconstructing it would take a while, so I'd
> >>> like to ask whether there is a way to "de-fault" such a pool.
> >>
> >> Well, 'zpool clear ...' "Clears device errors in a pool." (from "man
> >> zpool").
> >>
> >> It is, however, not magic -- it doesn't actually fix anything.
> >
> > For the record: I tried every network/search method available to common
> > "administrators", but hoped people are able to manipulate deeper stuff
> > via zdb ...
> >
> >> (I had an issue with a zpool which had a single SSD device as a ZIL;
> >> the ZIL device failed after it had accepted some data to be written to
> >> the pool, but before the data could be read and transferred to the
> >> spinning disks. ZFS was quite unhappy about that. I was eventually able
> >> to copy the data elsewhere, destroy the old zpool, recreate it
> >> *without* that single point of failure, then copy the data back. And I
> >> learned to never create a zpool with a *single* device as a separate
> >> ZIL.)
> >
> > Well, in this case I do not use dedicated ZIL drives. I have also had
> > several experiences with "single" ZIL drive setups, but a dedicated ZIL
> > is mostly useful in cases where you have a graveyard full of
> > inertia-suffering, mass-spinning HDDs - if I'm right, an SSD-based ZIL
> > would be of no use/effect in my all-SSD case. So I omitted those.
> >
> >>> The pool is comprised of 7 drives as a RAIDZ1; one of the SSDs
> >>> faulted, but I pulled the wrong one, so the pool ran into a suspended
> >>> state.
> >>
> >> Can you put the drive you pulled back in?
> >
> > Every single SSD originally plugged in is now back in place, even the
> > faulted one (which doesn't report any faults at the moment).
> >
> > Although the pool isn't "importable", zdb reports its existence,
> > amongst zroot (which resides on a dedicated drive).
> >
> >>> The host is running the latest XigmaNAS BETA, which is effectively
> >>> FreeBSD 14.1-p2, just for the record.
> >>>
> >>> I do not want to give up, since I hoped there might be a rude but
> >>> effective way to restore the pool even under data losses ...
> >>>
> >>> Thanks in advance,
> >>>
> >>> Oliver
> >>
> >> Good luck!
> >>
> >> Peace,
> >> david
> >
> > Well, this is a hard and painful lesson to learn, if there is no chance
> > to get back the pool.
> >
> > A warning (but this seems to be useless in the realm of professionals):
> > I used a bunch of cheap spot-market SATA SSDs, a brand called "Intenso",
> > common also here in good old Germany.
> > Some of those SSDs do have a working LED when used with a Fujitsu SAS
> > HBA controller - but those died very quickly, suffering some bus
> > errors. Another bunch of those SSDs do not have a working LED (not
> > blinking on access), but lasted a bit longer. The problem with those
> > SSDs is: I cannot easily locate the failing device by accessing the
> > failed drive, e.g. writing massive data via dd, if that is possible at
> > all.
> > I also ordered alternative SSDs from a more expensive brand - but bad
> > Karma ...
> >
> > Oliver
>
> The most useful thing to share right now would be the output of `zpool
> import` (with no pool name) on the rebooted system.
>
> That will show where the issues are, and suggest how they might be
> solved.

Hello, this is exactly what happens when trying to import the pool. Prior
to the loss, device da1p1 had been faulted, with numbers in the
"corrupted data" column - not seen now.

~# zpool import
   pool: BUNKER00
     id: XXXXXXXXXXXXXXXXXXXX
  state: FAULTED
 status: The pool metadata is corrupted.
 action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported
        using the '-f' flag.
    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-72
 config:

        BUNKER00    FAULTED  corrupted data
          raidz1-0  ONLINE
            da2p1   ONLINE
            da3p1   ONLINE
            da4p1   ONLINE
            da7p1   ONLINE
            da6p1   ONLINE
            da1p1   ONLINE
            da5p1   ONLINE

~# zpool import -f BUNKER00
cannot import 'BUNKER00': I/O error
        Destroy and re-create the pool from
        a backup source.
~# zpool import -F BUNKER00
cannot import 'BUNKER00': one or more devices is currently unavailable

--
A FreeBSD user
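Since plain `-f` and `-F` both fail here, a sketch of the remaining
last-resort import options OpenZFS provides may be useful for anyone
finding this thread. The pool name is taken from the output above; these
modes can still fail against metadata this badly corrupted, and a rewind
deliberately discards the most recent transactions:

```shell
# Dry run: report whether rewinding to an earlier transaction group
# would make the pool importable, without modifying anything on disk.
zpool import -F -n BUNKER00

# Extreme rewind (-X) tries much older txgs; combining it with a
# read-only import avoids writing anything further to the damaged
# metadata while the data is copied off.
zpool import -F -X -o readonly=on BUNKER00

# zdb can examine the labels and datasets of the still-exported pool
# (-e) without importing it, which helps judge what is recoverable.
zdb -e -d BUNKER00
```

Note that `-X` can run for a very long time on a 7-disk raidz1 and is
explicitly a last resort; imaging the raw devices first (e.g. with dd)
preserves the option of trying again if a rewind makes things worse.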