From: andy thomas
Date: Mon, 9 Sep 2024 16:51:05 +0100 (BST)
To: freebsd-fs@freebsd.org
Subject: Does a failed separate ZIL disk mean the entire zpool is lost?

A server I look after had a 65TB ZFS RAIDz1 pool with 8 x 8TB hard disks
plus one hot spare and separate ZFS intent log (ZIL) and L2ARC cache disks
that used a pair of 256GB SSDs. This ran really well for 6 years until 2
weeks ago, when the main cooling system in the data centre where it was
installed failed and the backup cooling system failed to start up. The
upshot was the ZIL SSD went short-circuit across its power connector,
shorting out the server's PSUs and shutting down the server.

After replacing the failed SSD and verifying all the spinning hard disks
and the cache SSD are undamaged, attempts to import the pool fail with the
following message:

  NAME      SIZE  ALLOC  FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH   ALTROOT
  clustor2     -      -     -        -         -     -    -      -  UNAVAIL  -

Does this mean the pool's contents are now lost and unrecoverable?

Andy