Message-ID: <86142b7d-0f19-41c8-95ed-19a5d589ac86@holgerdanske.com>
Date: Mon, 19 Aug 2024 20:41:13 -0700
List-Archive: https://lists.freebsd.org/archives/freebsd-fs
Subject: Re: ZFS: Suspended Pool due to allegedly uncorrectable I/O error
To: freebsd-fs@freebsd.org
From: David Christensen

On 8/19/24 14:19, Pamela Ballantyne wrote:
>
> On Sunday Morning, 08/11, I upgraded the
> server from 12.4-RELEASE-p9 to 13.3-RELEASE-p5.
> The upgrade went smoothly; there was no problem, and the server worked
> flawlessly post-upgrade.
>
> On Thursday evening, 8/15, the server became unreachable. It would still
> respond to pings via the IP address, but that was it. I used to be able to
> access the server via IPMI, but that ability disappeared several company
> mergers ago. The current NOC staff sent me a screenshot of the server
> output, which showed repeated messages saying:
>
> "Solaris: WARNING: Pool 'zroot' has encountered an uncorrectable I/O
> failure and has been suspended."
>

I have a SOHO network and have been running FreeBSD/ZFS on a few machines for a few years, including a 24x7 file server.

AIUI, FreeBSD 12-R and earlier used Illumos ZFS code, and FreeBSD 13-R and newer use OpenZFS code.

AIUI, to do an in-place upgrade of FreeBSD 12-R with Illumos ZFS pools to FreeBSD 13-R with OpenZFS code, you needed to follow a specific procedure to pre-upgrade (?) the Illumos pools before upgrading FreeBSD (especially for root-on-ZFS).

OpenZFS had a data destruction bug last year (November 2023?), which resurfaced this year (February 2024?). Those events caused me to delay upgrading FreeBSD/ZFS.

A few weeks ago, I did a fresh install of 13-R with UFS onto a repurposed machine, added HDDs/SSDs, built a fresh OpenZFS pool, and replicated the data from the old file server to the new file server. The new 13-R/OpenZFS file server has been up 24x7 since then. I have since repurposed/rebuilt a backup server, and then the removable single-drive backup disks/pools. Everything is now FreeBSD 13-R and OpenZFS.

I have noted some differences in how OpenZFS does incremental replication versus how Illumos ZFS did, but am still learning. I expect I will be reworking my ZFS-related scripts as I figure things out.

I understand that in-place upgrading a FOSS computer over many years is a source of pride for many people.
I tried that, and it did not work out for me. Since then, I have invested myself in fresh installs, minimal sysadmin changes, thorough documentation, scripting, version control, backup, restore, and multiple layers of redundancy. The efforts are far more predictable and the results are far more reliable.

David