Re: ZFS: Suspended Pool due to allegedly uncorrectable I/O error

From: David Christensen <dpchrist_at_holgerdanske.com>
Date: Tue, 20 Aug 2024 03:41:13 UTC
On 8/19/24 14:19, Pamela Ballantyne wrote:
> <snip>
> On Sunday Morning, 08/11, I upgraded the server from 12.4-RELEASE-p9 to
> 13.3-RELEASE-p5. The upgrade went smoothly; there was no problem, and
> the server worked flawlessly post-upgrade.
>
> On Thursday evening, 8/15, the server became unreachable. It would still
> respond to pings via the IP address, but that was it.  I used to be able
> to access the server via IPMI, but that ability disappeared several
> company mergers ago. The current NOC staff sent me a screenshot of the
> server output, which showed repeated messages saying:
> 
> "Solaris: WARNING: Pool 'zroot' has encountered an uncorrectable I/O
> failure and has been suspended."
> <snip>


I have a SOHO network and have been running FreeBSD/ZFS on a few 
machines for a few years, including a 24x7 file server.


AIUI, FreeBSD 12-R and prior used illumos ZFS code, and FreeBSD 13-R 
and newer use OpenZFS code.


AIUI, an in-place upgrade from FreeBSD 12-R with illumos ZFS pools to 
FreeBSD 13-R with OpenZFS required a specific procedure to prepare 
(pre-upgrade?) the illumos pools before upgrading FreeBSD, especially 
for root-on-ZFS.


OpenZFS had a data-corruption bug last year (November 2023?), which 
resurfaced this year (February 2024?).  Those events caused me to delay 
upgrading FreeBSD/ZFS.


A few weeks ago, I did a fresh install of 13-R with UFS onto a 
repurposed machine, added HDDs/SSDs, built a fresh OpenZFS pool, and 
replicated the data from the old file server to the new one.  The new 
13-R/OpenZFS file server has been running 24x7 since then.  I have 
since repurposed and rebuilt a backup server, and then the removable 
single-drive backup disks/pools.  Everything is now on FreeBSD 13-R 
and OpenZFS.


I have noted some differences between how OpenZFS and illumos ZFS do 
incremental replication, but I am still learning.  I expect to rework 
my ZFS-related scripts as I figure things out.


I understand that upgrading a FOSS computer in place over many years is 
a source of pride for many people.  I tried that, and it did not work 
out for me.  Since then, I have invested in fresh installs, minimal 
sysadmin changes, thorough documentation, scripting, version control, 
backup, restore, and multiple layers of redundancy.  The effort is far 
more predictable, and the results are far more reliable.


David