Re: Patches for GPT and geli recovery

From: Fabian Keil <freebsd-listen_at_fabiankeil.de>
Date: Mon, 20 Dec 2021 14:15:00 UTC
Jason Bacon <bacon4000@gmail.com> wrote on 2021-12-19 at 16:21:39:

> On 12/19/21 13:40, Lee Brown wrote:
> > 
> > 
> > On Sun, Dec 19, 2021 at 8:52 AM Fabian Keil
> > <freebsd-listen@fabiankeil.de> wrote:
> > 
> >     [cut]
> >     BTW, I would also be interested to know if others have
> >     experienced similar data corruption and could figure
> >     out how it happened.
> > 
> > Sounds like bitrot.  Bits flip on disks all the time, it doesn't matter 
> > if they are spinning rust or SSD, it happens.  Sometimes they are 
> > detected and corrected, in which case you won't know.  Sometimes they 
> > are detected and uncorrectable, you'll see that error propagated into 
> > the driver.  And sometimes they are not detected at all and cause no 
> > errors that the OS can surmise.  The higher the density of bits, the 
> > higher the probability of corruption.  SMART is not reliably 
> > predictive.  How does it happen?  Cosmic rays and entropy.  I've had 
> > lightly written SSDs fail after a few months.
> > 
> > I don't use ZFS, but have GELI-Authentication under a GMIRROR, so 
> > whenever a bad checksum is read, it breaks the mirror, which gets 
> > attention (last I looked, there wasn't a simple userland hook for bad 
> > GELI reads, but there was for GMIRROR add/remove events).
 
> How old was the corrupted filesystem?

I just checked:

fk@t520 /var/log/fk/2021-12-20 $ grep "zpool create" *zpool-history*
ssh-steffen-sudo-zpool-history--l-bpool-20211220T102957:2017-08-10.21:52:07 zpool create -f -o version=28 -O compression=lzjb bpool /dev/ada0p2 [user 0 (root) on kendra]
ssh-steffen-sudo-zpool-history--l-dpool-20211220T103420:2015-03-17.18:46:42 zpool create dpool /dev/gpt/dpool-ada0.eli [user 0 (root) on kendra]
ssh-steffen-sudo-zpool-history--l-rpool-ada1-20211220T103234:2017-04-11.12:33:47 zpool create -o version=28 -o failmode=continue -O compression=lzjb -O checksum=sha256 rpool mirror /dev/ada0p3.eli /dev/da1p3.eli [user 0 (root) on ElectroBSD-11.0-STABLE-amd64]
sudo-zpool-history--l-cloudia2-20211220T103856:2017-04-12.14:45:07 zpool create -O recordsize=1m -O checksum=sha512 cloudia2 /dev/label/cloudia2.eli [user 0 (root) on t520.local]

So it looks like the partially corrupted pool "dpool" on
partition five was created on 2015-03-17, while the
(former) root pool "rpool-ada1", which didn't show any signs
of corruption, was created on 2017-04-11. This indicates
that I installed a new operating system with cloudiatr
and kept the data pool unmodified.

The boot pool "bpool" was created on 2017-08-10 but
it gets recreated with each ElectroBSD kernel update
anyway.
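
As a rough sketch, the creation dates could also be pulled out of
"zpool history -l" output directly instead of being read off the grep
result by eye. The function name pool_creation_dates is made up for
illustration, and the option handling assumes that only -o, -O, -m, -R
and -t take a separate argument, which covers the commands shown above
but may not cover every "zpool create" invocation.

```shell
# Hypothetical helper: read `zpool history -l` output on stdin and print
# "<creation date> <pool name>" for every "zpool create" entry.
# Assumed input line format (taken from the history output above):
#   2015-03-17.18:46:42 zpool create dpool /dev/gpt/dpool-ada0.eli [user 0 (root) on kendra]
pool_creation_dates() {
    awk '$2 == "zpool" && $3 == "create" {
        # Skip leading options to find the pool name.
        i = 4
        while (i <= NF && $i ~ /^-/) {
            # Assumption: these options take a separate argument,
            # all other options (such as -f) stand alone.
            if ($i == "-o" || $i == "-O" || $i == "-m" || $i == "-R" || $i == "-t")
                i += 2
            else
                i += 1
        }
        # The timestamp looks like 2015-03-17.18:46:42, keep only the date.
        split($1, stamp, ".")
        print stamp[1], $i
    }'
}
```

It would be used as "zpool history -l dpool | pool_creation_dates".
Note that it expects the raw history output, not grep output with a
file name prefix in front of the timestamp.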

>                                        I habitually wipe my disks and do 
> a fresh install at least once every 2 years to avoid issues like this. 

Do you read back the complete data after fresh installs to confirm
that the rewritten data arrived on disk as expected?

I prefer ZFS scrubs to confirm that the data is still reachable.

It's not obvious to me that recreating the data is safer than
keeping the old data but verifying checksums.
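
A scrub-based check could be scripted roughly as follows. "zpool scrub"
and "zpool status" are the real commands; the helper name pool_is_clean
is invented for this example, and it assumes that a healthy pool reports
the line "errors: No known data errors" in its status output.

```shell
# Hypothetical helper: read `zpool status` output on stdin and succeed
# only if it contains the "No known data errors" summary line.
pool_is_clean() {
    grep -q '^errors: No known data errors$'
}

# Possible use, with the pool name "dpool" taken from above.
# The status check only makes sense once the scrub has finished:
#   zpool scrub dpool
#   zpool status -v dpool | pool_is_clean || echo "dpool needs attention"
```

This only checks the summary line. The full "zpool status -v" output
is still needed to see which files are affected.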

> I have experienced unexplained, unrecoverable errors on old filesystems, 
> but fortunately nothing critical.

I too have experienced various unrecoverable errors on disks,
but I never lost GPT partition data and geli metadata at the
same time while most of the data on disk remained valid and
without the disk reporting any problems.

While the pools "dpool" and "cloudia2" contained a couple of
corrupt blocks, this could be completely unrelated to the
corruption of the partition table and the geli metadata.

> This to me serves as another reminder to maintain regular backups of 
> important files and consider everything else expendable.

Agreed.

The problem disk mostly contained DVD rips and while some of them
weren't available on other disks as well, they could be recreated
by simply ripping the DVDs again.

Of course it's conceivable that some of the source DVDs now contain
corruption as well (I own many older DVDs that contain corrupt blocks),
but I could probably buy them new or rent them if needed.

I use zogftw for backups and my important data is backed
up to multiple external pools and some of them are stored
off-site.

Fabian