[Bug 266014] panic: corrupted zfs dataset (zfs issue)

From: <bugzilla-noreply_at_freebsd.org>
Date: Tue, 25 Oct 2022 05:40:21 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=266014

Duncan <dpy@pobox.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|panic: on long running find |panic: corrupted zfs
                   |(zfs issue)                 |dataset (zfs issue)

--- Comment #5 from Duncan <dpy@pobox.com> ---
I got back to trying to move forward with this issue (re-enabling full EOD
runs) and found out where the problem was.

In my nextcloud jail, part of the /usr/src file system would cause a panic if
accessed (i.e. running a find over it).  I haven't gotten around to locating
the exact directory/file.

Now the interesting thing is that this dataset is encrypted and would mount
when decrypted (using a key from higher up the filesystem hierarchy, unlocked
by a password typed in as part of startup).  The panic would only occur on
access to parts of the dataset.

I tried replicating the dataset (to keep for later diagnosis), but upon
mounting the copy, the machine would panic.  Recovery required booting into
single user mode and deleting the copied dataset (I probably should have just
modified "canmount") before a normal boot would complete without a panic.
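
As a rough sketch of the "canmount" alternative mentioned above (not something
that was actually run for this report): the helpers below only build the
relevant zfs command lines.  The dataset name is hypothetical, and the helper
names are made up for illustration.  "zfs set canmount=noauto" keeps a dataset
from being mounted automatically, and "zfs receive -u" receives a stream
without mounting the result.

```python
import subprocess


def canmount_noauto_cmd(dataset):
    """Build the command that stops `dataset` from being mounted automatically."""
    return ["zfs", "set", "canmount=noauto", dataset]


def receive_unmounted_cmd(target):
    """Build a `zfs receive` command that leaves the received dataset unmounted (-u)."""
    return ["zfs", "receive", "-u", target]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical dataset name; dry run only, nothing is changed.
    run(canmount_noauto_cmd("tank/jails/nextcloud-copy"))
```

With the default dry_run=True nothing is executed, so the commands can be
reviewed before being run from single user mode.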

My backups(?) consisted of dataset replication onto other pools (one in the
same machine and one in another, soon to be offsite, machine running TrueNAS).
When I entered the key and mounting occurred, both other systems would panic.

The only solution I could think of was to create a new dataset and copy over
(using rsync in this case) all the folders except /usr/src.  I copied /usr/src
from another jail.
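
A minimal sketch of that kind of copy, assuming hypothetical mount points for
the old and new datasets (the exact rsync options used are not recorded in
this report).  The helper name is made up for illustration.  The leading "/"
in the exclude pattern anchors it to the root of the transfer, and the
trailing "/" restricts it to directories.

```python
def rsync_copy_cmd(src_root, dst_root):
    """Build an rsync command that copies src_root into dst_root, skipping /usr/src."""
    # Trailing slashes make rsync copy the directory contents, not the directory itself.
    src = src_root.rstrip("/") + "/"
    dst = dst_root.rstrip("/") + "/"
    # -a is archive mode; the exclude is anchored to the transfer root.
    return ["rsync", "-a", "--exclude=/usr/src/", src, dst]


if __name__ == "__main__":
    # Hypothetical mount points for the damaged and the freshly created dataset.
    print(" ".join(rsync_copy_cmd("/mnt/nextcloud-old", "/mnt/nextcloud-new")))
```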

I have renamed and kept the original dataset for potential debugging in the
future.


Moral of the story:  Proof that ZFS replication is actually NOT the same as a
backup.  The corruption was propagated in a more virulent form (mount == panic)
to the replicated dataset.

At some point I would appreciate being able to help someone figure out what
has happened to the dataset, and how to prevent something similar in the
future.  It has shaken my faith a little (in ZFS).
