Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

From: Cy Schubert <Cy.Schubert_at_cschubert.com>
Date: Mon, 10 Apr 2023 04:41:12 UTC
On Sun, 9 Apr 2023 23:25:44 -0400
Charlie Li <vishwin@freebsd.org> wrote:

> Charlie Li wrote:
> > Cy Schubert wrote:  
> >>> The file corruption was prior to enabling block_cloning but after 
> >>> this import.
> >>>  
> >>
> >> This regression was fixed  by mjg's commit. I'm not sure which (I'm 
> >> presently AFK).
> >>  
> > Both the block_cloning panic and file corruption are still happening as 
> > of 351e4592f64b, which is after any such commit here, unless referring 
> > to https://github.com/openzfs/zfs/pull/14723 .
> >   
> ...which was committed here prior to 351e4592f64b, so the issues still 
> persist.
> 

Hmm, interesting. I'm experiencing no such panics nor corruption since
the commit.

According to a previous email of yours from today, block_cloning is not
enabled. Is it possible that, before the regression was fixed and while
it was wreaking havoc in your zpool, the zpool became irreversibly
corrupted, resulting in panics even with the fixed code?

One way, probably the best way, to test would be to revert to the
commit prior to the import. If you still experience panics and
corruption, your zpool is damaged.

At the moment we don't know if the code is still broken or if it has
been fixed but residual damage is still causing creeping rot and panics.

I don't know if zpool scrub can fix this -- according to one comment on
FreeBSD-current, zpool scrub fails to complete.

I'm not yet convinced that the problem code has not been fixed. We
don't know whether the panics result from corruption caused by the
regression.

Would it be best if we reverted the seven commits to main? I don't
know. I could argue it either way. My problems, on four machines, have
been fixed by the subsequent commits. We don't know if there are other
regressions or if the current problems are due to corruption caused by
writes made prior to the patches addressing the regression. Maybe we
could revert the seven commits and take a watch-for-further-fallout
approach, to see whether the panics and problems persist post revert.
If the problems persist post revert, we know for sure the regression
has caused some permanent corruption. This is a radical option. I'm
torn on whether a revert would be the best approach or not. It has its
merits but significant limitations too.

-- 
Cheers,
Cy Schubert <Cy.Schubert@cschubert.com>
FreeBSD UNIX:  <cy@FreeBSD.org>   Web:  https://FreeBSD.org
NTP:           <cy@nwtime.org>    Web:  https://nwtime.org

			e^(i*pi)+1=0