Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
- Reply: Charlie Li : "Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75"
- Reply: Charlie Li : "Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75"
- In reply to: Charlie Li : "Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75"
Date: Mon, 10 Apr 2023 23:54:13 UTC
On Mon, 10 Apr 2023 01:58:00 -0400 Charlie Li <vishwin@freebsd.org> wrote:
> Cy Schubert wrote:
> > Hmm, interesting. I'm experiencing no such panics nor corruption since
> > the commit.
> >
> > Reading a previous email of yours from today, block_cloning is not
> > enabled. Is it possible that before the regression was fixed, while it
> > was wreaking havoc in your zpool, that your zpool became irreversibly
> > corrupted, resulting in panics even with the fixed code?
> >
> This is probably now the case.
> > One way, probably the best way, to test would be to revert back to the
> > commit prior to the import. If you still experience panics and
> > corruption, your zpool is damaged.
> >
> Fails to mount with error 45 on a boot environment only a few commits
> before the import.
> > At the moment we don't know if the code is still broken or if it has
> > been fixed but residual damage is still causing creeping rot and panics.
> >
> > I don't know if zpool scrub can fix this -- reading one comment on
> > FreeBSD-current, zpool scrub fails to complete.
> >
> It doesn't. All scrubs on my end complete fully with nothing to repair.
> > I'm not convinced, yet, that the problem code has not been fixed. We
> > don't know if the panics are a result of corruption caused by the
> > regression.
> >
> > Would it be best if we reverted the seven commits to main? I don't
> > know. I could argue it either way. My problems, on four machines, have
> > been fixed by the subsequent commits. We don't know if there are other
> > regressions, or if the current problems are due to corruption caused by
> > writes prior to the patches addressing the regression. Maybe we should
> > revert the seven commits and take a watch-for-further-fallout approach,
> > seeing whether the panics and problems persist post revert. If the
> > problems persist post revert, we know for sure the regression has
> > caused some permanent corruption. This is a radical option.
> > IMO, I'm torn whether a revert would be the best approach or not. It
> > has its merits but significant limitations too.
> >
> Going to try recreating the pool on current tip, making sure that
> block_cloning is disabled.
>
You'll need to do this at pool creation time.

I have a "sandbox" pool, called t, used for /usr/obj, ports wrkdirs, and
other writes I can easily recreate on my laptop. Here are the results of
my tests.

Method: Initially I copied my /usr/obj from my two build machines (one
amd64.amd64 and one i386.i386) to my "sandbox" zpool. Next, with
block_cloning disabled, I did a cp -R of the /usr/obj test files, then a
diff -qr. The source and target directories were identical.

Next, I cleaned up (rm -rf) the target directory to prepare for the
block_cloning-enabled test. I did zpool checkpoint t, followed by zpool
upgrade t. Pool t now had block_cloning enabled. I repeated the cp -R
test from above, followed by a diff -qr. Almost every file was
different. The pool was corrupted.

I restored the pool, removing the corruption, with the following:

slippy# zpool export t
slippy# zpool import --rewind-to-checkpoint t
slippy#

It is recommended that people avoid upgrading their zpools until the
problem is fixed.

-- 
Cheers,
Cy Schubert <Cy.Schubert@cschubert.com>
FreeBSD UNIX: <cy@FreeBSD.org> Web: https://FreeBSD.org
NTP: <cy@nwtime.org> Web: https://nwtime.org

	e^(i*pi)+1=0
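[Editor's note: the cp -R / diff -qr check described in the message above can be sketched as a runnable script. The sample tree below is a hypothetical stand-in for the /usr/obj copy; no zpool commands are run here, since the checkpoint/upgrade/rewind steps are destructive and shown verbatim in the message.]

```shell
#!/bin/sh
# Sketch of the copy-and-compare corruption check described above.
# The sample tree is a hypothetical stand-in for /usr/obj; in the real
# test, the target would live on the block_cloning-enabled pool "t".
set -eu

SRC=$(mktemp -d)
DST=$(mktemp -d)

# Populate a small source tree standing in for build output.
mkdir -p "$SRC/tree/sub"
printf 'hello world\n' > "$SRC/tree/a.txt"
dd if=/dev/urandom of="$SRC/tree/sub/b.bin" bs=1k count=4 2>/dev/null

# Copy recursively, then compare source and target.
cp -R "$SRC/tree" "$DST/tree"
if diff -qr "$SRC/tree" "$DST/tree" >/dev/null; then
    echo "trees identical"
else
    echo "trees differ"
fi

rm -rf "$SRC" "$DST"
```

On a healthy filesystem this prints "trees identical"; in the corrupted block_cloning case described above, diff -qr would report mismatches for almost every file.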