Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

From: Cy Schubert <Cy.Schubert_at_cschubert.com>
Date: Wed, 12 Apr 2023 17:37:59 UTC
On April 12, 2023 10:22:25 AM PDT, Charlie Li <vishwin@freebsd.org> wrote:
>Charlie Li wrote:
>> Cy Schubert wrote:
>>> On April 12, 2023 8:51:09 AM PDT, Charlie Li <vishwin@freebsd.org> wrote:
>>>> Cy Schubert wrote:
>>>>> I have a "sandbox" pool, called t, used for /usr/obj, ports wrkdirs, and other data I can easily recreate on my laptop. Here are the results of my tests.
>>>>> 
>>>>> Method:
>>>>> 
>>>>> Initially I copied my /usr/obj from my two build machines (one amd64.amd64 and one i386.i386) to my "sandbox" zpool.
>>>>> 
>>>>> Next, with block_cloning disabled, I did a cp -R of the /usr/obj test files, then a diff -qr. The source and target directories were the same.
>>>>> 
>>>>> Next, I cleaned up (rm -rf) the target directory to prepare for the
>>>>> block_clone enabled test.
>>>>> 
>>>>> Next, I did zpool checkpoint t. After this, zpool upgrade t. Pool t now has block_cloning enabled.
>>>>> 
>>>>> I repeated the cp -R test from above followed by a diff -qr. Almost
>>>>> every file was different. The pool was corrupted.
>>>>> 
>>>>> I restored the pool, removing the corruption, with the following:
>>>>> 
>>>>> 
>>>>> slippy# zpool export t
>>>>> slippy# zpool import --rewind-to-checkpoint t
>>>>> slippy#
>>>>> 
>>>>> It is recommended that people avoid upgrading their zpools until the
>>>>> problem is fixed.
>>>>> 
>>>> As of af7624ed3145, I just did this with an md(4)-backed test pool, though with the second `cp -R` landing in a separate dataset, created and destroyed for each test. No corruption either way. However, my poudriere builds still output/package corrupted files (particularly those with null characters), probably after install(1) invocations (not cp(1)).
>>>> 
>>> 
>>> You need to copy from/to the same dataset to reproduce the problem. Copying from a source dataset to a different dataset will avoid block_cloning.
>>> 
>> Got the corruption now.
>> 
>Clarify: no corruption without block_cloning, corruption with.
>
>What is still a mystery to me is how corruption happens even without block_cloning in the poudriere scenario. cp(1)/install(1) always happen within the same dataset, as in this test.
>

This is because your pool has previously corrupted blocks. Even though you backed up the old pool, created a new pool without block_cloning, and restored your data, the backup contained corrupted blocks from your old pool, so they were restored as is. ZFS can only fix corruption if the checksum says a block is corrupt. As far as ZFS was concerned at the time, those blocks were not corrupted. You will need to delete the files with corruption and recreate them.

Even after this regression is fixed and people build and install a new kernel, whatever was corrupted will remain until the corrupted files are either removed and recreated or fixed manually.
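
For anyone who still has a known-good copy of the affected files, a rough way to find what needs recreating is to compare the two trees byte for byte, much like the diff -qr step in the test above. The following is only a minimal sketch under that assumption. The function name and its arguments are made up for illustration and are not part of any ZFS or FreeBSD tooling. It only looks at regular files that exist in the reference tree.

```python
import filecmp
import os


def differing_files(reference, suspect):
    """Return sorted relative paths of regular files under `reference`
    that are missing from `suspect` or whose contents differ.

    Hypothetical helper for illustration only; both arguments are
    directory paths chosen by the caller.
    """
    bad = []
    for root, _dirs, files in os.walk(reference):
        for name in files:
            ref_path = os.path.join(root, name)
            # Skip symlinks: this sketch only compares regular file data.
            if os.path.islink(ref_path):
                continue
            rel = os.path.relpath(ref_path, reference)
            sus_path = os.path.join(suspect, rel)
            # shallow=False forces a byte-for-byte comparison instead of
            # trusting matching size and modification time.
            if not os.path.isfile(sus_path) or not filecmp.cmp(
                ref_path, sus_path, shallow=False
            ):
                bad.append(rel)
    return sorted(bad)
```

Calling differing_files() with the known-good tree first and the suspect tree second returns the list of paths to delete and recreate. Files that exist only in the suspect tree are not reported, and without a trusted reference copy this approach cannot tell good data from bad.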

This regression will have long-lasting effects.

As Kirk McKusick has reiterated many times, back in the old days people didn't trust EXT*FS because of the data corruption they experienced. Sadly, ZFS will need to earn people's trust back again. This is unfortunate.


-- 
Cheers,
Cy Schubert <Cy.Schubert@cschubert.com>
FreeBSD UNIX:  <cy@FreeBSD.org>  Web:  https://FreeBSD.org
NTP:                     <cy@nwtime.org>    Web:  https://nwtime.org
                                                    e^(i*pi)+1=0

Pardon the typos. Small keyboard in use.