Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

From: Charlie Li <vishwin_at_freebsd.org>
Date: Wed, 12 Apr 2023 17:22:25 UTC
Charlie Li wrote:
> Cy Schubert wrote:
>> On April 12, 2023 8:51:09 AM PDT, Charlie Li <vishwin@freebsd.org> wrote:
>>> Cy Schubert wrote:
>>>> I have a "sandbox" pool, called t, used for /usr/obj and ports 
>>>> wrkdirs, and other writes I can easily recreate on my laptop. Here 
>>>> are the results of my tests.
>>>>
>>>> Method:
>>>>
>>>> Initially I copied my /usr/obj from my two build machines (one 
>>>> amd64.amd64 and one i386.i386) to my "sandbox" zpool.
>>>>
>>>> Next, with block_cloning disabled, I did a cp -R of the /usr/obj test 
>>>> files, then a diff -qr. The source and target directories were the 
>>>> same.
>>>>
>>>> Next, I cleaned up (rm -rf) the target directory to prepare for the
>>>> block_clone enabled test.
>>>>
>>>> Next, I did zpool checkpoint t. After this, zpool upgrade t. Pool t 
>>>> now has block_cloning enabled.
>>>>
>>>> I repeated the cp -R test from above followed by a diff -qr. Almost
>>>> every file was different. The pool was corrupted.
>>>>
>>>> I restored the pool with the following, which removed the corruption:
>>>>
>>>>
>>>> slippy# zpool export t
>>>> slippy# zpool import --rewind-to-checkpoint t
>>>> slippy#
>>>>
>>>> It is recommended that people avoid upgrading their zpools until the
>>>> problem is fixed.
>>>>
>>> As of af7624ed3145, I just did this with an md(4)-backed test pool, 
>>> though with the second `cp -R` landing in a separate dataset, created 
>>> and destroyed for each test. No corruption either way. However, my 
>>> poudriere builds still output/package corrupted files (particularly 
>>> those with null characters), probably after install(1) invocations 
>>> (not cp(1)).
>>>
>>
>> You need to copy from/to the same dataset to reproduce the problem. 
>> Copying from a source dataset to a different dataset will avoid 
>> block_cloning.
>>
> Got the corruption now.
> 
To clarify: no corruption without block_cloning, corruption with it.

What is still a mystery to me is how corruption happens even without 
block_cloning in the poudriere scenario. cp(1)/install(1) always happen 
within the same dataset, as in this test.
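
For anyone wanting to repeat the same-dataset check, here is a minimal 
sketch of the copy-and-compare step. The function name copy_and_verify 
and the /t/obj paths are placeholders of mine, not anything from Cy's 
setup; the zpool commands in the comments are the ones quoted above.

```shell
#!/bin/sh
# Copy a tree and compare it with the original, as in the cp -R / diff -qr
# test quoted above. Source and target must live in the same dataset for
# block_cloning to come into play.
# Returns 0 when the trees match, non-zero otherwise.
copy_and_verify() {
    src=$1
    dst=$2
    # Refuse empty arguments so the rm -rf below cannot run on "".
    [ -n "$src" ] && [ -n "$dst" ] || return 2
    rm -rf "$dst"
    cp -R "$src" "$dst" || return 2
    # diff -qr prints one line per differing file and exits non-zero.
    diff -qr "$src" "$dst"
}

# Hypothetical usage on a pool named t (paths are assumptions):
#   zpool checkpoint t
#   zpool upgrade t
#   copy_and_verify /t/obj /t/obj.copy || echo "trees differ"
#   zpool export t
#   zpool import --rewind-to-checkpoint t
```

The function deletes the target before copying, so do not point it at 
anything you want to keep.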

-- 
Charlie Li
…nope, still don't have an exit line.