Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

From: Mateusz Guzik <mjguzik_at_gmail.com>
Date: Sun, 09 Apr 2023 22:15:57 UTC

On 4/9/23, Mateusz Guzik <mjguzik@gmail.com> wrote:
> On 4/9/23, FreeBSD User <freebsd@walstatt-de.de> wrote:
>> On Sun, 9 Apr 2023 13:23:05 -0400
>> Charlie Li <vishwin@freebsd.org> wrote:
>>
>>> Mateusz Guzik wrote:
>>> > On 4/9/23, Charlie Li wrote:
>>> >> I've also started noticing random artefacts and malformed files
>>> >> whilst building packages with poudriere, causing all sorts of
>>> >> "exec format error"s, missing .so files due to corruption, data
>>> >> file corruption causing unintended failure modes, etc. All without
>>> >> block_cloning; enabling such causes a panic of its own when
>>> >> starting multiple builder jails at once.
>>> >>
>>> >
>>> > what's the panic?
>>> >
>>> manually typed out:
>>>
>>> panic: VERIFY(!zil_replaying(zilog, tx)) failed
>>>
>>> cpuid = 7
>>> time = 1681060472
>>> KDB: stack backtrace:
>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe02a05b28a0
>>> vpanic() at vpanic+0x152/frame 0xfffffe02a05b28f0
>>> spl_panic() at spl_panic+0x3a/frame 0xfffffe02a05b2950
>>> zfs_log_clone_range() at zfs_log_clone_range+0x1db/frame 0xfffffe02a05b29e0
>>> zfs_clone_range() at zfs_clone_range+0xae2/frame 0xfffffe02a05b2bc0
>>> zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0xff/frame 0xfffffe02a05b2c40
>>> vn_copy_file_range() at vn_copy_file_range+0x115/frame 0xfffffe02a05b2ce0
>>> kern_copy_file_range() at kern_copy_file_range+0x34e/frame 0xfffffe02a05b2db0
>>> sys_copy_file_range() at sys_copy_file_range+0x78/frame 0xfffffe02a05b2e00
>>> amd64_syscall() at amd64_syscall+0x148/frame 0xfffffe02a05b2f30
>>> fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe02a05b2f30
>>> --- syscall (569, FreeBSD ELF64, copy_file_range), rip = 0x908d2a, rsp = 0x820c28e68, rbp = 0x820c292b0 ---
>>> KDB: enter: panic
>>> [ thread pid 1856 tid 102129 ]
>>> Stopped at      kdb_enter+0x32: movq    $0,0x12760f3(%rip)
>>> db>
>>>
>>
>> I have the same issue (crash on access of several, but random datasets).
>>
>> It started with /usr/ports build failures when performing updates or
>> rebuilding ports. The poudriere host doesn't work anymore: as soon as
>> it starts building ports, the hosts (several of them, same OS
>> revision, new ZFS option enabled) crash. This also happens when
>> building binaries for a pkg OS distribution.
>>
>> That host also reports a ZFS RAIDZ pool as corrupted, out of the blue!
>> Some files from a poudriere build and a /usr/ports build seem to have
>> issues with some temporarily created files in the work directory.
>>
>> On another host, /usr/ports resides on ZFS and the host also crashes
>> when building/updating ports. On the same host /home also resides on
>> ZFS, but even when downloading large amounts of email the host seems
>> to be stable. I have not found out yet what kind of file access
>> triggers the crash.
>>
>
> I reproduced the VERIFY(!zil_replaying(zilog, tx)) panic. As the
> backtrace shows it triggers when using copy_file_range, I temporarily
> patched the kernel to never do block cloning. So far the only package
> which failed to build was sqlite and it was for a legitimate reason
> (compiler errored out due to a problem in the code).
>

... and got an illegitimate failure:
strip: file format not recognized

The port builds after retrying.

IOW, there is more breakage.

I don't know if the merge can be easily reverted now; will have to see
about that.
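
For anyone who wants to check whether copies on a given dataset come out
intact, here is a rough userspace sketch. It is not a reproducer for the
panic and it is not what I ran; the helper names (sha256_of,
copy_and_verify) are made up for illustration. It copies a file with
os.copy_file_range() and compares SHA-256 digests of source and
destination. Python documents os.copy_file_range() for Linux only, so
whether a particular FreeBSD build exposes it is an assumption, hence the
hasattr() check and the plain read/write fallback.

```python
import errno
import hashlib
import os
import shutil

# errno values treated as "syscall not usable here", not as a real failure.
_UNSUPPORTED = (errno.ENOSYS, errno.EXDEV, errno.EINVAL, errno.EOPNOTSUPP)


def sha256_of(path, chunk=1 << 20):
    """Return the SHA-256 hex digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst and return (used_copy_file_range, contents_match)."""
    used_cfr = False
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        remaining = os.fstat(fin.fileno()).st_size
        if hasattr(os, "copy_file_range"):
            try:
                while remaining > 0:
                    n = os.copy_file_range(fin.fileno(), fout.fileno(), remaining)
                    if n == 0:  # source ended early
                        break
                    remaining -= n
                used_cfr = True
            except OSError as e:
                if e.errno not in _UNSUPPORTED:
                    raise
        if not used_cfr:
            # Restart from scratch with an ordinary buffered copy.
            fin.seek(0)
            fout.seek(0)
            fout.truncate()
            shutil.copyfileobj(fin, fout)
    return used_cfr, sha256_of(src) == sha256_of(dst)
```

A mismatch (second element False) would point at the kind of silent
corruption described above; a match on a handful of files proves nothing,
since the failures seen so far are intermittent.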

-- 
Mateusz Guzik <mjguzik gmail.com>