Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
- Reply: Charlie Li : "Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75"
- Reply: Cy Schubert : "Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75"
- In reply to: Mateusz Guzik : "Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75"
Date: Sun, 09 Apr 2023 22:15:57 UTC
On 4/9/23, Mateusz Guzik <mjguzik@gmail.com> wrote:
> On 4/9/23, FreeBSD User <freebsd@walstatt-de.de> wrote:
>> On Sun, 9 Apr 2023 13:23:05 -0400,
>> Charlie Li <vishwin@freebsd.org> wrote:
>>
>>> Mateusz Guzik wrote:
>>> > On 4/9/23, Charlie Li wrote:
>>> >> I've also started noticing random artefacts and malformed files whilst
>>> >> building packages with poudriere, causing all sorts of "exec format
>>> >> error"s, missing .so files due to corruption, data file corruption
>>> >> causing unintended failure modes, etc. All without block_cloning;
>>> >> enabling such causes a panic of its own when starting multiple builder
>>> >> jails at once.
>>> >>
>>> >
>>> > what's the panic?
>>> >
>>> manually typed out:
>>>
>>> panic: VERIFY(!zil_replaying(zilog, tx)) failed
>>>
>>> cpuid = 7
>>> time = 1681060472
>>> KDB: stack backtrace:
>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe02a05b28a0
>>> vpanic() at vpanic+0x152/frame 0xfffffe02a05b28f0
>>> spl_panic() at spl_panic+0x3a/frame 0xfffffe02a05b2950
>>> zfs_log_clone_range() at zfs_log_clone_range+0x1db/frame 0xfffffe02a05b29e0
>>> zfs_clone_range() at zfs_clone_range+0xae2/frame 0xfffffe02a05b2bc0
>>> zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0xff/frame 0xfffffe02a05b2c40
>>> vn_copy_file_range() at vn_copy_file_range+0x115/frame 0xfffffe02a05b2ce0
>>> kern_copy_file_range() at kern_copy_file_range+0x34e/frame 0xfffffe02a05b2db0
>>> sys_copy_file_range() at sys_copy_file_range+0x78/frame 0xfffffe02a05b2e00
>>> amd64_syscall() at amd64_syscall+0x148/frame 0xfffffe02a05b2f30
>>> fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe02a05b2f30
>>> --- syscall (569, FreeBSD ELF64, copy_file_range), rip = 0x908d2a, rsp = 0x820c28e68, rbp = 0x820c292b0 ---
>>> KDB: enter: panic
>>> [ thread pid 1856 tid 102129 ]
>>> Stopped at kdb_enter+0x32: movq $0,0x12760f3(%rip)
>>> db>
>>>
>>
>> I have the same issue (crash on access of several, but random datasets).
>>
>> It started with /usr/ports build failures when performing updates or
>> rebuilding ports, poudriere host doesn't work anymore, as soon as started
>> building ports, the hosts (several of them, same OS revision, new ZFS
>> option enabled) crash.
>> Also when building binaries for an pkg OS distribution.
>>
>> That host also reports a ZFS RAIDZ pool as corrupted, out of the blue!
>> Some files from a poudriere build and /usr/ports build seem to have issues
>> with some temporarily created files in work directory.
>>
>> On another host /usr/ports is residing on ZFS and it crashes also when
>> building/updating ports (/usr/ports residing on ZFS) - but on the same
>> host /home is also residing on ZFS, but even downloading large amounts of
>> emails, the host seem to be stable. Have not found out yet what kind of
>> file access triggers the crash.
>>
>
> I reproduced the VERIFY(!zil_replaying(zilog, tx)) panic. As the
> backtrace shows it triggers when using copy_file_range, I temporarily
> patched the kernel to never do block cloning. So far the only package
> which failed to build was sqlite and it was for a legitimate reason
> (compiler errored out due to a problem in the code).
>

... and got an illegitimate failure: strip: file format not recognized

the port builds after retrying

iow there is more breakage. i don't know if the merge can be easily
reverted now, will have to see about that

--
Mateusz Guzik <mjguzik gmail.com>