Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
- In reply to: Mateusz Guzik : "Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75"
Date: Sun, 09 Apr 2023 22:35:19 UTC
In message <CAGudoHFd3_Sc-6ZcrNhv_56BVuW5fwN4uUDNvJw44VpvxoOQvA@mail.gmail.com>,
Mateusz Guzik writes:
> On 4/9/23, Mateusz Guzik <mjguzik@gmail.com> wrote:
> > On 4/9/23, FreeBSD User <freebsd@walstatt-de.de> wrote:
> >> On Sun, 9 Apr 2023 13:23:05 -0400
> >> Charlie Li <vishwin@freebsd.org> wrote:
> >>
> >>> Mateusz Guzik wrote:
> >>> > On 4/9/23, Charlie Li wrote:
> >>> >> I've also started noticing random artefacts and malformed files
> >>> >> whilst building packages with poudriere, causing all sorts of
> >>> >> "exec format error"s, missing .so files due to corruption, data
> >>> >> file corruption causing unintended failure modes, etc.  All
> >>> >> without block_cloning; enabling such causes a panic of its own
> >>> >> when starting multiple builder jails at once.
> >>> >>
> >>> >
> >>> > what's the panic?
> >>> >
> >>> manually typed out:
> >>>
> >>> panic: VERIFY(!zil_replaying(zilog, tx)) failed
> >>>
> >>> cpuid = 7
> >>> time = 1681060472
> >>> KDB: stack backtrace:
> >>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe02a05b28a0
> >>> vpanic() at vpanic+0x152/frame 0xfffffe02a05b28f0
> >>> spl_panic() at spl_panic+0x3a/frame 0xfffffe02a05b2950
> >>> zfs_log_clone_range() at zfs_log_clone_range+0x1db/frame 0xfffffe02a05b29e0
> >>> zfs_clone_range() at zfs_clone_range+0xae2/frame 0xfffffe02a05b2bc0
> >>> zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0xff/frame 0xfffffe02a05b2c40
> >>> vn_copy_file_range() at vn_copy_file_range+0x115/frame 0xfffffe02a05b2ce0
> >>> kern_copy_file_range() at kern_copy_file_range+0x34e/frame 0xfffffe02a05b2db0
> >>> sys_copy_file_range() at sys_copy_file_range+0x78/frame 0xfffffe02a05b2e00
> >>> amd64_syscall() at amd64_syscall+0x148/frame 0xfffffe02a05b2f30
> >>> fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe02a05b2f30
> >>> --- syscall (569, FreeBSD ELF64, copy_file_range), rip = 0x908d2a, rsp = 0x820c28e68, rbp = 0x820c292b0 ---
> >>> KDB: enter: panic
> >>> [ thread pid 1856 tid 102129 ]
> >>> Stopped at kdb_enter+0x32: movq $0,0x12760f3(%rip)
> >>> db>
> >>>
> >>
> >> I have the same issue (crashes on access to several, but random, datasets).
> >>
> >> It started with /usr/ports build failures when performing updates or
> >> rebuilding ports.  The poudriere hosts (several of them, same OS
> >> revision, new ZFS option enabled) don't work anymore: as soon as they
> >> start building ports they crash.  The same happens when building
> >> binaries for a pkg OS distribution.
> >>
> >> That host also reports a ZFS RAIDZ pool as corrupted, out of the blue!
> >> Some files from a poudriere build and a /usr/ports build seem to have
> >> issues with temporarily created files in the work directory.
> >>
> >> On another host /usr/ports resides on ZFS and it also crashes when
> >> building/updating ports.  /home on that host is on ZFS as well, yet
> >> even when downloading large amounts of email the host seems to be
> >> stable.  I have not yet found out what kind of file access triggers
> >> the crash.
> >>
> >
> > I reproduced the VERIFY(!zil_replaying(zilog, tx)) panic.  As the
> > backtrace shows, it triggers when using copy_file_range, so I
> > temporarily patched the kernel to never do block cloning.  So far the
> > only package which failed to build was sqlite, and it was for a
> > legitimate reason (the compiler errored out due to a problem in the
> > code).
> >
> > ...
> and got an illegitimate failure:
> strip: file format not recognized
>
> the port builds after retrying
>
> iow there is more breakage.
>
> i don't know if the merge can be easily reverted now, will have to see
> about that

The git revert is the easy part.  What about people who have done zpool
upgrade and, following the revert, are left with read-only zpools?

Personally, I typically avoid enabling new zpool features for the first
few weeks, even months, just in case.  But not everyone does this.
People who have already done zpool upgrade will need to back up their
zpools and restore them after any upgrade to a FreeBSD with the zfs
commits reverted.  And, considering the above, we may be long past the
point of no return.

For me, personally, it won't matter either way.  For others?  I don't
know.  Simply disabling block_cloning, regardless of the zpool setting,
might be a less disruptive solution.

What is the ZFS project issue number?

-- 
Cheers,
Cy Schubert <Cy.Schubert@cschubert.com>
FreeBSD UNIX:  <cy@FreeBSD.org>   Web:  https://FreeBSD.org
NTP:           <cy@nwtime.org>    Web:  https://nwtime.org

			e^(i*pi)+1=0
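
For reference, the backtrace quoted above enters through the
copy_file_range(2) syscall (569) and reaches zfs_freebsd_copy_file_range()
and zfs_clone_range().  Below is a minimal userland sketch of that call
path; the source and destination paths are placeholders, and this is only
the syscall the failing builds end up issuing, not the exact poudriere
reproducer.

/*
 * Copy a file using copy_file_range(2), the syscall shown in the
 * panic backtrace.  On ZFS the kernel may service this by cloning
 * blocks instead of copying them.
 */
#include <sys/stat.h>

#include <err.h>
#include <fcntl.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
	if (argc != 3)
		errx(1, "usage: %s <src> <dst>", argv[0]);

	int in = open(argv[1], O_RDONLY);
	if (in == -1)
		err(1, "open %s", argv[1]);

	struct stat sb;
	if (fstat(in, &sb) == -1)
		err(1, "fstat");

	int out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (out == -1)
		err(1, "open %s", argv[2]);

	/* Copy the whole file; offsets are tracked by the descriptors. */
	off_t remaining = sb.st_size;
	while (remaining > 0) {
		ssize_t n = copy_file_range(in, NULL, out, NULL,
		    (size_t)remaining, 0);
		if (n == -1)
			err(1, "copy_file_range");
		if (n == 0)
			break;
		remaining -= n;
	}

	close(in);
	close(out);
	return (0);
}

Running something like this and then cmp(1)-ing source against destination
is one way to check whether copies made through this path come out intact,
which is how the corruption described above tends to show up.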
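
On "disabling block_cloning regardless of the zpool setting": a sketch of
what such a kill switch could look like follows, purely as an
illustration.  The tunable name and its placement are assumptions on my
part, not anything that exists in the tree.  The point is that the
on-disk feature flag is left alone, so already-upgraded pools stay
importable read-write.

/*
 * Hypothetical sketch only -- not code from the tree.  The idea: a
 * vfs.zfs tunable that makes the ZFS copy_file_range VOP skip block
 * cloning entirely (falling back to the generic read/write copy),
 * no matter whether the pool has the block_cloning feature enabled.
 */
#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/sysctl.h>

SYSCTL_DECL(_vfs_zfs);

/* Default to off while the corruption reports are being investigated. */
static int zfs_bclone_enabled = 0;
SYSCTL_INT(_vfs_zfs, OID_AUTO, bclone_enabled, CTLFLAG_RWTUN,
    &zfs_bclone_enabled, 0,
    "Allow block cloning to service copy_file_range(2)");

/*
 * Guard a copy_file_range VOP implementation would consult before
 * calling zfs_clone_range(); when it returns 0 the caller would use
 * the existing vn_generic_copy_file_range() fallback instead.
 */
static int
zfs_bclone_allowed(void)
{
	return (zfs_bclone_enabled != 0);
}

Compared with a revert, nothing about the pool changes with an approach
like this, so nobody has to do the backup-and-restore dance described
above.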