Re: another crash and going forward with zfs
- Reply: Mateusz Guzik : "Re: another crash and going forward with zfs"
- In reply to: Mateusz Guzik : "another crash and going forward with zfs"
Date: Mon, 17 Apr 2023 19:41:26 UTC
On 4/18/23 03:51, Mateusz Guzik wrote:
> After bugfixes got committed I decided to zpool upgrade and set sysctl
> vfs.zfs.bclone_enabled=1, then ran poudriere against it for testing
> purposes. I very quickly got a new crash:
>
> panic: VERIFY(arc_released(db->db_buf)) failed
>
> cpuid = 9
> time = 1681755046
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0a90b8e5f0
> vpanic() at vpanic+0x152/frame 0xfffffe0a90b8e640
> spl_panic() at spl_panic+0x3a/frame 0xfffffe0a90b8e6a0
> dbuf_redirty() at dbuf_redirty+0xbd/frame 0xfffffe0a90b8e6c0
> dmu_buf_will_dirty_impl() at dmu_buf_will_dirty_impl+0xa2/frame 0xfffffe0a90b8e700
> dmu_write_uio_dnode() at dmu_write_uio_dnode+0xe9/frame 0xfffffe0a90b8e780
> dmu_write_uio_dbuf() at dmu_write_uio_dbuf+0x42/frame 0xfffffe0a90b8e7b0
> zfs_write() at zfs_write+0x672/frame 0xfffffe0a90b8e960
> zfs_freebsd_write() at zfs_freebsd_write+0x39/frame 0xfffffe0a90b8e980
> VOP_WRITE_APV() at VOP_WRITE_APV+0xdb/frame 0xfffffe0a90b8ea90
> vn_write() at vn_write+0x325/frame 0xfffffe0a90b8eb20
> vn_io_fault_doio() at vn_io_fault_doio+0x43/frame 0xfffffe0a90b8eb80
> vn_io_fault1() at vn_io_fault1+0x161/frame 0xfffffe0a90b8ecc0
> vn_io_fault() at vn_io_fault+0x1b5/frame 0xfffffe0a90b8ed40
> dofilewrite() at dofilewrite+0x81/frame 0xfffffe0a90b8ed90
> sys_write() at sys_write+0xc0/frame 0xfffffe0a90b8ee00
> amd64_syscall() at amd64_syscall+0x157/frame 0xfffffe0a90b8ef30
> fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0a90b8ef30
> --- syscall (4, FreeBSD ELF64, write), rip = 0x103cddf7949a, rsp = 0x103cdc85dd48, rbp = 0x103cdc85dd80 ---
> KDB: enter: panic
> [ thread pid 95000 tid 135035 ]
> Stopped at kdb_enter+0x32: movq $0,0x9e4153(%rip)
>
> The posted 14.0 schedule plans to branch stable/14 on May 12, and one
> cannot bet on the feature being beaten into production shape by that
> time. Given the non-block_cloning (and not even ZFS-related) bugs that
> are likely to come out, I think this makes the feature a non-starter
> for said release.
>
> I note:
> 1. the current problems did not make it into stable branches.
> 2. there was block_cloning-related data corruption (fixed) and there may be more
> 3. there was unrelated data corruption (see
> https://github.com/openzfs/zfs/issues/14753), sorted out by reverting
> the problematic commit in FreeBSD, not yet sorted out upstream
>
> As such, people's data may be partially hosed as is.
>
> Consequently the proposed plan is as follows:
> 1. whack the block cloning feature for the time being, but make sure
> pools which upgraded to it can be mounted read-only
> 2. run ztest and whatever other stress testing on FreeBSD, along with
> restoring openzfs CI -- I can do the first part, and I'm sure pho will
> not mind running some tests of his own
> 3. recommend people create new pools and restore data from backup. If
> restoring from backup is not an option, tar or cp (not zfs send) from
> the read-only mount
>
> Block cloning beaten into shape would use block_cloning_v2 or whatever
> else; the key point is that the current feature name would be considered
> bogus (not blocking RO import, though) to prevent RW usage of the
> current pools with it enabled.
>
> Comments?

Correct me if I'm wrong, but from my understanding there were zero
problems with block cloning when it wasn't in use, and there are none
now that it is disabled. The reason I introduced the
vfs.zfs.bclone_enabled sysctl was exactly to avoid a mess like this and
to give us more time to sort all the problems out, while making it easy
for people to try the feature.
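
As a minimal sketch of how that gate is exercised on a FreeBSD box (the
sysctl name is the one from this thread; everything else is ordinary
sysctl usage):

    # check whether block cloning is currently allowed (0 = disabled, the default)
    sysctl vfs.zfs.bclone_enabled
    # enable it for testing, as done at the top of this thread
    sysctl vfs.zfs.bclone_enabled=1
    # turn it back off at runtime
    sysctl vfs.zfs.bclone_enabled=0
    # persist the setting across reboots by adding this line to /etc/sysctl.conf:
    # vfs.zfs.bclone_enabled=0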

If there is no plan to revert the whole import, I don't see what value
removing just block cloning would bring, given that it is now disabled
by default and caused no problems while disabled.

-- 
Pawel Jakub Dawidek
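
A rough sketch of the read-only import plus tar copy suggested in item 3
of the quoted plan; the pool names and mountpoints below (oldpool,
/oldpool, /newpool) are placeholders, not anything settled in this
thread:

    # import the suspect pool read-only so nothing on it can be rewritten
    zpool import -o readonly=on oldpool
    # copy the data with tar (or cp), deliberately avoiding zfs send
    tar -cf - -C /oldpool . | tar -xpf - -C /newpool
    # alternatively:
    # cp -Rp /oldpool/. /newpool/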