another crash and going forward with zfs
Date: Mon, 17 Apr 2023 18:51:05 UTC
After bugfixes got committed I decided to zpool upgrade, set sysctl vfs.zfs.bclone_enabled=1 and run poudriere against it for testing purposes. I very quickly got a new crash:

panic: VERIFY(arc_released(db->db_buf)) failed
cpuid = 9
time = 1681755046
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0a90b8e5f0
vpanic() at vpanic+0x152/frame 0xfffffe0a90b8e640
spl_panic() at spl_panic+0x3a/frame 0xfffffe0a90b8e6a0
dbuf_redirty() at dbuf_redirty+0xbd/frame 0xfffffe0a90b8e6c0
dmu_buf_will_dirty_impl() at dmu_buf_will_dirty_impl+0xa2/frame 0xfffffe0a90b8e700
dmu_write_uio_dnode() at dmu_write_uio_dnode+0xe9/frame 0xfffffe0a90b8e780
dmu_write_uio_dbuf() at dmu_write_uio_dbuf+0x42/frame 0xfffffe0a90b8e7b0
zfs_write() at zfs_write+0x672/frame 0xfffffe0a90b8e960
zfs_freebsd_write() at zfs_freebsd_write+0x39/frame 0xfffffe0a90b8e980
VOP_WRITE_APV() at VOP_WRITE_APV+0xdb/frame 0xfffffe0a90b8ea90
vn_write() at vn_write+0x325/frame 0xfffffe0a90b8eb20
vn_io_fault_doio() at vn_io_fault_doio+0x43/frame 0xfffffe0a90b8eb80
vn_io_fault1() at vn_io_fault1+0x161/frame 0xfffffe0a90b8ecc0
vn_io_fault() at vn_io_fault+0x1b5/frame 0xfffffe0a90b8ed40
dofilewrite() at dofilewrite+0x81/frame 0xfffffe0a90b8ed90
sys_write() at sys_write+0xc0/frame 0xfffffe0a90b8ee00
amd64_syscall() at amd64_syscall+0x157/frame 0xfffffe0a90b8ef30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0a90b8ef30
--- syscall (4, FreeBSD ELF64, write), rip = 0x103cddf7949a, rsp = 0x103cdc85dd48, rbp = 0x103cdc85dd80 ---
KDB: enter: panic
[ thread pid 95000 tid 135035 ]
Stopped at kdb_enter+0x32: movq $0,0x9e4153(%rip)

The posted 14.0 schedule plans to branch stable/14 on May 12, and one cannot bet on the feature getting beaten into production shape by that time. Given the non-block_cloning (and not even ZFS) bugs which are likely to come out, I think this makes the feature a non-starter for said release.

I note:
1. the current problems did not make it into stable branches
2. there was block_cloning-related data corruption (fixed) and there may be more
3. there was unrelated data corruption (see https://github.com/openzfs/zfs/issues/14753), sorted out by reverting the problematic commit in FreeBSD, not yet sorted out upstream

As such, people's data may be partially hosed as is.

Consequently the proposed plan is as follows:
1. whack the block cloning feature for the time being, but make sure pools which upgraded to it can be mounted read-only (see the feature check sketched below)
2. run ztest and whatever other stress testing on FreeBSD, along with restoring openzfs CI -- I can do the first part, and I'm sure pho will not mind running some tests of his own
3. recommend people create new pools and restore data from backup; if restoring from backup is not an option, tar or cp (not zfs send) from the read-only mount (see the copy-off sketch below)

Block cloning, once beaten into shape, would ship as block_cloning_v2 or whatever else; the key point is that the current feature name would be considered bogus (while not blocking read-only import) in order to prevent read-write usage of the current pools which have it enabled.

Comments?

--
Mateusz Guzik <mjguzik gmail.com>
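
A minimal sketch of checking whether a given pool already has the feature enabled; the pool name "tank" is a placeholder and not taken from the report above:

# Feature flags are exposed as pool properties: "disabled" means the pool
# never enabled block_cloning, "enabled" means zpool upgrade turned it on
# but nothing has used it yet, "active" means cloned blocks may be on disk.
zpool get feature@block_cloning tank

# The sysctl used for the poudriere test only matters on top of an enabled
# feature; switching it back off does not change any on-disk state.
sysctl vfs.zfs.bclone_enabled=0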
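
And a minimal sketch of the copy-off path from point 3 of the plan, assuming a hypothetical affected pool "oldpool", an altroot of /mnt/ro and a freshly created pool "newpool" (none of these names come from the post):

# Import the affected pool strictly read-only under an alternate root.
zpool import -o readonly=on -R /mnt/ro oldpool

# Copy the data off with file-level tools rather than zfs send, e.g.:
tar -cf - -C /mnt/ro/data . | tar -xf - -C /newpool/data
# or:
cp -a /mnt/ro/data/. /newpool/data/

# Detach the old pool once the copy has been verified.
zpool export oldpool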