ZFS pool hangs (live-locks?) after adding L2ARC
Date: Wed, 20 Dec 2023 13:31:15 UTC
Hello!

The system in question is FreeBSD 13.2-STABLE stable/13-n256849-05c55eed44e5. I have 3 ZFS pools: one "simple" pool (nda1p2), which is the system pool with BEs, root, etc., and two raidz1 pools: "zstor", consisting of 5 HDDs (daX), and "ztorr", consisting of 3 HDDs (adaX).

I also have an NVMe disk nvme0 (nda0, a brand new AData Legend 960 2TB) with one GPT partition of type "freebsd-swap" (it is NOT configured or enabled as swap in the system!). The size of this partition is 1.6T.

When I add nda0p1 (the AData partition) as a "cache" device to the "zstor" pool, it is added without any problem, but later the pool hangs. I've experienced 2 hangs:

(1) Right after adding the cache device and rebooting, the import of the pool hung. When I tried to import the pool by hand in single-user mode, I saw one kernel thread with a name like "z_int_2_2" consuming 100% of one core. I waited for one hour without any result. After that I removed the NVMe physically, booted successfully and removed the device from the pool with "zpool remove".

(2) After that I re-added the "cache" device and everything worked for some time (10+ days). But suddenly one filesystem on the pool (only one!) started to livelock: if you run "ls" on this filesystem it hangs forever, "ls" consumes 100% of one core in system time, and again a thread with a name like "z_int_X_Y" consumes 100% of another core. "ls" cannot be killed; only a reboot helps (and the reboot hangs too, after "all bufs synced"!). After the reboot the problem reproduced again, with exactly the same symptoms. This time I was able to remove the cache device with "zpool remove", without detaching it physically.

Status of the pool:

> zpool status zstor
  pool: zstor
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: resilvered 1.85T in 04:02:19 with 0 errors on Sat Dec  9 16:21:33 2023
config:

        NAME        STATE     READ WRITE CKSUM
        zstor       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da0     ONLINE       0     0     0

errors: No known data errors

I have two non-default settings for ZFS:

vfs.zfs.min_auto_ashift=12
vfs.zfs.abd_scatter_enabled=0

I cannot find any discussion of such a problem on the Internet. Also, the "live" system doesn't have these "z_int_X_Y" threads at all.

I want my L2ARC, I've paid for this NVMe!

-- 
// Lev Serebryakov