vnode_init took over 6 minutes
- Reply: Mateusz Guzik : "Re: vnode_init took over 6 minutes"
- Go to: [ bottom of page ] [ top of archives ] [ this month ]
Date: Tue, 31 Oct 2023 13:57:34 UTC
During a poudriere build control-T showed an awk process was stuck for over 6 minutes trying to acquire a lock in vnode_init. load: 8.58 cmd: awk 51238 [running] 392.88r 0.00u 392.88s 100% 2604k __mtx_lock_sleep+0xf8 __mtx_lock_flags+0xa4 vnode_init+0xc3 keg_alloc_slab+0x277 zone_import+0x143 cache_alloc+0x3ed cache_alloc_retry+0x23 getnewvnode_reserve+0x20 zfs_zget+0x1f zfs_dirent_lookup+0x16d zfs_dirlook+0x7f zfs_lookup+0x3e0 zfs_freebsd_cachedlookup+0x74 vfs_cache_lookup+0xa7 cache_fplookup_noentry+0x241 cache_fplookup+0x575 namei+0x1ea vn_open_cred+0x48d The stack trace was the same for several minutes. System CPU time was increasing. Address vnode_init+0xc3 corresponds to the mtx_lock call here. vp->v_holdcnt = VHOLD_NO_SMR; vp->v_type = VNON; mtx_lock(&vnode_list_mtx); TAILQ_INSERT_BEFORE(vnode_list_free_marker, vp, v_vnodelist); mtx_unlock(&vnode_list_mtx); return (0); Address __mtx_lock_sleep+0xf8 is the instruction after a call to lock_delay. ps says the command line was /usr/bin/awk -f /usr/bin/awk old-default/2023-10-31_08h21m03s/.poudriere.builders old-default/2023-10-31_08h21m03s/.poudriere.buildname ... with the full list of input files exceeding the ~2KB command line length limit of ps. "/usr/bin/awk" is probably not the real second argument. It would cause an immediate syntax error. The hang resolved within a few more minutes and poudriere is continuing happily. I have never seen such behavior before. Code in vfs_subr.c tries not to hold the vnode_list_mtx lock for too long. Any thoughts? FreeBSD 13.2-STABLE up through commit b180f0040f95, 24 core 48 thread Zen 2, zfs pool on SSD connected via 12 Gbps SAS with cache on NVME, 160 GB RAM.