Re: Hang ast / pipelk / piperd

From: Mark Johnston <markj_at_freebsd.org>
Date: Mon, 06 Jun 2022 18:22:25 UTC

On Thu, Jun 02, 2022 at 12:49:45PM +0200, Jan Mikkelsen wrote:
> All these mi_switch+0xc2 hangs reminded me of something I saw once on 13.1-RC2 back in April. The machine was running five concurrent “make -j32 installworld” processes.
> 
> The machine hung, disk activity stopped. Results of ^T on various running commands:
> 
> ^T on a “tail -F” command:
> 
> load: 1.93  cmd: tail 27541 [zfs teardown inactive] 393.65r 0.06u 0.10s 0% 2548k
> mi_switch+0xc2 _sleep+0x1fc rms_rlock_fallback+0x90 zfs_freebsd_reclaim+0x26 VOP_RECLAIM_APV+0x1f vgonel+0x342 vnlru_free_impl+0x2f7 vn_alloc_hard+0xc8 getnewvnode_reserve+0x93 zfs_zget+0x22 zfs_dirent_lookup+0x16b zfs_dirlook+0x7a zfs_lookup+0x3d0 zfs_cache_lookup+0xa9 VOP_LOOKUP+0x30 cache_fplookup_noentry+0x1a3 cache_fplookup+0x366 namei+0x12a 
> 
> ^T on a zsh doing a cd to a UFS directory:
> 
> load: 0.48  cmd: zsh 86937 [zfs teardown inactive] 84663.01r 0.06u 0.01s 0% 6412k
> mi_switch+0xc2 _sleep+0x1fc rms_rlock_fallback+0x90 zfs_freebsd_reclaim+0x26 VOP_RECLAIM_APV+0x1f vgonel+0x342 vnlru_free_impl+0x2f7 vn_alloc_hard+0xc8 getnewvnode_reserve+0x93 zfs_zget+0x22 zfs_dirent_lookup+0x16b zfs_dirlook+0x7a zfs_lookup+0x3d0 zfs_cache_lookup+0xa9 lookup+0x45c namei+0x259 kern_statat+0xf3 sys_fstatat+0x2f 

This looks very similar to the problem described here:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261448
Though, in my case I did not see any deadlocks.  In other words, the
hang always ended after some time (typically a few seconds).

> ^T on an attempt to start gstat
> 
> load: 0.17  cmd: gstat 63307 [ufs] 298.29r 0.00u 0.00s 0% 228k
> mi_switch+0xc2 sleeplk+0xf6 lockmgr_slock_hard+0x3e7 ffs_lock+0x6c _vn_lock+0x48 vget_finish+0x21 cache_lookup+0x26c vfs_cache_lookup+0x7b lookup+0x45c namei+0x259 vn_open_cred+0x533 kern_openat+0x283 amd64_syscall+0x10c fast_syscall_common+0xf8 
> 
> A short press of the system power button did nothing.
> 
> The installworld target directories were on a ZFS filesystem with a single mirror of two SATA SSDs.
> 
> Unsure if it’s related because the rest of the stack traces are different. However, the mi_switch+0xc2 triggered a memory.

mi_switch() is the main entry point into the CPU scheduler, so pretty much
any thread which isn't on a CPU will have mi_switch() appear in its
backtrace.
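
When comparing traces like the ones quoted above, it can help to skip the
frames that only show a thread going off-CPU and look at the first frame
below them. Here is a rough sketch of that idea. The helper names and the
set of "generic" frames are my own assumptions for illustration, taken only
from the traces in this thread, and are not an exhaustive list:

```python
# Frames that only indicate the thread went to sleep or was switched out.
# Assumption for this sketch: taken from the traces quoted in this thread.
GENERIC_FRAMES = {"mi_switch", "_sleep", "sleeplk"}


def parse_frames(trace):
    """Split a ^T kernel stack line into function names, dropping +0x offsets."""
    return [token.split("+", 1)[0] for token in trace.split()]


def first_interesting_frame(trace):
    """Return the first frame that is not a generic sleep/switch frame."""
    for name in parse_frames(trace):
        if name not in GENERIC_FRAMES:
            return name
    return None


# Leading frames copied from the tail and gstat traces above.
tail_trace = "mi_switch+0xc2 _sleep+0x1fc rms_rlock_fallback+0x90 zfs_freebsd_reclaim+0x26"
gstat_trace = "mi_switch+0xc2 sleeplk+0xf6 lockmgr_slock_hard+0x3e7 ffs_lock+0x6c"

print(first_interesting_frame(tail_trace))   # rms_rlock_fallback
print(first_interesting_frame(gstat_trace))  # lockmgr_slock_hard
```

With the common frames skipped, the tail thread is seen waiting in
rms_rlock_fallback() and the gstat thread in lockmgr_slock_hard(), which is
what actually distinguishes the two stacks.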