Re: Hang ast / pipelk / piperd
- Reply: Mark Johnston : "Re: Hang ast / pipelk / piperd"
- In reply to: Paul Floyd : "Re: Hang ast / pipelk / piperd"
Date: Thu, 02 Jun 2022 10:49:45 UTC
> On 1 Jun 2022, at 23:27, Paul Floyd <paulf2718@gmail.com> wrote:
>
>
>
> On 6/1/22 14:16, Mark Johnston wrote:
>> On Mon, May 30, 2022 at 09:35:05PM +0200, Paul Floyd wrote:
>>>
>>>
>>> On 5/30/22 14:15, Mark Johnston wrote:
>>>
>>>> "procstat -kk <valgrind PID>" might help to reveal what's going on,
>>>> since it sounds like the hang/livelock is happening somewhere in the
>>>> kernel.
>>>
>>> Not knowing much about the kernel, my guess is that this is related to
>>>
>>> commit 4808bab7fa6c3ec49b49476b8326d7a0250a03fa
>>> Author: Alexander Motin <mav@FreeBSD.org>
>>> Date:   Tue Sep 21 18:14:22 2021 -0400
>>>
>>>     sched_ule(4): Improve long-term load balancer.
>>>
>>> and this bit of ast code
>>>
>>> doreti_ast:
>>>         /*
>>>          * Check for ASTs atomically with returning.  Disabling CPU
>>>          * interrupts provides sufficient locking even in the SMP case,
>>>          * since we will be informed of any new ASTs by an IPI.
>>>          */
>>>         cli
>>>         movq    PCPU(CURTHREAD),%rax
>>>         testl   $TDF_ASTPENDING | TDF_NEEDRESCHED,TD_FLAGS(%rax)
>>>         je      doreti_exit
>>>         sti
>>>         movq    %rsp,%rdi       /* pass a pointer to the trapframe */
>>>         call    ast
>>>         jmp     doreti_ast
>>>
>>>
>>> The above commit seems to be migrating loaded threads to another CPU.
>> How did you infer that?  The long-term load balancer should be running
>> fairly infrequently.
>
>
> Well, one thread is hung in
> mi_switch+0xc2 ast+0x1e6 doreti_ast+0x1f
>
> I found the above code referring to doreti_ast with grep and thought
> that if the test always fails then it will loop forever. So I looked
> for git commits containing TDF_ASTPENDING and TDF_NEEDRESCHED, found a
> couple of commits mentioning these flags and scheduling, and among them
> the above and its follow-up
>
> commit 11f14b33629e552a451fdbfe653ebb0addd27700
> Author: Alexander Motin <mav@FreeBSD.org>
> Date:   Sun Sep 26 12:03:05 2021 -0400
>
>     sched_ule(4): Fix hang with steal_thresh < 2.
>
>     e745d729be60 caused infinite loop with interrupts disabled in load
>     stealing code if steal_thresh set below 2.  Such configuration should
>     not generally be used, but appeared some people are using it to
>     workaround some problems.
>
> and I guessed they might be related.
>
>> As a side note, I think we are missing ktrcsw() calls in some places,
>> e.g., in turnstile_wait().
>>> My test system is a VirtualBox amd64 FreeBSD 13.1 with one CPU running
>>> on a 13.0 host.
>>>
>>> I just tried restarting the VM with 2 CPUs and the testcase seems to be
>>> a lot better - it's been running in a loop for 10 minutes whereas
>>> previously it would hang at least 1 in 5 times.
>> Hmm.  Could you, please, show the ktrace output with -H -T passed to
>> kdump(1), together with fresh "procstat -kk" output?
>> The fact that the problem apparently only occurs with 1 CPU suggests a
>> scheduler bug, indeed.
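For what it's worth, the steal_thresh commit message quoted above matches
how I picture the failure mode. The toy program below is only my own model
of that reasoning, not the actual sched_ule.c code: an "idle CPU" keeps
scanning for a victim whose load meets steal_thresh, and with
steal_thresh = 1 a CPU whose only thread is currently running qualifies
on every pass while having nothing that can actually be migrated, so the
scan never makes progress (per the commit message, the real loop runs with
interrupts disabled, which is where the hang shows up).

/*
 * Toy model of the load-stealing loop -- not the real sched_ule.c code.
 * Each "CPU" has a load count; the idle CPU searches for a victim whose
 * load meets steal_thresh.  With steal_thresh = 1, a victim running a
 * single thread qualifies, yet nothing can be stolen from it, so the
 * search spins -- the same shape of hang as the one fixed in 11f14b33629e.
 */
#include <stdio.h>

#define NCPU 2

struct toy_cpu {
	int load;		/* threads on this CPU, incl. the running one */
	int transferable;	/* threads that could actually be migrated */
};

static int
steal_attempt(struct toy_cpu cpus[], int self, int steal_thresh, int max_tries)
{
	for (int tries = 0; tries < max_tries; tries++) {
		for (int i = 0; i < NCPU; i++) {
			if (i == self)
				continue;
			if (cpus[i].load < steal_thresh)
				continue;	/* not loaded enough to bother */
			if (cpus[i].transferable > 0)
				return (i);	/* found a thread to steal */
			/*
			 * Victim looked loaded but had nothing transferable;
			 * per the commit message, the real code retries with
			 * interrupts disabled at this point.
			 */
		}
	}
	return (-1);
}

int
main(void)
{
	/* CPU 1 has one running thread and an empty run queue. */
	struct toy_cpu cpus[NCPU] = { { 0, 0 }, { 1, 0 } };

	/* steal_thresh = 2: CPU 1 never qualifies, CPU 0 gives up cleanly. */
	printf("steal_thresh=2: victim=%d\n", steal_attempt(cpus, 0, 2, 1000));

	/* steal_thresh = 1: CPU 1 qualifies on every pass but yields nothing. */
	printf("steal_thresh=1: victim=%d (gave up only because of max_tries)\n",
	    steal_attempt(cpus, 0, 1, 1000));
	return (0);
}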
>
>
> Here is the fresh procstat output
>
> paulf@freebsd:~ $ procstat -kk 12339
>   PID    TID COMM               TDNAME             KSTACK
> 12339 100089 none-amd64-freebsd -                  mi_switch+0xc2 sleepq_catch_signals+0x2e6 sleepq_wait_sig+0x9 _sleep+0x1f2 umtxq_sleep+0x143 do_wait+0x3e5 __umtx_op_wait+0x53 sys__umtx_op+0x7e amd64_syscall+0x10c fast_syscall_common+0xf8
> 12339 100582 none-amd64-freebsd -                  mi_switch+0xc2 sleepq_catch_signals+0x2e6 sleepq_timedwait_sig+0x12 _sleep+0x1d1 kern_clock_nanosleep+0x1c1 sys_nanosleep+0x3b amd64_syscall+0x10c fast_syscall_common+0xf8
> 12339 100583 none-amd64-freebsd -                  mi_switch+0xc2 ast+0x1e6 doreti_ast+0x1f
> 12339 100584 none-amd64-freebsd -                  mi_switch+0xc2 sleepq_catch_signals+0x2e6 sleepq_wait_sig+0x9 _sleep+0x1f2 pipe_read+0x3d6 dofileread+0x81 sys_read+0xbc amd64_syscall+0x10c fast_syscall_common+0xf8
> 12339 100585 none-amd64-freebsd -                  mi_switch+0xc2 intr_event_handle+0x167 intr_execute_handlers+0x4b Xapic_isr1+0xdc sleepq_catch_signals+0x2e6 sleepq_wait_sig+0x9 _sleep+0x1f2 pipe_read+0x3d6 dofileread+0x81 sys_read+0xbc amd64_syscall+0x10c fast_syscall_common+0xf8
> [ … ]

All these mi_switch+0xc2 hangs reminded me of something I saw once on
13.1-RC2 back in April. The machine was running five concurrent
"make -j32 installworld" processes. The machine hung and disk activity
stopped. Results of ^T on various running commands:

^T on a "tail -F" command:

load: 1.93  cmd: tail 27541 [zfs teardown inactive] 393.65r 0.06u 0.10s 0% 2548k
mi_switch+0xc2 _sleep+0x1fc rms_rlock_fallback+0x90 zfs_freebsd_reclaim+0x26 VOP_RECLAIM_APV+0x1f vgonel+0x342 vnlru_free_impl+0x2f7 vn_alloc_hard+0xc8 getnewvnode_reserve+0x93 zfs_zget+0x22 zfs_dirent_lookup+0x16b zfs_dirlook+0x7a zfs_lookup+0x3d0 zfs_cache_lookup+0xa9 VOP_LOOKUP+0x30 cache_fplookup_noentry+0x1a3 cache_fplookup+0x366 namei+0x12a

^T on a zsh doing a cd to a UFS directory:

load: 0.48  cmd: zsh 86937 [zfs teardown inactive] 84663.01r 0.06u 0.01s 0% 6412k
mi_switch+0xc2 _sleep+0x1fc rms_rlock_fallback+0x90 zfs_freebsd_reclaim+0x26 VOP_RECLAIM_APV+0x1f vgonel+0x342 vnlru_free_impl+0x2f7 vn_alloc_hard+0xc8 getnewvnode_reserve+0x93 zfs_zget+0x22 zfs_dirent_lookup+0x16b zfs_dirlook+0x7a zfs_lookup+0x3d0 zfs_cache_lookup+0xa9 lookup+0x45c namei+0x259 kern_statat+0xf3 sys_fstatat+0x2f

^T on an attempt to start gstat:

load: 0.17  cmd: gstat 63307 [ufs] 298.29r 0.00u 0.00s 0% 228k
mi_switch+0xc2 sleeplk+0xf6 lockmgr_slock_hard+0x3e7 ffs_lock+0x6c _vn_lock+0x48 vget_finish+0x21 cache_lookup+0x26c vfs_cache_lookup+0x7b lookup+0x45c namei+0x259 vn_open_cred+0x533 kern_openat+0x283 amd64_syscall+0x10c fast_syscall_common+0xf8

A short press of the system power button did nothing. The installworld
target directories were on a ZFS filesystem with a single mirror of two
SATA SSDs.

I'm not sure whether this is related, since the rest of the stack traces
are different, but the mi_switch+0xc2 is what jogged my memory.

Regards,
Jan M.
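P.S. In case it helps to see the piperd state outside of Valgrind: the
trivial program below (my own sketch, nothing from the test case) just
blocks in read(2) on an empty pipe. While it runs, "procstat -kk" on it
shows the same pipe_read sleep as threads 100584/100585 above, and ^T/top
reports the "piperd" wait channel.

/*
 * Minimal illustration of what the pipe_read stacks above correspond to:
 * a thread blocked in read(2) on an empty pipe.  The write end stays
 * open and nothing is ever written, so the read sleeps indefinitely in
 * pipe_read on the "piperd" wait channel.
 */
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	int fds[2];
	char buf[1];

	if (pipe(fds) == -1) {
		perror("pipe");
		return (1);
	}
	printf("pid %d blocking in read(2) on an empty pipe...\n", getpid());
	fflush(stdout);

	(void)read(fds[0], buf, sizeof(buf));
	return (0);
}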