Re: Hang ast / pipelk / piperd
- Reply: Paul Floyd : "Re: Hang ast / pipelk / piperd"
- In reply to: Paul Floyd : "Re: Hang ast / pipelk / piperd"
Date: Wed, 01 Jun 2022 14:16:46 UTC
On Mon, May 30, 2022 at 09:35:05PM +0200, Paul Floyd wrote:
>
>
> On 5/30/22 14:15, Mark Johnston wrote:
>
> > "procstat -kk <valgrind PID>" might help to reveal what's going on,
> > since it sounds like the hang/livelock is happening somewhere in the
> > kernel.
>
> Not knowing much about the kernel, my guess is that this is related to
>
> commit 4808bab7fa6c3ec49b49476b8326d7a0250a03fa
> Author: Alexander Motin <mav@FreeBSD.org>
> Date:   Tue Sep 21 18:14:22 2021 -0400
>
>     sched_ule(4): Improve long-term load balancer.
>
> and this bit of ast code
>
> doreti_ast:
>         /*
>          * Check for ASTs atomically with returning.  Disabling CPU
>          * interrupts provides sufficient locking even in the SMP case,
>          * since we will be informed of any new ASTs by an IPI.
>          */
>         cli
>         movq    PCPU(CURTHREAD),%rax
>         testl   $TDF_ASTPENDING | TDF_NEEDRESCHED,TD_FLAGS(%rax)
>         je      doreti_exit
>         sti
>         movq    %rsp,%rdi       /* pass a pointer to the trapframe */
>         call    ast
>         jmp     doreti_ast
>
>
> The above commit seems to be migrating loaded threads to another CPU.

How did you infer that?  The long-term load balancer should be running
fairly infrequently.

As a side note, I think we are missing ktrcsw() calls in some places,
e.g., in turnstile_wait().

> My test system is a VirtualBox amd64 FreeBSD 13.1 with one CPU running
> on a 13.0 host.
>
> I just tried restarting the VM with 2 CPUs and the testcase seems to be
> a lot better - it's been running in a loop for 10 minutes whereas
> previously it would hang at least 1 in 5 times.

Hmm.  Could you, please, show the ktrace output with -H -T passed to
kdump(1), together with fresh "procstat -kk" output?

The fact that the problem apparently only occurs with 1 CPU suggests a
scheduler bug, indeed.
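
For reference, the data requested above could be collected roughly as in the sketch below. The helper name diag_cmds, the default trace file path, and the choice to attach to an already running process with ktrace -p are assumptions made for illustration; they are not taken from the thread. The function only prints the commands so they can be reviewed before running them on the FreeBSD guest.

```shell
#!/bin/sh
# Sketch only: print the diagnostic commands for a given PID.
# diag_cmds and the default trace file path are hypothetical.
diag_cmds() {
    pid="$1"
    tracefile="${2:-/tmp/valgrind.ktrace}"

    # Attach kernel tracing to the running process, writing to tracefile.
    echo "ktrace -f $tracefile -p $pid"
    # Fresh kernel stack traces for every thread of the process.
    echo "procstat -kk $pid"
    # Stop tracing the process once the hang has been captured.
    echo "ktrace -c -p $pid"
    # Decode the trace with thread IDs (-H) and absolute timestamps (-T).
    echo "kdump -H -T -f $tracefile"
}
```

Calling `diag_cmds <valgrind PID>` lists the four commands in order; they are meant to be run by hand while the testcase is hung, so that the procstat output and the tail of the trace describe the same moment.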