Re: Hang ast / pipelk / piperd
- In reply to: Paul Floyd : "Re: Hang ast / pipelk / piperd"
Date: Mon, 30 May 2022 14:15:43 UTC
On Mon, May 30, 2022 at 12:19:15AM +0200, Paul Floyd wrote:
>
> On 5/27/22 22:13, Paul Floyd wrote:
> >
> > Hi
> >
> > I'm debugging two issues with Valgrind on FreeBSD 13.1 and 14, one on
> > amd64 and one on i386.
> >
> ...
> > Both hangs seem quite sensitive to timing - in both cases adding or
> > changing nanosleep times seem to make them no longer hang.
> > Adding debug statements to Valgrind can also change the behaviour
> > (and is also unsafe when not holding the scheduler lock). Does this
> > look like a kernel bug?
>
> [...]
>
> Under gdb I see (and this is quite variable)
>
> (gdb) info thread
>   Id   Target Id                  Frame
> * 1    LWP 100073 of process 861  vgModuleLocal_do_syscall_for_client_WRK () at m_syswrap/syscall-amd64-freebsd.S:135
>   2    LWP 100215 of process 861  vgModuleLocal_do_syscall_for_client_WRK () at m_syswrap/syscall-amd64-freebsd.S:135
>   3    LWP 100216 of process 861  0x00000000380bffac in do_syscall_WRK ()
>   4    LWP 100217 of process 861  0x00000000380bffac in do_syscall_WRK ()
>   5    LWP 100218 of process 861  0x00000000380bffac in do_syscall_WRK ()
>   6    LWP 100219 of process 861  0x00000000380bffac in do_syscall_WRK ()
>   7    LWP 100220 of process 861  0x00000000380bffac in do_syscall_WRK ()
>   8    LWP 100221 of process 861  0x00000000380bffac in do_syscall_WRK ()
>   9    LWP 100222 of process 861  0x00000000380bffac in do_syscall_WRK ()
>   10   LWP 100223 of process 861  0x00000000380bffac in do_syscall_WRK ()
>   11   LWP 100224 of process 861  0x00000000380bffac in do_syscall_WRK ()
>   12   LWP 100225 of process 861  0x00000000380bffac in do_syscall_WRK ()
>   13   LWP 100226 of process 861  0x00000000380bffac in do_syscall_WRK ()
>   14   LWP 100227 of process 861  0x00000000380bffac in do_syscall_WRK ()
>   15   LWP 100228 of process 861  0x00000000380bffac in do_syscall_WRK ()
>
> do_syscall_WRK is the syscall interface for the Valgrind host, and that
> will be the threads waiting for the lock.
>
> Thread 1 and 2 are in do_syscall_for_client, the interface for guest
> syscalls.
> Thread 1 is doing a _umtx_op syscall, for the pthread_join.
> Thread 2 is doing a nanosleep. These are blocking syscalls so they
> release the lock before making the syscall to allow other threads to
> execute.
>
> I think that in the snapshot above, the lock is released and one
> of threads 3 to 15 should be obtaining the lock and running.
>
> That's where the kernel context switch / AST seems to be going wrong.
>
> I don't see anything particularly wrong on the Valgrind side.
>
> Any ideas what I can do to see why the context switch is hanging?

"procstat -kk <valgrind PID>" might help to reveal what's going on,
since it sounds like the hang/livelock is happening somewhere in the
kernel.
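
For readers following the thread, the locking pattern described in the quoted message (a thread drops the scheduler lock before a blocking syscall so that waiting threads can run) can be sketched roughly as below. This is only an illustrative Python model, not Valgrind code: the names big_lock, blocking_syscall, sleeper, worker and run_demo are all hypothetical, and time.sleep merely stands in for a blocking guest syscall such as nanosleep.

```python
import threading
import time

# Hypothetical stand-in for the scheduler lock; not a Valgrind identifier.
big_lock = threading.Lock()
ran = []  # names of threads that managed to run while holding the lock


def blocking_syscall(seconds):
    """Model a blocking guest syscall made by a thread that holds the lock."""
    big_lock.release()       # drop the lock so waiting threads can run
    try:
        time.sleep(seconds)  # the blocking call itself (stand-in for nanosleep)
    finally:
        big_lock.acquire()   # take the lock back before resuming


def sleeper():
    with big_lock:
        blocking_syscall(0.2)
        ran.append("sleeper")


def worker(name):
    with big_lock:           # blocks here until the lock is free
        ran.append(name)


def run_demo(n_workers=3):
    """Start one sleeper and n_workers waiters; return who ran."""
    ran.clear()
    threads = [threading.Thread(target=sleeper)]
    threads += [threading.Thread(target=worker, args=(f"worker-{i}",))
                for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=5)
    return list(ran)


if __name__ == "__main__":
    print(run_demo())
```

In this model every waiter eventually gets the lock while the sleeper is blocked. If the quoted analysis is right, the hang corresponds to the waiters never getting past the acquire even though the lock has been released, which is the state the kernel stacks from procstat would help to confirm or rule out.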