Hang ast / pipelk / piperd
- Reply: Paul Floyd : "Re: Hang ast / pipelk / piperd"
Date: Fri, 27 May 2022 22:13:52 UTC
Hi,

I'm debugging two issues with Valgrind on FreeBSD 13.1 and 14, one on amd64 and one on i386.

The first testcase, on i386, creates 10 threads that all then just call pause(). Then there is a fork(); the parent does a pause() and the child kills the parent. This hang is reproducible.

The second testcase, on amd64, runs a loop of 7 tests, each one creating 2 threads. The thread function writes either to a global variable or to various types of TLS, using a nanosleep as a way to yield between the threads. This hang is intermittent.

The above detail is probably not that relevant. In both examples Valgrind is hanging with 100% CPU use.

In the ktrace output, at the point where things seem to go wrong, there is:

    9340 none-amd64-freebsd GIO fd 28503 read 1 byte "X"
    9340 none-amd64-freebsd RET read 1
    9340 none-amd64-freebsd CSW stop user "ast"
    9340 none-amd64-freebsd CSW resume kernel "pipelk"
    9340 none-amd64-freebsd CSW stop kernel "piperd"
    9340 none-amd64-freebsd CSW resume kernel "pipelk"
    9340 none-amd64-freebsd CSW stop kernel "piperd"
    ... repeat until killed

That read is on a pipe used for the Valgrind scheduler lock. The scheduler runs single threaded, and the read above means that one thread has acquired the lock and should be able to run. Instead it looks like there is an ast that gets the kernel stuck in context switches between the pipe read ("piperd") and pipe lock ("pipelk") states. kill -9 is the only way out.

This all worked OK from FreeBSD 11.3 to 13.0.

It's quite difficult to trace this within Valgrind. Both hangs seem quite sensitive to timing: in both cases adding or changing nanosleep times seems to make them no longer hang. Adding debug statements to Valgrind can also change the behaviour (and is also unsafe when not holding the scheduler lock).

Does this look like a kernel bug?

A+
Paul
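
For readers unfamiliar with the pattern, the idea of using a pipe as a lock (the 1-byte read of "X" in the trace) can be sketched roughly as below. This is only a minimal illustration under stated assumptions, not Valgrind's code: the names pipe_lock, pipe_lock_acquire, pipe_lock_release and run_demo are all hypothetical, and the sketch assumes a single token byte that sits in the pipe while the lock is free. Acquiring the lock is a blocking 1-byte read; releasing it writes the byte back.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical sketch of a pipe-based lock; NOT Valgrind's implementation. */
struct pipe_lock {
    int fds[2];               /* fds[0] = read end, fds[1] = write end */
};

static void pipe_lock_release(struct pipe_lock *l)
{
    char token = 'X';
    ssize_t n;
    /* Put the token back; a waiter blocked in read() can now proceed. */
    do {
        n = write(l->fds[1], &token, 1);
    } while (n == -1 && errno == EINTR);
    if (n != 1)
        abort();
}

static void pipe_lock_acquire(struct pipe_lock *l)
{
    char token;
    ssize_t n;
    /* Blocks until the token is available; only one reader gets the byte. */
    do {
        n = read(l->fds[0], &token, 1);
    } while (n == -1 && errno == EINTR);
    if (n != 1)
        abort();
}

static void pipe_lock_init(struct pipe_lock *l)
{
    if (pipe(l->fds) != 0)
        abort();
    pipe_lock_release(l);     /* one byte in the pipe means "lock is free" */
}

struct worker_arg {
    struct pipe_lock *lock;
    long *counter;
    int iterations;
};

static void *worker(void *p)
{
    struct worker_arg *a = p;
    for (int i = 0; i < a->iterations; i++) {
        pipe_lock_acquire(a->lock);
        (*a->counter)++;      /* protected by the pipe lock */
        pipe_lock_release(a->lock);
    }
    return NULL;
}

/* Runs nthreads workers (at most 16) and returns the final counter value. */
long run_demo(int nthreads, int iterations)
{
    pthread_t tids[16];
    struct pipe_lock lock;
    long counter = 0;
    struct worker_arg arg = { &lock, &counter, iterations };

    if (nthreads < 1 || nthreads > 16)
        abort();
    pipe_lock_init(&lock);
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tids[i], NULL, worker, &arg) != 0)
            abort();
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    close(lock.fds[0]);
    close(lock.fds[1]);
    return counter;
}
```

If the lock serialises the workers correctly, run_demo(4, 1000) should return 4000. In a scheme like this, a thread whose read() has returned 1 holds the lock, which is why a thread that spins in "pipelk"/"piperd" after a successful read looks suspicious.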