Re: Strange network/socket anomalies since about a month
Date: Wed, 24 Apr 2024 10:01:26 UTC
Am 2024-04-22 18:12, schrieb Gleb Smirnoff: > There were several preparatory commits that were not reverted and one > of them > had a bug. The bug manifested itself as failure to send(2) zero bytes > over > unix/stream. It was fixed with > e6a4b57239dafc6c944473326891d46d966c0264. Can > you please check you have this revision? Other than that there are no > known > bugs left. Yes, I have this fix in my running kernel. > A> Any ideas how to track this down more easily than running the entire > A> poudriere in ktrace (e.g. a hint/script which dtrace probes to use)? > > I don't have any better idea than ktrace over failing application. > Yep, I > understand that poudriere will produce a lot. But first we need to > determine Yes, it does. 4.4 GB just for the start of poudriere until the first package build fails due to a failed sccache start (luckily in the first builder, but I had at least 2 builders automatically spin up by poudriere at the time when I validated the failure in the logs and disabled the tracing). > what syscall fails and on what type of socket. After that we can scope > down to > using dtrace on very particular functions. I'm not sure I manage to find the cause of the failure... the only thing which remotely looks like an issue is "Resource temporarily unavailable", but this is from the process which waits for the server to have started: ---snip--- 58406 sccache 1713947887.504834367 RET __sysctl 0 58406 sccache 1713947887.505521884 CALL rfork(0x80000000<><invalid>2147483648) 58406 sccache 1713947887.505757777 CAP system call not allowed: rfork 58406 sccache 1713947887.505774176 RET rfork 58426/0xe43a 58406 sccache 1713947887.507304865 CALL compat11.kevent(0x3,0x371d360f89e8,0x2,0x371d360f89e8,0x2,0) 58406 sccache 1713947887.507657906 STRU struct freebsd11_kevent[] = { { ident=11, filter=EVFILT_READ, flags=0x61<EV_ADD|EV_CLEAR|EV_RECEIPT>, fflags=0, data=0, udata=0x0 } { ident=11, filter=EVFILT_WRITE, flags=0x61<EV_ADD|EV_CLEAR|EV_RECEIPT>, fflags=0, data=0, udata=0x0 } } 58406 sccache 1713947887.507689980 STRU struct freebsd11_kevent[] = { { ident=11, filter=EVFILT_READ, flags=0x4000<EV_ERROR>, fflags=0, data=0, udata=0x0 } { ident=11, filter=EVFILT_WRITE, flags=0x4000<EV_ERROR>, fflags=0, data=0, udata=0x0 } } 58406 sccache 1713947887.507977155 RET compat11.kevent 2 58406 sccache 1713947887.508015751 CALL write(0x5,0x371515685bcc,0x1) 58406 sccache 1713947887.508086434 GIO fd 5 wrote 1 byte 0x0000 01 |.| 58406 sccache 1713947887.508145930 RET write 1 58406 sccache 1713947887.508183140 CALL compat11.kevent(0x7,0,0,0x5a5689ab0c40,0x400,0) 58406 sccache 1713947887.508396614 STRU struct freebsd11_kevent[] = { } 58406 sccache 1713947887.508156537 STRU struct freebsd11_kevent[] = { { ident=4, filter=EVFILT_READ, flags=0x60<EV_CLEAR|EV_RECEIPT>, fflags=0, data=0x1, udata=0xffffffffffffffff } } 58406 sccache 1713947887.508530888 RET compat11.kevent 1 58406 sccache 1713947887.508563736 CALL read(0x4,0x371d3a2887c0,0x80) 58406 sccache 1713947887.508729102 GIO fd 4 read 1 byte 0x0000 01 |.| 58406 sccache 1713947887.508967661 RET read 1 58406 sccache 1713947887.508996753 CALL read(0x4,0x371d3a2887c0,0x80) 58406 sccache 1713947887.509028311 RET read -1 errno 35 Resource temporarily unavailable 58406 sccache 1713947887.509068838 CALL compat11.kevent(0x3,0,0,0x5a5689a97540,0x400,0x371d3a2887c8) ... 58406 sccache 1713947897.514352552 CALL _umtx_op(0x5a5689a3d290,0x10<UMTX_OP_WAKE_PRIVATE>,0x7fffffff,0,0) 58406 sccache 1713947897.514383653 RET _umtx_op 0 58406 sccache 1713947897.514421273 CALL write(0x5,0x371515685bcc,0x1) 58406 sccache 1713947897.515050967 STRU struct freebsd11_kevent[] = { { ident=4, filter=EVFILT_READ, flags=0x60<EV_CLEAR|EV_RECEIPT>, fflags=0, data=0x1, udata=0xffffffffffffffff } } 58406 sccache 1713947897.515146151 RET compat11.kevent 1 58406 sccache 1713947897.515178978 CALL read(0x4,0x371d3a2887c0,0x80) 58406 sccache 1713947897.515368070 GIO fd 4 read 1 byte 0x0000 01 |.| 58406 sccache 1713947897.515396600 RET read 1 58406 sccache 1713947897.515426523 CALL read(0x4,0x371d3a2887c0,0x80) 58406 sccache 1713947897.515457073 RET read -1 errno 35 Resource temporarily unavailable 58406 sccache 1713947897.515004494 GIO fd 5 wrote 1 byte 0x0000 01 |.| ---snip--- https://www.leidinger.net/test/sccache.tar.bz2 contains the parts of the trace of the sccache processes (in case someone wants to have a look). The poudriere run had several builders in parallel, at least 2 were running at that point in time. What the overlay does is to startup (sccache --start-server) the sccache server process (forks to return back on the command line) which creates a file system socket, and then it queries the stats (sccache --show-stats). So some of the traces in the tarball are the server start (those with "Timed out waiting for server startup" are maybe the processes which fork to start the server and wait for it to be started), some are the stat-query, and some seem to be a successful start in another poudriere-builder (those with a successful /root/.ccache/sccache/5/4/<hashvalue> access look like from a successful start in another jail). Maybe there is also a --stop-server from poudriere somewhere. What I noticed (except that printing the new CAP stuff for non-CAP enabled processes by default is disturbing) is, that compat11 stuff is called (seems the rust ecosystem is not keeping up with our speed of development...). Not sure if it matters here that some compat stuff is called. Bye, Alexander. -- http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.org netchild@FreeBSD.org : PGP 0x8F31830F9F2772BF