Re: qemu-user-static aarch64 lockup/race? (was Re: Python failure in poudriere on arm64 (via qemu-user-static cross compiling))
- Reply: Guido Falsi : "Re: qemu-user-static aarch64 lockup/race? (was Re: Python failure in poudriere on arm64 (via qemu-user-static cross compiling))"
- In reply to: Nathan Reilly-list : "Re: qemu-user-static aarch64 lockup/race? (was Re: Python failure in poudriere on arm64 (via qemu-user-static cross compiling))"
Date: Mon, 29 Jan 2024 01:10:45 UTC
On Sun, Jan 28, 2024 at 4:45 PM Nathan Reilly-list <lists@nreilly.com> wrote:

>
> On 29 Jan 2024, at 8:43 am, Guido Falsi <mad@madpilot.net> wrote:
>
> On 28/01/24 22:34, Guido Falsi wrote:
>
> On 28/01/24 22:23, Warner Losh wrote:
>
> On Sun, Jan 28, 2024, 12:38 PM Guido Falsi <mad@madpilot.net> wrote:
>
> On 28/01/24 15:15, Guido Falsi wrote:
> [snip]
>
> Creating repository in /tmp/packages: 0%
>
> BTW, forgot to mention last time this worked without issue was around
> 20th December.
>
> I think this is a bsd-user issue. There is a race somewhere in that code
> that causes the hangs. I'd love a reproducible test case that is somewhat
> smaller than python... there are bigger races with the newer stuff and I've
> not had the time to chase it there either. 😞
>
> First of all thanks for your feedback. It encourages me having someone
> else with better knowledge about this confirm that a race condition is
> actually a possible cause!
>
> Strange this has not been happening up to mid December.
>
> My main and fully reproducible use case is actually mostly with pkg.
> at the end of the run poudriere runs `pkg repo` to create the meta files
> and sign the repo. It forks itself (ncpus + 2 I guess, even forcing it to 1
> worker I see three processes), and then locks up, with all the processes
> stopping using CPU (ps output is in my message)
>
> I guess this can be reproduced with any poudriere repo with at least more
> than ncpus packages in it. can also be reproduced using `poudriere pkgclean
> -u <etc>`
>
> If that does not work I'm not sure how to reproduce it in other ways, but
> I can try writing some code mocking what pkg seems to be doing, not an
> expert at such things, though.
>
> In case it helps further narrow down things, it looks like the lockup is
> happening somewhere around here:
>
> https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L778
>
> and/or in the pkg_create_repo_worker() function here:
>
> https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L341
>
> (I'm trying to spare you the time needed to find the actual code being
> executed, I guess you would have identified this in a few minutes yourself,
> but I'm trying to make myself useful)
>
> There appears to be a GitHub issue for poudriere with this, but seems to
> be looking in another direction.
>
> https://github.com/freebsd/poudriere/issues/1009
>

There's a FreeBSD bug saying this is happening w/o qemu in the loop.
https://bugs.freebsd.org/276690 at least I think that's similar.

Warner