Re: qemu-user-static aarch64 lockup/race? (was Re: Python failure in poudriere on arm64 (via qemu-user-static cross compiling))
Date: Mon, 29 Jan 2024 16:02:36 UTC
On 29/01/24 16:53, Warner Losh wrote:
> On Mon, Jan 29, 2024, 8:48 AM Guido Falsi <mad@madpilot.net> wrote:
>
>     On 29/01/24 09:26, Guido Falsi wrote:
>     > On 29/01/24 02:10, Warner Losh wrote:
>     >> On Sun, Jan 28, 2024 at 4:45 PM Nathan Reilly-list
>     >> <lists@nreilly.com> wrote:
>     >>
>     >>> On 29 Jan 2024, at 8:43 am, Guido Falsi <mad@madpilot.net> wrote:
>     >>> On 28/01/24 22:34, Guido Falsi wrote:
>     >>>> On 28/01/24 22:23, Warner Losh wrote:
>     >>>>> On Sun, Jan 28, 2024, 12:38 PM Guido Falsi
>     >>>>> <mad@madpilot.net> wrote:
>     >>>>>
>     >>>>>     On 28/01/24 15:15, Guido Falsi wrote:
>     >>>>>     [snip]
>     >>>>>     > Creating repository in /tmp/packages: 0%
>     >>>>>
>     >>>>>     BTW, forgot to mention last time this worked without
>     >>>>>     issue was around 20th December.
>     >>>>>
>     >>>>> I think this is a bsd-user issue. There is a race somewhere
>     >>>>> in that code that causes the hangs. I'd love a reproducible
>     >>>>> test case that is somewhat smaller than python... there are
>     >>>>> bigger races with the newer stuff and I've not had the time
>     >>>>> to chase it there either. 😞
>     >>>>
>     >>>> First of all thanks for your feedback. It encourages me having
>     >>>> someone else with better knowledge about this confirm that a
>     >>>> race condition is actually a possible cause!
>     >>>>
>     >>>> Strange this has not been happening up to mid December.
>     >>>>
>     >>>> My main and fully reproducible use case is actually mostly
>     >>>> with pkg. At the end of the run poudriere runs `pkg repo` to
>     >>>> create the meta files and sign the repo. It forks itself
>     >>>> (ncpus + 2 I guess, even forcing it to 1 worker I see three
>     >>>> processes), and then locks up, with all the processes stopping
>     >>>> using CPU (ps output is in my message).
>     >>>>
>     >>>> I guess this can be reproduced with any poudriere repo with at
>     >>>> least more than ncpus packages in it. It can also be
>     >>>> reproduced using `poudriere pkgclean -u <etc>`.
>     >>>>
>     >>>> If that does not work I'm not sure how to reproduce it in
>     >>>> other ways, but I can try writing some code mocking what pkg
>     >>>> seems to be doing, not an expert at such things, though.
>     >>>
>     >>> In case it helps further narrow down things, it looks like the
>     >>> lockup is happening somewhere around here:
>     >>>
>     >>> https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L778
>     >>>
>     >>> and/or in the pkg_create_repo_worker() function here:
>     >>>
>     >>> https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L341
>     >>>
>     >>> (I'm trying to spare you the time needed to find the actual
>     >>> code being executed, I guess you would have identified this in
>     >>> a few minutes yourself, but I'm trying to make myself useful)
>     >>
>     >> There appears to be a GitHub issue for poudriere with this, but
>     >> seems to be looking in another direction.
>     >>
>     >> https://github.com/freebsd/poudriere/issues/1009
>     >
>     > This one looks quite similar.
>     >
>     > In my case the ports/pkg are aligned between host and jail, in
>     > fact I have built them from the exact same git checkout.
>     >
>     > I noticed pkg head has been converted to using pthreads instead
>     > of fork, maybe that could help. I will make time to perform some
>     > testing.
>
>     Thanks for pointing me here, it looks like this was "it", in that
>     by fixing this issue it uses native pkg-static, and sidesteps the
>     issue.
>
>     Unluckily there ARE qemu races and lockups that prevent the arm64
>     pkg-static binary from being correctly emulated by
>     qemu-user-static. Such conditions also cause sporadic failures in
>     some ports being built.
>
>     I filed a PR with a fix for that issue:
>
>     https://github.com/freebsd/poudriere/pull/1115
>
> Ok. This dodges the problem. But it papers over things.

Definitely, but this is actually also what was happening in the past.
Poudriere stopped using the native (host) pkg-static because the pkg
port gained a PORTREVISION, which caused the "same version" check to
fail.

I agree the underlying issue should be fixed.

> Any chance you could give me the state of pkg before + the package
> added as a test case for qemu?

Not sure I understand what you are asking for, can you elaborate?

What I did was run poudriere asking it to compile a few packages. The
lockup, when trying to use the target arch pkg-static via qemu-user, is
100% reproducible in my experience. It does not really depend on the
number of packages. I get it by starting with an empty build.

I'm building these packages (and obviously their dependencies):

dns/unbound
net/kea
sysutils/tmux

(I guess building only tmux could suffice)

With poudriere you can get it to use the target arch pkg-static by
modifying the function ensure_pkg_installed in
/usr/local/share/poudriere/common.sh, making sure the check here fails:

https://github.com/freebsd/poudriere/blob/e00503d846dc7a3b661aac84a6657f15e0f4b702/src/share/poudriere/common.sh#L5687

-- 
Guido Falsi <mad@madpilot.net>
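
For reference, the reproduction recipe in this message can be sketched
as a small dry-run script. This is only an illustration under stated
assumptions: the jail name (arm64jail) and ports tree name (default)
are hypothetical placeholders that do not appear in the thread, an
arm64 jail with qemu-user-static emulation is assumed to exist already,
and the common.sh modification mentioned in the message is not shown.
The script only prints the poudriere command rather than running it.

```shell
#!/bin/sh
# Dry-run sketch of the reproduction described in the message.
# JAIL and TREE are hypothetical placeholder names, not from the thread.
JAIL="${JAIL:-arm64jail}"
TREE="${TREE:-default}"

# Ports named in the message; poudriere builds their dependencies too.
PORTS="dns/unbound net/kea sysutils/tmux"

# Print the poudriere invocation instead of executing it.
repro_command() {
    echo "poudriere bulk -j ${JAIL} -p ${TREE} ${PORTS}"
}

repro_command
```

Running the printed command as root against an empty build would then
exercise the `pkg repo` step at the end of the run, which is where the
message reports the lockup.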