Re: qemu-user-static aarch64 lockup/race? (was Re: Python failure in poudriere on arm64 (via qemu-user-static cross compiling))

From: Nathan Reilly-list <lists_at_nreilly.com>
Date: Sun, 28 Jan 2024 23:45:19 UTC

> On 29 Jan 2024, at 8:43 am, Guido Falsi <mad@madpilot.net> wrote:
> On 28/01/24 22:34, Guido Falsi wrote:
>> On 28/01/24 22:23, Warner Losh wrote:
>>> 
>>> On Sun, Jan 28, 2024, 12:38 PM Guido Falsi <mad@madpilot.net> wrote:
>>> 
>>>     On 28/01/24 15:15, Guido Falsi wrote:
>>>     [snip]
>>>      > Creating repository in /tmp/packages:   0%
>>>      >
>>> 
>>>     BTW, I forgot to mention: the last time this worked without issue was
>>>     around 20th December.
>>> 
>>> 
>>> I think this is a bsd-user issue. There is a race somewhere in that code that causes the hangs. I'd love a reproducible test case that is somewhat smaller than python... there are bigger races with the newer stuff and I've not had the time to chase it there either. 😞
>> First of all, thanks for your feedback. It is encouraging to have someone with better knowledge of this confirm that a race condition is actually a possible cause!
>> It is strange that this was not happening until mid-December.
>> My main and fully reproducible use case is actually mostly with pkg.
>> At the end of the run poudriere runs `pkg repo` to create the meta files and sign the repo. It forks itself (ncpus + 2 I guess; even forcing it to 1 worker I see three processes), and then locks up, with all the processes no longer using any CPU (ps output is in my message).
>> I guess this can be reproduced with any poudriere repo with more than ncpus packages in it. It can also be reproduced using `poudriere pkgclean -u <etc>`.
>> If that does not work I'm not sure how to reproduce it in other ways, but I can try writing some code mocking what pkg seems to be doing. I'm not an expert at such things, though.
> 
> In case it helps narrow things down further, it looks like the lockup is happening somewhere around here:
> 
> https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L778
> 
> and/or in the pkg_create_repo_worker() function here:
> 
> https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L341
> 
> 
> (I'm trying to spare you the time needed to find the actual code being executed, I guess you would have identified this in a few minutes yourself, but I'm trying to make myself useful)


There appears to be a GitHub issue for poudriere about this, but it seems to be looking in another direction.

https://github.com/freebsd/poudriere/issues/1009
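
For anyone after a smaller test case than a full poudriere run, the fork-and-collect pattern described in the quoted text could be mocked along the lines below. This is only a rough sketch and not pkg's actual code: the name run_workers, the one-line-per-item message format and the use of a socket pair per worker are my assumptions, not a reading of pkg_repo_create.c. Only the "ncpus + 2" worker count is taken from the guess quoted above. I have no evidence that it triggers the hang under qemu-user-static; it is just a starting point.

```python
import os
import select
import socket


def run_workers(nworkers, nitems):
    """Fork nworkers children; each sends one line per item it owns.

    Returns the number of lines the parent received. Hypothetical mock of
    a fork-and-collect pattern, not a copy of what pkg does.
    """
    children = {}  # parent-side socket -> child pid
    for w in range(nworkers):
        parent_sock, child_sock = socket.socketpair()
        pid = os.fork()
        if pid == 0:
            # Child: never return into the caller, always leave via _exit.
            status = 1
            try:
                parent_sock.close()
                for inherited in children:
                    inherited.close()
                # Worker w handles items w, w + nworkers, w + 2*nworkers, ...
                for i in range(w, nitems, nworkers):
                    child_sock.sendall(b"item %d\n" % i)
                child_sock.close()
                status = 0
            finally:
                os._exit(status)
        # Parent: keep only its own end so EOF is seen when the child exits.
        child_sock.close()
        children[parent_sock] = pid

    received = 0
    while children:
        # Block until at least one worker has data or has closed its end.
        readable, _, _ = select.select(list(children), [], [])
        for sock in readable:
            data = sock.recv(4096)
            if data:
                received += data.count(b"\n")
            else:
                # EOF: reap the worker and stop watching its socket.
                sock.close()
                os.waitpid(children.pop(sock), 0)
    return received


if __name__ == "__main__":
    # "ncpus + 2" mirrors the guess in the quoted message.
    workers = (os.cpu_count() or 1) + 2
    print(run_workers(workers, 10000))
```

If the emulated run stops making progress while the same script finishes natively, that would at least point at the fork/wait path rather than at pkg itself.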

Regards,
Nathan