From: Warner Losh
Date: Sun, 28 Jan 2024 18:10:45 -0700
Subject: Re: qemu-user-static aarch64 lockup/race? (was Re: Python failure in poudriere on arm64 (via qemu-user-static cross compiling))
To: Nathan Reilly-list
Cc: Guido Falsi, emulation@freebsd.org, freebsd-arm@freebsd.org, freebsd-pkg@freebsd.org

On Sun, Jan 28, 2024 at 4:45 PM Nathan Reilly-list wrote:

> On 29 Jan 2024, at 8:43 am, Guido Falsi wrote:
>
> On 28/01/24 22:34, Guido Falsi wrote:
> > On 28/01/24 22:23, Warner Losh wrote:
> > > On Sun, Jan 28, 2024, 12:38 PM Guido Falsi wrote:
> > > > On 28/01/24 15:15, Guido Falsi wrote:
> > > > > [snip]
> > > > > Creating repository in /tmp/packages:   0%
> > > >
> > > > BTW, I forgot to mention that the last time this worked without issue
> > > > was around 20th December.
> > >
> > > I think this is a bsd-user issue. There is a race somewhere in that code
> > > that causes the hangs. I'd love a reproducible test case that is somewhat
> > > smaller than python... there are bigger races with the newer stuff and
> > > I've not had the time to chase it there either. 😞
> >
> > First of all, thanks for your feedback. It's encouraging to have someone
> > with better knowledge of this confirm that a race condition is actually a
> > possible cause!
> > It's strange that this was not happening until mid-December.
> > My main and fully reproducible use case is actually mostly with pkg.
> > At the end of the run, poudriere runs `pkg repo` to create the meta files
> > and sign the repo.
> > It forks itself (ncpus + 2, I guess; even when forcing it to 1 worker I
> > see three processes) and then locks up, with all the processes no longer
> > using any CPU (ps output is in my message).
> > I guess this can be reproduced with any poudriere repo containing more
> > than ncpus packages. It can also be reproduced using
> > `poudriere pkgclean -u <etc>`.
> > If that does not work, I'm not sure how to reproduce it in other ways, but
> > I can try writing some code mocking what pkg seems to be doing, though I'm
> > not an expert at such things.
>
> In case it helps narrow things down further, it looks like the lockup is
> happening somewhere around here:
>
> https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L778
>
> and/or in the pkg_create_repo_worker() function here:
>
> https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L341
>
> (I'm trying to spare you the time needed to find the actual code being
> executed. I guess you would have identified it in a few minutes yourself,
> but I'm trying to make myself useful.)
>
> There appears to be a GitHub issue for poudriere about this, but it seems
> to be looking in another direction.
>
> https://github.com/freebsd/poudriere/issues/1009

There's a FreeBSD bug saying this is happening w/o qemu in the loop:
https://bugs.freebsd.org/276690 (at least I think that's similar).

Warner
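Guido offers above to write code that mocks what pkg seems to be doing. A minimal sketch of such a mock follows. It is written in Python only for brevity. The sketch assumes, from the description in this thread and not from pkg_repo_create.c, that `pkg repo` forks a fixed pool of workers. Each worker handles a share of the packages and reports back to the parent over a pipe, and the parent then reaps it. The function name `mock_repo_create`, the one-line-per-item pipe protocol, and the timeout are all hypothetical.

```python
import os
import select
import signal
import time


def mock_repo_create(items, nworkers, timeout=30.0):
    """Hypothetical mock of a fork-a-worker-pool pattern (not pkg's code).

    Forks `nworkers` children; child w handles items[w::nworkers] and writes
    one line per item to its own pipe. The parent multiplexes the pipes with
    select(), counts the reported lines and reaps each child on EOF.
    Raises TimeoutError if not every worker has finished by the deadline.
    """
    readers = {}  # read end of each worker's pipe -> worker pid
    for w in range(nworkers):
        r, wfd = os.pipe()
        pid = os.fork()
        if pid == 0:  # child: do its share of the work, then exit immediately
            status = 1
            try:
                os.close(r)
                with os.fdopen(wfd, "w") as out:
                    for item in items[w::nworkers]:
                        out.write(f"{item}\n")  # stand-in for per-package work
                status = 0
            finally:
                os._exit(status)  # never fall back into the parent's loop
        os.close(wfd)  # parent keeps only the read end
        readers[r] = pid

    reported = 0
    deadline = time.monotonic() + timeout
    while readers:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            stuck = sorted(readers.values())
            for fd, pid in readers.items():
                os.kill(pid, signal.SIGKILL)  # don't leave hung workers behind
                os.waitpid(pid, 0)
                os.close(fd)
            raise TimeoutError(f"workers {stuck} still running after {timeout}s")
        ready, _, _ = select.select(list(readers), [], [], remaining)
        for fd in ready:
            chunk = os.read(fd, 4096)
            if chunk:
                reported += chunk.count(b"\n")
            else:  # EOF: the worker closed its write end and is exiting
                os.close(fd)
                os.waitpid(readers.pop(fd), 0)
    return reported


if __name__ == "__main__":
    n = mock_repo_create([f"pkg-{i}" for i in range(1000)], os.cpu_count() or 1)
    print(f"{n} items reported")
```

On a healthy system the call should return the item count promptly. Suppose the same script hung in an emulated aarch64 jail, with every process idle until the TimeoutError fired. That would be consistent with the lockup described here, but it would not show where in bsd-user the race lies. A C version of the same pattern would keep the emulated footprint well below that of a Python interpreter.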