From: Nathan Reilly-list
List-Id: Porting FreeBSD to ARM processors
List-Archive: https://lists.freebsd.org/archives/freebsd-arm
Subject: Re: qemu-user-static aarch64 lockup/race?
 (was Re: Python failure in poudriere on arm64 (via qemu-user-static cross compiling))
Date: Mon, 29 Jan 2024 10:45:19 +1100
To: Guido Falsi
Cc: emulation@freebsd.org, freebsd-arm@freebsd.org, freebsd-pkg@freebsd.org

> On 29 Jan 2024, at 8:43 am, Guido Falsi <mad@madpilot.net> wrote:
>
> On 28/01/24 22:34, Guido Falsi wrote:
>> On 28/01/24 22:23, Warner Losh wrote:
>>>
>>> On Sun, Jan 28, 2024, 12:38 PM Guido Falsi <mad@madpilot.net> wrote:
>>>
>>>     On 28/01/24 15:15, Guido Falsi wrote:
>>>     [snip]
>>>      > Creating repository in /tmp/packages:   0%
>>>      >
>>>
>>>     BTW, forgot to mention: the last time this worked without issue was
>>>     around 20th December.
>>>
>>>
>>> I think this is a bsd-user issue. There is a race somewhere in that code that causes the hangs. I'd love a reproducible test case that is somewhat smaller than python... there are bigger races with the newer stuff and I've not had the time to chase it there either. 😞
>>
>> First of all, thanks for your feedback. It encourages me to have someone with better knowledge about this confirm that a race condition is actually a possible cause!
>>
>> Strange that this had not been happening up to mid December.
>>
>> My main and fully reproducible use case is actually mostly with pkg.
>>
>> At the end of the run poudriere runs `pkg repo` to create the meta files and sign the repo. It forks itself (ncpus + 2 I guess; even forcing it to 1 worker I see three processes), and then locks up, with all the processes stopping using CPU (ps output is in my message).
>>
>> I guess this can be reproduced with any poudriere repo with more than ncpus packages in it. It can also be reproduced using `poudriere pkgclean -u <etc>`.
>>
>> If that does not work I'm not sure how to reproduce it in other ways, but I can try writing some code mocking what pkg seems to be doing. I'm not an expert at such things, though.
>
> In case it helps further narrow things down, it looks like the lockup is happening somewhere around here:
>
> https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L778
>
> and/or in the pkg_create_repo_worker() function here:
>
> https://github.com/freebsd/pkg/blob/56fa3f87d9d9644348b89680dfd8af47a860ee82/libpkg/pkg_repo_create.c#L341
>
>
> (I'm trying to spare you the time needed to find the actual code being executed. I guess you would have identified this in a few minutes yourself, but I'm trying to make myself useful.)

There appears to be a GitHub issue for poudriere about this, but it seems to be looking in another direction.

https://github.com/freebsd/poudriere/issues/1009

Regards,
Nathan

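
For anyone who wants to try mocking the pattern described in the quoted text (a parent that forks several workers, collects their output, then waits for them), here is a minimal sketch. It is not taken from the pkg source: the function name `run_workers`, the worker count, the single shared pipe and the `worker:item` payload are all assumptions made for illustration. It has not been run under qemu-user-static, so it may well not trigger the hang. Python is used only to keep it short; a test case "smaller than python" would need the same pattern ported to C.

```python
import os


def run_workers(num_workers, items_per_worker):
    """Fork workers that report over one shared pipe; return the lines received.

    Hypothetical stand-in for a fork-and-collect loop, not pkg's actual code.
    """
    read_fd, write_fd = os.pipe()
    pids = []

    for worker_id in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Child: drop the read end, send one short line per item, and
            # leave via os._exit so no parent cleanup code runs here.
            status = 1
            try:
                os.close(read_fd)
                for item in range(items_per_worker):
                    # Each line is far below PIPE_BUF, so a write is atomic
                    # and lines from different workers do not interleave.
                    os.write(write_fd, f"{worker_id}:{item}\n".encode())
                status = 0
            finally:
                os._exit(status)
        pids.append(pid)

    # Parent: close its own write end so read() returns EOF once every
    # worker has exited and released its copy of the descriptor.
    os.close(write_fd)
    data = b""
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        data += chunk
    os.close(read_fd)

    # Reap every worker; a lockup would show up as blocking in the read
    # loop above or in waitpid() here.
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
            raise RuntimeError(f"worker {pid} failed with status {status}")

    return data.decode().splitlines()


if __name__ == "__main__":
    received = run_workers(num_workers=4, items_per_worker=100)
    print(len(received), "results received")
```

On a native host this should finish immediately. Running it in a loop inside the emulated jail and watching whether the parent ever stops making progress would be one way to see if plain fork/pipe/wait is enough to hit the race, or whether something more specific to what pkg does is needed.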