Re: A better alternative to having builds of main-armv7-default fully disabled and last-built be months out of date

From: Mark Millard <marklmi_at_yahoo.com>
Date: Mon, 15 Jul 2024 05:02:03 UTC
On Jul 14, 2024, at 17:20, Philip Paeps <philip@freebsd.org> wrote:

> Sorry for not following up to this thread earlier.
> I've been occupied elsewhere in the cluster.
> 
> On 2024-07-07 16:25:32 (+0800), Mark Millard wrote:
>> On Jul 6, 2024, at 21:35, Michal Meloun <meloun.michal@gmail.com> wrote:
>>> On 07.07.2024 5:42, Mark Millard wrote:
>>>> main's armv7 packages that are distributed are getting to be months
>>>> behind because of the build hangups preventing the builds on ampere2.
> 
> It's worth reinforcing that this only affects main (15-CURRENT).  Our stable/13 and stable/14 packages for armv7 are reasonably up to date.  Reasonably for a tier-2 architecture anyway.  Whatever is causing this, it's only in main.
> 
>> The only known failures are on ampere2 as far as I know.
>> As far as I know there is no known way to configure to
>> match the formal build procedures used on ampere2.
> 
> According to the current schedule, armv7 builds happen on ampere3, not ampere2:
> 
> ampere1: - quarterly arm64.aarch64 13.3-RELEASE 133arm64 -a
> ampere1: - quarterly arm.armv7 releng/13.3 133releng-armv7 -a
> ampere1: - quarterly arm64.aarch64 14.0-RELEASE 140arm64 -a
> ampere1: - quarterly arm.armv7 releng/14.0 140releng-armv7 -a
> ampere2: - default arm64.aarch64 main main-arm64 -a
> ampere3: - default arm64.aarch64 13.3-RELEASE 133arm64 -a
> ampere3: - default arm.armv7 releng/13.3 133releng-armv7 -a
> ampere3: - default arm64.aarch64 14.0-RELEASE 140arm64 -a
> ampere3: - default arm.armv7 releng/14.0 140releng-armv7 -a

Putting the ones that mention armv7 together, with the others
omitted:

ampere1: - quarterly arm.armv7 releng/13.3 133releng-armv7 -a
ampere1: - quarterly arm.armv7 releng/14.0 140releng-armv7 -a
ampere3: - default arm.armv7 releng/13.3 133releng-armv7 -a
ampere3: - default arm.armv7 releng/14.0 140releng-armv7 -a

None of those are for main [so: 15]. They are all for working
contexts.
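
The same filtering can be done mechanically. The following is only a
sketch: the meaning of each column (host, package set, architecture,
source branch, jail name) is my assumption based on the quoted schedule
lines, not on any documented format.

```python
# Schedule lines as quoted above, with the "> " quoting removed.
SCHEDULE = """\
ampere1: - quarterly arm64.aarch64 13.3-RELEASE 133arm64 -a
ampere1: - quarterly arm.armv7 releng/13.3 133releng-armv7 -a
ampere1: - quarterly arm64.aarch64 14.0-RELEASE 140arm64 -a
ampere1: - quarterly arm.armv7 releng/14.0 140releng-armv7 -a
ampere2: - default arm64.aarch64 main main-arm64 -a
ampere3: - default arm64.aarch64 13.3-RELEASE 133arm64 -a
ampere3: - default arm.armv7 releng/13.3 133releng-armv7 -a
ampere3: - default arm64.aarch64 14.0-RELEASE 140arm64 -a
ampere3: - default arm.armv7 releng/14.0 140releng-armv7 -a
""".splitlines()


def armv7_entries(lines):
    """Return one dict per schedule line whose architecture is arm.armv7."""
    entries = []
    for line in lines:
        host, _, rest = line.partition(":")
        # Assumed column order: '-', set, arch, src, jail, '-a'
        fields = rest.split()
        if len(fields) >= 5 and fields[2] == "arm.armv7":
            entries.append({
                "host": host.strip(),
                "set": fields[1],
                "src": fields[3],
                "jail": fields[4],
            })
    return entries


ARMV7 = armv7_entries(SCHEDULE)
# Entries that would build armv7 packages from main (15-CURRENT).
MAIN_ARMV7 = [e for e in ARMV7 if e["src"] == "main"]
```

With the quoted schedule, ARMV7 has four entries (on ampere1 and
ampere3) and MAIN_ARMV7 is empty.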

main-armv7-default last ran on ampere2 2024-05-31/2024-06-01
or so. I'm not aware of any main-armv7-default builds done
via ampere1 or ampere3.

> I've attached the poudriere.conf from that machine.  It's the same one we have on all the builders.

Is the poudriere.conf content the same for the main [so: 15]
context (ampere2) as for the ampere3 context(s)?

>>> I've seen some strange live lockups in arm32 jail, but never managed to reproduce it.
>> 
>> On what kind(s) of hardware?
>> Any kind of relevant context known?
> 
> In case it helps: ref15-aarch64.freebsd.org (available to all developers) is an identical configuration as ampereX.nyi.freebsd.org.  The former has a newer BIOS (for some reason) but that hopefully should not make a difference.  If we reach the point where we think the BIOS version matters, I can try to upgrade the BIOS on the ampereXen.
> 
> smbios.bios.reldate="06/25/2020"
> smbios.bios.revision="1.14"
> smbios.bios.vendor="LENOVO"
> smbios.bios.version="hve104q-1.14"
> 
> smbios.bios.reldate="05/30/2019"
> smbios.bios.revision="1.8"
> smbios.bios.vendor="LENOVO"
> smbios.bios.version="HVE104J-1.08"

The poudriere.conf example points out another difference
from my more recent testing: my aarch64 and armv7 systems
are strictly UFS contexts these days. The only ZFS-based
media in active use on any of my systems these days is
for the 7950X3D (amd64).

My switching to UFS matches up with my switching to use
pkgbase to install and test official FreeBSD builds
(all of: kernel, world, ports) for comparison/contrast
with my personal builds of such (that involves some
locally patched files).


We do know when the last successful from-scratch
"bulk -a" involving graphics/graphviz ran before
the armv7 problems started: its build of pkg
started on Feb 19:

pe9c9c73181b5_sbd45bbe440

=>> Building ports-mgmt/pkg
build started at Mon Feb 19 12:47:46 UTC 2024
port directory: /usr/ports/ports-mgmt/pkg
package name: pkg-1.20.9_1
building for: FreeBSD main-armv7-default-job-01 15.0-CURRENT FreeBSD 15.0-CURRENT 1500014 arm
maintained by: pkg@FreeBSD.org
Makefile datestamp: -rw-r--r-- 1 root wheel 2311 Feb 1 01:02 /usr/ports/ports-mgmt/pkg/Makefile
Ports top last git commit: e9c9c73181b
Ports top unclean checkout: no
Port dir last git commit: f7f4c1a0472
Port dir unclean checkout: no
Poudriere version: poudriere-git-3.4.1
Host OSVERSION: 1500006
Jail OSVERSION: 1500014
Job Id: 01

We also know the first observed failure with the
symptoms (not in a from-scratch build). It started
with a dns/public_suffix_list build that was dated
Feb 28:

p43e3af5f5763_sf5f08e41aa

=>> Building dns/public_suffix_list
build started at Wed Feb 28 16:05:30 UTC 2024
port directory: /usr/ports/dns/public_suffix_list
package name: public_suffix_list-20240130
building for: FreeBSD main-armv7-default-job-07 15.0-CURRENT FreeBSD 15.0-CURRENT 1500014 arm
maintained by: sunpoet@FreeBSD.org
Makefile datestamp: -rw-r--r-- 1 root wheel 770 Feb 25 01:02 /usr/ports/dns/public_suffix_list/Makefile
Ports top last git commit: 43e3af5f576
Ports top unclean checkout: no
Port dir last git commit: 906be52cfb7
Port dir unclean checkout: no
Poudriere version: poudriere-git-3.4.1-1-g1e9f97d6
Host OSVERSION: 1500006
Jail OSVERSION: 1500014
Job Id: 07
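
The two build names quoted above look like p<ports hash>_s<system hash>.
That reading is an assumption on my part, based on the "Ports top last
git commit" lines matching the p part. A minimal sketch for splitting
the names and checking abbreviated hashes against each other:

```python
import re

# Assumed naming pattern: "p" + ports commit, "_s" + system (src) commit.
BUILD_NAME = re.compile(r"^p(?P<ports>[0-9a-f]+)_s(?P<src>[0-9a-f]+)$")


def split_build_name(name):
    """Split a build name into (ports_hash, src_hash)."""
    m = BUILD_NAME.match(name)
    if m is None:
        raise ValueError(f"unexpected build name: {name!r}")
    return m.group("ports"), m.group("src")


def same_commit(a, b):
    """True when one abbreviated hash is a prefix of the other."""
    return a.startswith(b) or b.startswith(a)


LAST_GOOD = split_build_name("pe9c9c73181b5_sbd45bbe440")
FIRST_BAD = split_build_name("p43e3af5f5763_sf5f08e41aa")

# The ports part should agree with "Ports top last git commit" in each log.
assert same_commit(LAST_GOOD[0], "e9c9c73181b")
assert same_commit(FIRST_BAD[0], "43e3af5f576")
```

The prefix comparison is only as strong as the abbreviation length; it
cannot tell apart two commits that share the abbreviated prefix.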

Bisection of the kernel/world combinations between
those two builds would be very disruptive to other
uses of the machine doing the bisections. But such
would be one way of trying to narrow down what
change(s) led to the problem showing up for main
[so: 15].

So for FreeBSD kernel/world main that would be over:

• git: bd45bbe440f1 - main - rescue: Fix after zfsbootcfg addition Warner Losh
Tue, 13 Feb 2024
. . .
Sun, 25 Feb 2024
. . .
• git: f5f08e41aa57 - main - loader/efi: Only include interpreter's linker script Warner Losh

Looks like that is roughly 120 commits to main
[so: 15].
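
For a sense of scale: a binary search over N candidate commits needs
about log2(N) rounds. The sketch below is just that arithmetic. It
assumes every round gives a clear good/bad answer, which may not hold
for hangs that do not reproduce reliably, so the real number of
kernel/world builds and test runs could be higher.

```python
import math


def bisect_rounds(n_commits):
    """Approximate build-and-test rounds for bisecting n_commits candidates."""
    if n_commits <= 1:
        return 0
    return math.ceil(math.log2(n_commits))


# Roughly 120 commits between the two system commits named above.
ROUNDS = bisect_rounds(120)
```

For 120 commits that works out to 7 rounds, each one needing a
kernel/world install and a build run long enough to judge.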

But for _sbd45bbe440 and _sf5f08e41aa I'm not so
sure that the kernel booted matches the system
commits referenced. If not, the specific kernel
build does not seem to be identified in anything
that I have access to. Nothing like the output
from the likes of:

# uname -v
FreeBSD 15.0-CURRENT main-n270963-609cdb12b962 GENERIC

is in the build log output (presumes a context
with UNAME_v not overriding what would be
shown for the specific output).
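
If such a line were available in the logs, the commit could be pulled
out of it mechanically. A minimal sketch, assuming the field has the
form <branch>-n<commit count>-<hash> as in the one sample line above
(the pattern is inferred from that sample only):

```python
import re

# Assumed form: <branch>-n<commit count>-<abbreviated hash>
UNAME_V = re.compile(
    r"(?P<branch>[\w/.]+)-n(?P<count>\d+)-(?P<hash>[0-9a-f]{7,40})"
)


def kernel_commit(uname_v):
    """Return (branch, commit_count, hash) from uname -v text, or None."""
    m = UNAME_V.search(uname_v)
    if m is None:
        return None
    return m.group("branch"), int(m.group("count")), m.group("hash")


SAMPLE = "FreeBSD 15.0-CURRENT main-n270963-609cdb12b962 GENERIC"
PARSED = kernel_commit(SAMPLE)
```

The hash from such a line could then be compared against the _s part
of the build name to see whether the booted kernel matches.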

For main, freebsd-version output is not a detailed
enough identification for the purpose.


===
Mark Millard
marklmi at yahoo.com