Re: 4-core arm armv7-package-building configuration notes, on RPi4B (aarch64) and OrangePi+2ed (armv7), poudriere-devel based

From: Oliver Epper <oliver.epper_at_gmail.com>
Date: Fri, 15 Mar 2024 08:59:45 UTC
I can only suggest deploying a Mac Mini M1 with qemu using Apple HVF as a
poudriere server for aarch64.
See https://oliver-epper.de/posts/poudriere-on-m1-mac/

I never had success with qemu emulation on amd64 and the Mac Mini builds
rust and llvm without problems. I don't think I'd like to wait for these
packages to build on a pi.

greetings
Oliver

On Fri, 15 Mar 2024 at 09:39, Mark Millard <marklmi@yahoo.com> wrote:

> On Mar 12, 2024, at 23:57, Mark Millard <marklmi@yahoo.com> wrote:
>
> > This note's structure:
> >
> > 1st: Package-build time frame summaries.
> >     (But I note some hardware points that are repeated later as well.)
> >
> > 2nd: Configuration points common to both RPi4B and OrangePi+2ed contexts.
> >
> > 3rd: Configuration points unique to the RPi4B context.
> >
> > 4th: Configuration points unique to the OrangePi+2ed context.
> >
> >
> > 1st: Package-build time summaries follow.
> > (Note: the detail order of package builds is not the same.)
> > (Examples are visible in these summaries.)
> >
> >
> > RPi4B: cortex-a72 (aarch64) with cortex-a7 (armv7) support, 2 GHz (overclocked), 8 GiBytes RAM, USB3
> > [00:25:32] [01] [00:13:33] Finished lang/perl5.36 | perl5-5.36.3_1: Success
> > [01:58:13] [02] [00:44:25] Finished devel/icu | icu-74.2,1: Success
> > [03:14:00] [02] [00:21:28] Finished lang/ruby31 | ruby-3.1.4_1,1: Success
> > [03:33:51] [01] [02:21:22] Finished devel/cmake-core | cmake-core-3.28.3: Success
> > [23:12:47] [02] [19:06:01] Finished lang/rust | rust-1.76.0: Success
> > [1D:00:14:46] [02] [00:55:46] Finished devel/binutils@native | binutils-2.40_5,1: Success
> > (Note: start of visible ordering differences:)
> > [1D:03:07:32] [02] [00:58:03] Finished devel/arm-none-eabi-gcc | arm-none-eabi-gcc-11.3.0_3: Success
> > [1D:03:42:09] [01] [1D:00:08:13] Finished devel/llvm18@default | llvm18-18.1.0.r3: Success
> > [1D:04:45:14] [02] [01:35:29] Finished lang/gcc13 | gcc13-13.2.0_4: Success
> > [1D:05:21:43] [01] [01:39:13] Finished devel/boost-libs | boost-libs-1.84.0: Success
> > [1D:05:43:24] [01] [00:21:33] Finished textproc/source-highlight | source-highlight-3.1.9_9: Success
> > [1D:05:47:01] [02] [00:44:22] Finished devel/aarch64-none-elf-gcc | aarch64-none-elf-gcc-11.3.0_3: Success
> > [1D:07:23:25] [02] [01:21:04] Finished devel/gdb@py39 | gdb-14.1_2: Success
> > [1D:07:58:37] [01] [01:19:55] Finished devel/freebsd-gcc13@armv7 | armv7-gcc13-13.2.0_1: Success
> > [1D:07:58:43] Stopping 2 builders
> > [main-CA7-default] [2024-03-11_15h30m14s] [committing] Queued: 265 Built: 265 Failed: 0   Skipped: 0   Ignored: 0   Fetched: 0   Tobuild: 0   Time: 1D:07:58:46
> >
> > Note: 4364Mi MaxObs(Act+Wir+Lndry+SwapUsed) ("MaxObs":  short for "Maximum Observed")
> > Note: SwapUsed maximum: 0 (none used).
> >
> > So, for an 8 GiByte RAM RPI4B, RAM+SWAP configured to be 38 GiBytes or so:
> > Estimate: 38.0 GiBytes/4.3 GiBytes approx.== 8.8
> > Result: Lots of margin for builds that use more RAM+SWAP.
> >
> > So, for a 4 GiByte RAM RPI4B, RAM+SWAP configured to be 18 GiBytes or so:
> > Estimate: 18.0 GiBytes/4.3 GiBytes approx.== 4.2
> > Result: Also lots of margin for builds that use more RAM+SWAP.
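The margin estimates above are just the configured RAM+SWAP divided by the peak observed use. A minimal Python sketch of that arithmetic follows; the helper name `headroom` is made up for illustration, and the 4364 MiB figure is the MaxObs value reported above.

```python
MIB_PER_GIB = 1024

def headroom(ram_plus_swap_gib, max_observed_mib):
    """Return how many times the observed peak fits into the configured RAM+SWAP."""
    return ram_plus_swap_gib / (max_observed_mib / MIB_PER_GIB)

# MaxObs(Act+Wir+Lndry+SwapUsed) reported for the RPi4B run, in MiB.
max_obs_mib = 4364

# 38 GiBytes: the 8 GiByte RPi4B configuration; 18 GiBytes: the 4 GiByte one.
for total_gib in (38.0, 18.0):
    ratio = headroom(total_gib, max_obs_mib)
    print(f"{total_gib:.1f} GiBytes RAM+SWAP: about {ratio:.1f}x the observed peak")
```

Using the unrounded peak (about 4.26 GiBytes) gives ratios slightly above the ones computed from the rounded 4.3 GiBytes figure; the conclusion about margin is the same.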
>
> I did the experiment of trying PARALLEL_JOBS=3 instead of
> PARALLEL_JOBS=2, still with MAKE_JOBS_NUMBER_LIMIT=3. It took
> a little longer:
>
> [1D:08:52:32] Stopping 3 builders
> [main-CA7-default] [2024-03-13_16h27m18s] [committing] Queued: 265 Built: 265 Failed: 0   Skipped: 0   Ignored: 0   Fetched: 0   Tobuild: 0    Time: 1D:08:52:35
>
> (As the load averages are significantly more than the
> available hardware thread count and vary significantly,
> comparing individual package build times is not
> particularly useful. So I'm not reporting any example
> times for packages.)
>
> At some point I'll likely try the PARALLEL_JOBS=2
> MAKE_JOBS_NUMBER_LIMIT=3 combination on a RPI4B that has
> 4 GiBytes of RAM. I really need the memory pressure
> involved in significant paging to get reasonable estimates
> for RAM+SWAP requirements, avoiding just ending up with a
> large Inact accumulation with an unknown mix of dirty pages
> and clean pages.
>
> But, I'll likely try the RPi5 (8 GiBytes) first, now
> that the RPi5 EDK2 has fixed the problem that led
> to unreliable USB I/O for UEFI/ACPI. (I'll likely use an
> artifact build since the release build looks to not be
> present yet.)
>
> > OrangePi+2ed: cortex-a7 armv7, 1GHz, 4 cores, 2 GiBytes RAM, USB2:
> > [01:51:31] [01] [01:00:07] Finished lang/perl5.36 | perl5-5.36.3_1: Success
> > [08:55:35] [02] [03:08:09] Finished devel/icu | icu-74.2,1: Success
> > [13:17:38] [02] [01:28:32] Finished lang/ruby31 | ruby-3.1.4_1,1: Success
> > [14:17:44] [01] [09:20:55] Finished devel/cmake-core | cmake-core-3.28.3: Success
> > [4D:01:03:43] [02] [3D:08:48:53] Finished lang/rust | rust-1.76.0: Success
> > [4D:06:26:24] [02] [03:09:35] Finished devel/binutils@native | binutils-2.40_5,1: Success
> > (Note: start of visible ordering differences:)
> > [4D:14:54:31] [02] [03:38:55] Finished devel/aarch64-none-elf-gcc | aarch64-none-elf-gcc-11.3.0_3: Success
> > [4D:16:13:00] [01] [4D:01:55:03] Finished devel/llvm18@default | llvm18-18.1.0.r3: Success
> > [4D:18:05:58] [02] [03:11:00] Finished devel/arm-none-eabi-gcc | arm-none-eabi-gcc-11.3.0_3: Success
> > [4D:23:00:13] [01] [06:46:06] Finished devel/boost-libs | boost-libs-1.84.0: Success
> > [5D:00:16:39] [01] [01:15:53] Finished textproc/source-highlight | source-highlight-3.1.9_9: Success
> > [5D:01:17:24] [02] [07:10:52] Finished lang/gcc13 | gcc13-13.2.0_4: Success
> > [5D:09:38:14] [01] [05:56:48] Finished devel/freebsd-gcc13@armv7 | armv7-gcc13-13.2.0_1: Success
> > [5D:10:18:58] [02] [05:44:02] Finished devel/gdb@py39 | gdb-14.1_2: Success
> > [5D:10:31:56] Stopping 2 builders
> > [main-CA7-default] [2024-03-06_03h15m10s] [committing] Queued: 265 Built: 265 Failed: 0   Skipped: 0   Ignored: 0   Fetched: 0   Tobuild: 0   Time: 5D:10:31:55
> >
> > (So, a little over 4 days longer than the RPi4B example above.)
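For comparing the two totals, the elapsed-time fields in the summaries can be converted to seconds. The following is only a sketch in Python: the helper names `parse_elapsed` and `format_elapsed` are hypothetical, and the `[D'D':]HH:MM:SS` layout is inferred from the log lines quoted here rather than from poudriere documentation.

```python
def parse_elapsed(text):
    """Convert an elapsed time such as '1D:07:58:46' or '19:06:01' to seconds."""
    days = 0
    if "D:" in text:
        day_part, text = text.split("D:", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds

def format_elapsed(total_seconds):
    """Format seconds back into the same [nD:]HH:MM:SS layout."""
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    prefix = f"{days}D:" if days else ""
    return f"{prefix}{hours:02d}:{minutes:02d}:{seconds:02d}"

# Total build times taken from the two "Time:" fields in the summaries.
rpi4b = parse_elapsed("1D:07:58:46")
orangepi = parse_elapsed("5D:10:31:55")

print("difference:", format_elapsed(orangepi - rpi4b))
print("ratio:", round(orangepi / rpi4b, 1))
```

The same helpers work for individual package times, for example the two lang/rust entries.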
> >
> > Note: 2794Mi MaxObs(Act+Wir+Lndry+SwapUsed) ("MaxObs":  short for "Maximum Observed")
> >
> >
> > 2nd: Configuration points common to both the RPi4B and the
> >     OrangePi+2ed contexts.
> >
> > ports-mgmt/poudriere-devel is used to build the packages.
> >
> > devel/llvm18 options: using BE_NATIVE and omitting MLIR.
> > (What I normally build for armv7 and aarch64 targeting.)
> >
> > Also, ports-mgmt/poudriere-devel omits the QEMU option,
> > as is normal for me.
> >
> > 265 packages are built, including pkg. It is the same
> > 265 packages across contexts. (The order of the builds
> > does vary.)
> >
> > /usr/local/etc/poudriere.conf has . . .
> >
> > NO_ZFS=yes
> > PARALLEL_JOBS=2
> > ALLOW_MAKE_JOBS=yes
> > MAX_EXECUTION_TIME=432000
> > NOHANG_TIME=432000
> > MAX_EXECUTION_TIME_EXTRACT=14400
> > MAX_EXECUTION_TIME_INSTALL=14400
> > MAX_EXECUTION_TIME_PACKAGE=57600
> > MAX_EXECUTION_TIME_DEINSTALL=14400
> >
> > NOTE: MAKE_JOBS_NUMBER_LIMIT is used to constrain
> >      what ALLOW_MAKE_JOBS does but is not set the
> >      same across the contexts.
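The timeout values in the poudriere.conf excerpt above are easier to judge in hours. A small Python sketch of the conversion, assuming (as the poudriere.conf sample file indicates) that these settings are expressed in seconds:

```python
# Values copied from the poudriere.conf excerpt above; assumed to be seconds.
timeouts = {
    "MAX_EXECUTION_TIME": 432000,
    "NOHANG_TIME": 432000,
    "MAX_EXECUTION_TIME_EXTRACT": 14400,
    "MAX_EXECUTION_TIME_INSTALL": 14400,
    "MAX_EXECUTION_TIME_PACKAGE": 57600,
    "MAX_EXECUTION_TIME_DEINSTALL": 14400,
}

SECONDS_PER_HOUR = 3600

def to_hours(seconds):
    """Convert a timeout in seconds to hours."""
    return seconds / SECONDS_PER_HOUR

for name, seconds in timeouts.items():
    print(f"{name}: {to_hours(seconds):g} hours")
```

So the overall and no-output limits are 5 days each, which is what lets the multi-day lang/rust and devel/llvm18 builds on the OrangePi+2ed finish.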
> >
> > /etc/fstab does not specify any tmpfs use or the
> > like: avoids competing for RAM+SWAP.
> >
> > poudriere armv7 jail worlds are duplicates of each
> > other across the different media. Those worlds are
> > from a personal buildworld based on using
> > -mcpu=cortex-a7 for the code generation. The package
> > builds also use that.
> >
> > /boot/loader.conf has . . .
> >
> > # Delay when persistent low free RAM leads to
> > # Out Of Memory killing of processes:
> > vm.pageout_oom_seq=120
> >
> > Heatsinks and fans for keeping things cool over the
> > sustained build activity.
> >
> >
> > 3rd: Configuration points unique to the RPi4B context.
> >
> > /usr/local/etc/poudriere.conf has . . .
> >
> > USE_TMPFS="data"
> >
> > (Based on the larger RAM and RAM+SWAP and that it
> > does not grow to be huge for the likes of lang/rust .)
> >
> > /usr/local/etc/poudriere.d/make.conf has . . .
> >
> > MAKE_JOBS_NUMBER_LIMIT=3
> >
> > (Based on the larger RAM and RAM+SWAP.) This does mean
> > that the 3 load averages can be 6+ at times on the 4
> > hardware thread system while both ports being built are
> > respecting the limit. Some ports do not fully respect
> > the limit the whole time. This can make build-times
> > a somewhat messier comparison than one might hope across
> > the contexts. But for the specifics here, things should
> > be clear enough.
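The 6+ load average follows from multiplying the number of builders by the per-port make job limit. A rough Python sketch of that estimate, with the hypothetical helper `worst_case_make_jobs` and assuming every port respects MAKE_JOBS_NUMBER_LIMIT (which, as noted above, not all do):

```python
def worst_case_make_jobs(parallel_jobs, make_jobs_number_limit):
    """Upper bound on concurrent make jobs when every port respects the limit."""
    return parallel_jobs * make_jobs_number_limit

HW_THREADS = 4  # both boards have 4 hardware threads

# (label, PARALLEL_JOBS, MAKE_JOBS_NUMBER_LIMIT) as reported in this thread.
configs = (
    ("RPi4B", 2, 3),
    ("RPi4B, 3-builder experiment", 3, 3),
    ("OrangePi+2ed", 2, 2),
)

for label, builders, limit in configs:
    jobs = worst_case_make_jobs(builders, limit)
    print(f"{label}: up to {jobs} jobs on {HW_THREADS} threads "
          f"({jobs / HW_THREADS:.2f}x)")
```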
> >
> > RAM == 8 GiBytes
> > RAM+SWAP == 38 GiBytes
> > (Note aarch64 allows a larger RAM multiplier limit without
> > warning of potential swap-related mistuning: "total
> > configured swap (? pages) exceeds maximum recommended
> > amount (? pages)" with "increase kern.maxswzone or reduce
> > amount of swap".)
> >
> > 5.1V 3.5A power supply, so a little extra margin for current.
> >
> > /boot/efi/config.txt has:
> >
> > over_voltage=6
> > arm_freq=2000
> > sdram_freq_min=3200
> > force_turbo=1
> > (Reliable operation, with margin, on the mix of v1.1, v1.4, and v1.5
> > RPi4B's that I have access to, 8 total.)
> >
> > So: 2 GHz overclocking, using a fixed rate.
> >
> > USB3 media: U2 Optane 960 GB media via a powered USB3 adaptor.
> >
> > Kernel has: "arm64: improve UVA layout for 32bit processes"
> > ( main's 967022aa5aa6 ). So an armv7 process can be somewhat
> > over 3 GiBytes for its address space.
> >
> > Boot aarch64 env: a PkgBase world and kernel.GENERIC-NODEBUG pair.
> > FYI:
> >
> > # uname -apKU
> > FreeBSD aarch64-main-pkgs 15.0-CURRENT FreeBSD 15.0-CURRENT main-n268514-61b88a230bac GENERIC-NODEBUG arm64 aarch64 1500014 1500014
> >
> >
> > 4th: Configuration points unique to the OrangePi+2ed context.
> >
> > /usr/local/etc/poudriere.conf has . . .
> >
> > USE_TMPFS=no
> >
> > (Based on the smaller RAM --and smaller RAM+SWAP for avoiding
> > potential-mistuning notices.)
> >
> > /usr/local/etc/poudriere.d/make.conf has . . .
> >
> > MAKE_JOBS_NUMBER_LIMIT=2
> >
> > (Based on the smaller RAM --and smaller RAM+SWAP for avoiding
> > potential-mistuning notices-- but wanting to still have margin
> > for bigger peak RAM+SWAP use than the example happens to do.)
> >
> > RAM == 2 GiBytes
> > RAM+SWAP == 5.6 GiBytes
> > (Note armv7 has a smaller RAM multiplier limit without
> > warning of potential swap-related mistuning: "total
> > configured swap (? pages) exceeds maximum recommended
> > amount (? pages)" with "increase kern.maxswzone or reduce
> > amount of swap".)
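To make the "RAM multiplier" remark concrete, the swap-to-RAM ratio of the two configurations can be computed from the figures given in this message. This Python sketch only restates those numbers; the helper name `swap_to_ram_ratio` is illustrative, and it does not model how the kernel actually derives its recommended maximum.

```python
def swap_to_ram_ratio(ram_gib, ram_plus_swap_gib):
    """Return configured swap expressed as a multiple of RAM."""
    swap_gib = ram_plus_swap_gib - ram_gib
    return swap_gib / ram_gib

# (label, RAM in GiBytes, RAM+SWAP in GiBytes) as reported in this message.
boards = (
    ("RPi4B (aarch64)", 8.0, 38.0),
    ("OrangePi+2ed (armv7)", 2.0, 5.6),
)

for label, ram, total in boards:
    ratio = swap_to_ram_ratio(ram, total)
    print(f"{label}: swap is about {ratio:.2f}x RAM")
```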
> >
> > In /etc/rc.conf I have:
> >
> > if [ "`sysctl -i -n hw.fdt.model`" == "Xunlong Orange Pi Plus 2E" ]; then
> > sysctl dev.cpu.0.freq=1008 > /dev/null
> > fi
> >
> > In other words: a fixed 1GHz or so clock rate is used.
> >
> > USB2 media: Actually USB3 media that also supports USB2
> > use. 1 TB Samsung Touch T7 (NVMe based) via a powered hub,
> > a USB3-capable one.
> >
> >
> >
> > Side note:
> >
> > I've no clue how to judge the tradeoff consequences of
> > "increase kern.maxswzone", and so cannot judge whether
> > such an action would be reasonable.
> >
>
>
>
> ===
> Mark Millard
> marklmi at yahoo.com
>
>
>