Re: 4-core arm armv7-package-building configuration notes, on [RPi5 &] RPi4B (aarch64) and OrangePi+2ed (armv7), poudriere-devel based
Date: Fri, 22 Mar 2024 16:28:19 UTC
[Adding another type of RPi5 testing for comparison/contrast:
some use of USE_TMPFS=all. Added at the end, after the quoting
of the prior report.]

On Mar 15, 2024, at 15:32, Mark Millard <marklmi@yahoo.com> wrote:

> [Update to add RPi5 example. The RPi5 is cortex-a76 based
> instead of being cortex-a72 based (RPi4B).]
>
> On Mar 12, 2024, at 23:57, Mark Millard <marklmi@yahoo.com> wrote:
>
>> This note's structure:
>>
>> 1st: Package-build time frame summaries.
>> (But I note some hardware points that are repeated later as well.)
>>
>> 2nd: Configuration points common to both RPi4B and OrangePi+2ed contexts.
>
> New:
> 3rd.RPi5: Configuration points unique to the RPi5 context.
>
>> 3rd: Configuration points unique to the RPi4B context.
>
> Rename the above:
> 3rd.RPi4B: Configuration points unique to the RPi4B context.
>
>> 4th: Configuration points unique to the OrangePi+2ed context.
>>
>>
>> 1st: Package-build time Summaries follow.
>> (Note: the detail order of package builds is not the same.)
>> (Examples are visible in these summaries.)
>
> Shortest summary:
> RPi5: 12:30:37 for the 265 armv7 packages to build from scratch
> RPi4B: 1D:07:58:46 for the 265 armv7 packages to build from scratch
> OrangePi+2ed: 5D:10:31:55 for the 265 armv7 packages to build from scratch
>
> Showing packages that took over 1hr to build on the
> OrangePi+2ed . . .
>
> RPi5: cortex-a76 (aarch64) with cortex-a7 (armv7) support, 2.4 GHz, 8 GiBytes RAM, USB3:
> (personal -mcpu=cortex-a76 boot-kernel but a PkgBase aarch64 boot-world)
> PARALLEL_JOBS=2 MAKE_JOBS_NUMBER_LIMIT=3
> [00:11:50] [01] [00:05:45] Finished lang/perl5.36 | perl5-5.36.3_1: Success
> [00:59:31] [02] [00:16:58] Finished devel/icu | icu-74.2,1: Success
> [01:32:28] [02] [00:07:43] Finished lang/ruby31 | ruby-3.1.4_1,1: Success
> [01:37:06] [01] [00:56:07] Finished devel/cmake-core | cmake-core-3.28.3: Success
> [09:19:35] [02] [07:22:23] Finished lang/rust | rust-1.76.0: Success
> [09:56:15] [02] [00:24:45] Finished devel/binutils@native | binutils-2.40_5,1: Success
> (Note: start of visible ordering differences:)
> [10:10:53] [01] [08:33:44] Finished devel/llvm18@default | llvm18-18.1.0.r3: Success
> [10:45:38] [01] [00:34:32] Finished devel/boost-libs | boost-libs-1.84.0: Success
> [10:53:11] [01] [00:07:30] Finished textproc/source-highlight | source-highlight-3.1.9_9: Success
> [11:04:11] [02] [00:19:46] Finished devel/arm-none-eabi-gcc | arm-none-eabi-gcc-11.3.0_3: Success
> [11:32:41] [02] [00:19:34] Finished devel/aarch64-none-elf-gcc | aarch64-none-elf-gcc-11.3.0_3: Success
> [11:34:25] [01] [00:39:38] Finished lang/gcc13 | gcc13-13.2.0_4: Success
> [12:10:42] [01] [00:28:22] Finished devel/gdb@py39 | gdb-14.1_2: Success
> [12:30:33] [02] [00:29:14] Finished devel/freebsd-gcc13@armv7 | armv7-gcc13-13.2.0_1: Success
> [12:30:37] Stopping 2 builders
> [main-CA7-default] [2024-03-15_02h05m19s] [committing] Queued: 265 Built: 265 Failed: 0 Skipped: 0 Ignored: 0 Fetched: 0 Tobuild: 0 Time: 12:30:37
> . . .
>
>> RPi4B: cortex-a72 (aarch64) with cortex-a7 (armv7) support, 2 GHz (overclocked), 8 GiBytes RAM, USB3
>> [00:25:32] [01] [00:13:33] Finished lang/perl5.36 | perl5-5.36.3_1: Success
>> [01:58:13] [02] [00:44:25] Finished devel/icu | icu-74.2,1: Success
>> [03:14:00] [02] [00:21:28] Finished lang/ruby31 | ruby-3.1.4_1,1: Success
>> [03:33:51] [01] [02:21:22] Finished devel/cmake-core | cmake-core-3.28.3: Success
>> [23:12:47] [02] [19:06:01] Finished lang/rust | rust-1.76.0: Success
>> [1D:00:14:46] [02] [00:55:46] Finished devel/binutils@native | binutils-2.40_5,1: Success
>> (Note: start of visible ordering differences:)
>> [1D:03:07:32] [02] [00:58:03] Finished devel/arm-none-eabi-gcc | arm-none-eabi-gcc-11.3.0_3: Success
>> [1D:03:42:09] [01] [1D:00:08:13] Finished devel/llvm18@default | llvm18-18.1.0.r3: Success
>> [1D:04:45:14] [02] [01:35:29] Finished lang/gcc13 | gcc13-13.2.0_4: Success
>> [1D:05:21:43] [01] [01:39:13] Finished devel/boost-libs | boost-libs-1.84.0: Success
>> [1D:05:43:24] [01] [00:21:33] Finished textproc/source-highlight | source-highlight-3.1.9_9: Success
>> [1D:05:47:01] [02] [00:44:22] Finished devel/aarch64-none-elf-gcc | aarch64-none-elf-gcc-11.3.0_3: Success
>> [1D:07:23:25] [02] [01:21:04] Finished devel/gdb@py39 | gdb-14.1_2: Success
>> [1D:07:58:37] [01] [01:19:55] Finished devel/freebsd-gcc13@armv7 | armv7-gcc13-13.2.0_1: Success
>> [1D:07:58:43] Stopping 2 builders
>> [main-CA7-default] [2024-03-11_15h30m14s] [committing] Queued: 265 Built: 265 Failed: 0 Skipped: 0 Ignored: 0 Fetched: 0 Tobuild: 0 Time: 1D:07:58:46
>>
>> . . .
>
> [Notes about RAM+SWAP use removed, given the lack of memory
> pressure in the 8 GiByte context examples. I may run a
> 4 GiByte RPi4B example and replace the 8 GiByte example
> with the 4 GiByte that would have some memory pressure.]
>
>> OrangePi+2ed: cortex-a7 armv7, 1GHz, 4 cores, 2 GiBytes RAM, USB2:
>> [01:51:31] [01] [01:00:07] Finished lang/perl5.36 | perl5-5.36.3_1: Success
>> [08:55:35] [02] [03:08:09] Finished devel/icu | icu-74.2,1: Success
>> [13:17:38] [02] [01:28:32] Finished lang/ruby31 | ruby-3.1.4_1,1: Success
>> [14:17:44] [01] [09:20:55] Finished devel/cmake-core | cmake-core-3.28.3: Success
>> [4D:01:03:43] [02] [3D:08:48:53] Finished lang/rust | rust-1.76.0: Success
>> [4D:06:26:24] [02] [03:09:35] Finished devel/binutils@native | binutils-2.40_5,1: Success
>> (Note: start of visible ordering differences:)
>> [4D:14:54:31] [02] [03:38:55] Finished devel/aarch64-none-elf-gcc | aarch64-none-elf-gcc-11.3.0_3: Success
>> [4D:16:13:00] [01] [4D:01:55:03] Finished devel/llvm18@default | llvm18-18.1.0.r3: Success
>> [4D:18:05:58] [02] [03:11:00] Finished devel/arm-none-eabi-gcc | arm-none-eabi-gcc-11.3.0_3: Success
>> [4D:23:00:13] [01] [06:46:06] Finished devel/boost-libs | boost-libs-1.84.0: Success
>> [5D:00:16:39] [01] [01:15:53] Finished textproc/source-highlight | source-highlight-3.1.9_9: Success
>> [5D:01:17:24] [02] [07:10:52] Finished lang/gcc13 | gcc13-13.2.0_4: Success
>> [5D:09:38:14] [01] [05:56:48] Finished devel/freebsd-gcc13@armv7 | armv7-gcc13-13.2.0_1: Success
>> [5D:10:18:58] [02] [05:44:02] Finished devel/gdb@py39 | gdb-14.1_2: Success
>> [5D:10:31:56] Stopping 2 builders
>> [main-CA7-default] [2024-03-06_03h15m10s] [committing] Queued: 265 Built: 265 Failed: 0 Skipped: 0 Ignored: 0 Fetched: 0 Tobuild: 0 Time: 5D:10:31:55
>>
>> (So, a little over 4 days longer than the RPi4B example above.)
>>
>> Note: 2794Mi MaxObs(Act+Wir+Lndry+SwapUsed) ("MaxObs": short for "Maximum Observed")
>>
>>
>> 2nd: Configuration points common to both the RPi4B and the
>> OrangePi+2ed contexts.
>
> And common to the RPi5 context as well.
>
>> ports-mgmt/poudriere-devel is used to build the packages.
>>
>> devel/llvm18 options: using BE_NATIVE and omitting MLIR.
>> (What I normally build for armv7 and aarch64 targeting.)
>>
>> Also, ports-mgmt/poudriere-devel omits the QEMU option,
>> as is normal for me.
>>
>> 265 packages are built, including pkg. It is the same
>> 265 packages across contexts. (The order of the builds
>> does vary.)
>>
>> /usr/local/etc/poudriere.conf has . . .
>>
>> NO_ZFS=yes
>> PARALLEL_JOBS=2
>> ALLOW_MAKE_JOBS=yes
>> MAX_EXECUTION_TIME=432000
>> NOHANG_TIME=432000
>> MAX_EXECUTION_TIME_EXTRACT=14400
>> MAX_EXECUTION_TIME_INSTALL=14400
>> MAX_EXECUTION_TIME_PACKAGE=57600
>> MAX_EXECUTION_TIME_DEINSTALL=14400
>>
>> NOTE: MAKE_JOBS_NUMBER_LIMIT is used to constrain
>> what ALLOW_MAKE_JOBS does but is not set the
>> same across the contexts.
>
> Only the native armv7 (OrangePi+2ed) context is
> not using MAKE_JOBS_NUMBER_LIMIT=3 . It uses 2
> instead.
>
>> /etc/fstab does not specify any tmpfs use or the
>> like: avoids competing for RAM+SWAP.
>>
>> poudriere armv7 jail worlds are duplicates of each
>> other across the different media. Those worlds are
>> from a personal buildworld based on using
>> -mcpu=cortex-a7 for the code generation. The package
>> builds also use that.
>>
>> /boot/loader.conf has . . .
>>
>> # Delay when persistent low free RAM leads to
>> # Out Of Memory killing of processes:
>> vm.pageout_oom_seq=120
>>
>> Heatsinks and fans are used to keep things cool over the
>> sustained build activity.
>
>
> 3rd.RPi5: Configuration points unique to the RPi5 context.
>
> For the RPi5, I list what is different than the below RPi4B
> context.
>
> The power supply is the official one recommended for the
> RPi5. The "Raspberry Pi Active Cooler" is specifically what
> is in use as the fan/heatsink.
>
> EDK2 from https://github.com/worproject/rpi5-uefi is in use
> to boot the RPi5 via UEFI/ACPI, the material from after the
> old v0.2 release. EDK2 is on a microsd card, separate from
> the USB3 boot media. No use of U-Boot. (Ethernet is via a
> USB3 dongle.)
>
> The config.txt is from the EDK2 materials, no overclocking
> but EDK2 does have force_turbo=1 . (Possibly some extra RPi*
> firmware debug output is enabled or such.)
>
> The USB3 media is the same as used for the RPi4B but I
> edited my /boot/loader.conf to indicate to boot my personal
> kernel.CA76-NODBG that is based on -mcpu=cortex-a76 and use
> of LSE_ATOMICS . (And has my usual personal-build patching.):
>
> # uname -apKU
> FreeBSD aarch64-main-pkgs 15.0-CURRENT FreeBSD 15.0-CURRENT #0 main-n268520-5e248c23d995-dirty: Sun Mar 3 02:32:48 UTC 2024 root@aarch64-main-pkgs:/usr/obj/BUILDs/main-CA76-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA76 arm64 aarch64 1500014 1500014
>
> (The boot world used is the same PkgBase world that was used
> on the RPi4B.)
>
>
>> . . .
> 3rd.RPi4B: Configuration points unique to the RPi4B context.
>
>> /usr/local/etc/poudriere.conf has . . .
>>
>> USE_TMPFS="data"
>>
>> (Based on the larger RAM and RAM+SWAP and that it
>> does not grow to be huge for the likes of lang/rust .)
>>
>> /usr/local/etc/poudriere.d/make.conf has . . .
>>
>> MAKE_JOBS_NUMBER_LIMIT=3
>>
>> (Based on the larger RAM and RAM+SWAP.) This does mean
>> that the 3 load averages can be 6+ at times on the 4
>> hardware thread system while both ports being built are
>> respecting the limit. Some ports do not fully respect
>> the limit the whole time. This can make build-times
>> a somewhat messier comparison than one might hope across
>> the contexts. But for the specifics here, things should
>> be clear enough.
>>
>> RAM == 8 GiBytes
>> RAM+SWAP == 38 GiBytes
>> (Note aarch64 allows a larger RAM multiplier limit without
>> warning of potential swap-related mistuning: "total
>> configured swap (? pages) exceeds maximum recommended
>> amount (? pages)" with "increase kern.maxswzone or reduce
>> amount of swap".)
>>
>> 5.1V 3.5A power supply, so a little extra margin for current.
>>
>> /boot/efi/config.txt has:
>
> Definitely RPi4B specific:
>
>> over_voltage=6
>> arm_freq=2000
>> sdram_freq_min=3200
>> force_turbo=1
>> (Reliable operation, with margin, on the mix of v1.1, v1.4, and v1.5
>> RPi4B's that I have access to, 8 total.)
>>
>> So: 2 GHz overclocking, using a fixed rate.
>>
>> USB3 media: U2 Optane 960 GB media via a powered USB3 adaptor.
>
> That media is also used with the RPi5.
>
>> Kernel has: "arm64: improve UVA layout for 32bit processes"
>> ( main's 967022aa5aa6 ). So an armv7 process can be somewhat
>> over 3 GiBytes for its address space.
>>
>> Boot aarch64 env: a PkgBase world and kernel.GENERIC-NODEBUG pair.
>
> The kernel choice was only used on the RPi4B, the world applies to
> the RPi5 as well.
>
>> FYI:
>>
>> # uname -apKU
>> FreeBSD aarch64-main-pkgs 15.0-CURRENT FreeBSD 15.0-CURRENT main-n268514-61b88a230bac GENERIC-NODEBUG arm64 aarch64 1500014 1500014
>>
>>
>> 4th: Configuration points unique to the OrangePi+2ed context.
>>
>> /usr/local/etc/poudriere.conf has . . .
>>
>> USE_TMPFS=no
>>
>> (Based on the smaller RAM --and smaller RAM+SWAP for avoiding
>> potential-mistuning notices.)
>>
>> /usr/local/etc/poudriere.d/make.conf has . . .
>>
>> MAKE_JOBS_NUMBER_LIMIT=2
>>
>> (Based on the smaller RAM --and smaller RAM+SWAP for avoiding
>> potential-mistuning notices-- but wanting to still have margin
>> for bigger peak RAM+SWAP use than the example happens to do.)
>>
>> RAM == 2 GiBytes
>> RAM+SWAP == 5.6 GiBytes
>> (Note armv7 has a smaller RAM multiplier limit without
>> warning of potential swap-related mistuning: "total
>> configured swap (? pages) exceeds maximum recommended
>> amount (? pages)" with "increase kern.maxswzone or reduce
>> amount of swap".)
>>
>> In /etc/rc.conf I have:
>>
>> if [ "`sysctl -i -n hw.fdt.model`" == "Xunlong Orange Pi Plus 2E" ]; then
>> sysctl dev.cpu.0.freq=1008 > /dev/null
>> fi
>>
>> In other words: a fixed 1GHz or so clock rate is used.
>>
>> USB2 media: Actually USB3 media that also supports USB2
>> use. 1 TB Samsung Touch T7 (NVMe based) via a powered hub,
>> a USB3-capable one.
>>
>>
>>
>> Side note:
>>
>> I've no clue how to judge any tradeoff consequences for
>> "increase kern.maxswzone" for judging reasonableness of
>> such an action.
>
>

I'm adding 8 GiByte RPi5 tests involving use of the
combination:

USE_TMPFS=all
TMPFS_BLACKLIST="rust"
TMPFS_BLACKLIST_TMPDIR=${BASEFS}/data/cache/tmp

Turns out that rust then uses 2.63 GiBytes of tmpfs space,
instead of using 28 GiBytes or so in my context without the
TMPFS_BLACKLIST involvement. (The last increment is during
packaging but the vast majority is from building.)

RPi5 configuration reminder:
RAM == 8 GiBytes && RAM+SWAP == 38 GiBytes

This is based on knowing that for what I build, rust is the
only package build that uses more than about 8 GiBytes of
RAM+SWAP for just tmpfs space. This was learned via monitoring
poudriere bulk build status on systems that could handle the
USE_TMPFS=all without the TMPFS_BLACKLIST use.

It is also based on wanting to leave margin sufficient that the
observed peak RAM+SWAP use overall (not just tmpfs) could more
than double without running out.

It is also prompted by the RPi5 having more normal amounts of
RAM caching that is far more effective than what the RPi4B has.

The PARALLEL_JOBS=2 use limits the number of large builds that
can potentially be going on in parallel.
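
A rough way to see why the TMPFS_BLACKLIST matters is to add up
worst-case tmpfs use for the two builders. The sketch below uses
only figures quoted above (38 GiBytes of RAM+SWAP, about 8 GiBytes
as the largest non-rust tmpfs use, 28 GiBytes or so versus
2.63 GiBytes for rust). It assumes, pessimistically, that rust and
one other large build hit their tmpfs peaks at the same time; the
variable and function names are only illustrative.

```sh
#!/bin/sh
# Rough tmpfs headroom estimate, in MiB. Figures are from the text above;
# the simultaneous-peak assumption is a pessimistic simplification.
total=$((38 * 1024))        # RAM+SWAP == 38 GiBytes
other_max=$((8 * 1024))     # largest non-rust tmpfs use: about 8 GiBytes
rust_plain=$((28 * 1024))   # rust under plain USE_TMPFS=all: 28 GiBytes or so
rust_blacklisted=2693       # rust with TMPFS_BLACKLIST: 2.63 GiBytes, rounded

# Worst-case tmpfs use with PARALLEL_JOBS=2: rust plus one other large build.
worst_case() { echo $(( $1 + other_max )); }

# What is left of RAM+SWAP for everything that is not tmpfs.
left_over() { echo $(( total - $1 )); }

for rust in "$rust_plain" "$rust_blacklisted"; do
    peak=$(worst_case "$rust")
    echo "rust tmpfs ${rust}Mi: worst case ${peak}Mi, leaves $(left_over "$peak")Mi"
done
```

Under these assumptions, plain USE_TMPFS=all could leave only
about 2 GiBytes of the 38 GiBytes for non-tmpfs use, while the
blacklisted case leaves roughly 27 GiBytes.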

Combination PARALLEL_JOBS=2 MAKE_JOBS_NUMBER_LIMIT=3 :

[11:57:41] Stopping 2 builders
[main-CA7-default] [2024-03-21_20h19m59s] [committing] Queued: 265 Built: 265 Failed: 0 Skipped: 0 Ignored: 0 Fetched: 0 Tobuild: 0 Time: 11:57:47

The observed maximum Act+Wir+Lndry+SwapUsed was reported as: 14360Mi
At such a time, Act+Wir+Lndry+SwapUsed+InAct was reported as: 14377Mi

Note: In a 4 hardware thread context, when both builders are
busy, if one is single threaded for a time, the other will
usually lead to all 4 hardware threads being kept busy making
useful progress but when the single thread activity happens
to be package-static, say, it is not slowed down by a load
average significantly over 4 (not much backlog).

Combination PARALLEL_JOBS=2 MAKE_JOBS_NUMBER_LIMIT=4 :

[11:58:25] Stopping 2 builders
[main-CA7-default] [2024-03-20_16h52m32s] [committing] Queued: 265 Built: 265 Failed: 0 Skipped: 0 Ignored: 0 Fetched: 0 Tobuild: 0 Time: 11:58:27

The observed maximum Act+Wir+Lndry+SwapUsed was reported as: 14766Mi
At such a time, Act+Wir+Lndry+SwapUsed+InAct was reported as: 14767Mi

Reminder that the prior 8 GiByte RPi5 combination test:

USE_TMPFS=data PARALLEL_JOBS=2 MAKE_JOBS_NUMBER_LIMIT=3

got:

[12:30:37] Stopping 2 builders
[main-CA7-default] [2024-03-15_02h05m19s] [committing] Queued: 265 Built: 265 Failed: 0 Skipped: 0 Ignored: 0 Fetched: 0 Tobuild: 0 Time: 12:30:37

So the USE_TMPFS=all (other than rust tmpfs use) multiplied
the time by about 0.96 .

===
Mark Millard
marklmi at yahoo.com
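
For anyone wanting to recheck the time comparisons, the following
is a minimal sketch that converts poudriere "Time:" values of the
form [nD:]HH:MM:SS to seconds and prints ratios. The helper names
to_seconds and ratio are made up for this illustration; only the
time values come from the reports above.

```sh
#!/bin/sh
# Convert a poudriere elapsed time ([nD:]HH:MM:SS) to seconds.
to_seconds() {
    echo "$1" | awk -F: '{
        d = 0; i = 1
        # A 4-field value has a leading day count such as "1D".
        if (NF == 4) { sub(/D$/, "", $1); d = $1; i = 2 }
        print d * 86400 + $i * 3600 + $(i + 1) * 60 + $(i + 2)
    }'
}

# Print time $1 divided by time $2, to two decimal places.
ratio() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", a / b }'
}

# USE_TMPFS=all (rust blacklisted) vs. USE_TMPFS=data on the RPi5:
ratio 11:57:47 12:30:37
# RPi4B vs. RPi5, and OrangePi+2ed vs. RPi4B, from the earlier summary:
ratio 1D:07:58:46 12:30:37
ratio 5D:10:31:55 1D:07:58:46
```

The first ratio matches the "about 0.96" figure; the other two
put the RPi4B at roughly 2.6 times the RPi5 build time and the
OrangePi+2ed at roughly 4.1 times the RPi4B build time.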