Re: Expected native build times on an RPI4?

From: Mark Millard <marklmi_at_yahoo.com>
Date: Thu, 06 Apr 2023 02:04:40 UTC
On Apr 3, 2023, at 13:42, Joseph Koshy <jkoshy@freebsd.org> wrote:

> A 'make -j3 buildworld' of a freshly checked out -current tree
> took over 15+ hours on an RPI4 before eventually running out
> of space (/usr/obj had reached 7G by then).
> 
> The CPU(s) ran at a top speed of 1500Mhz during the build,
> per 'sysctl dev.cpu.0.freq'.
> 
> Even so, the build hadn't managed to cross the 'building
> libraries' step.
> 
> I'm wondering how best to provision for building -current:
> how long does 'buildworld' take on this device usually, and
> how much disk space does a build of -current usually need?

I looked and I'd not recorded any buildworld buildkernel
timing notes since very late 2021. So what I
had need not be appropriate for now. I've finally gotten
around to starting a from-scratch build, on an 8 GiByte
RAM "C0T" RPi4B. (My normal builds are done on a different
type of aarch64 system.) This is a from-scratch build,
but of note are:

make[1]: "/usr/main-src/Makefile.inc1" line 327: SYSTEM_COMPILER: Determined that CC=cc matches the source tree.  Not bootstrapping a cross-compiler.
make[1]: "/usr/main-src/Makefile.inc1" line 332: SYSTEM_LINKER: Determined that LD=ld matches the source tree.  Not bootstrapping a cross-linker.

Sometimes bootstrapping build activity is required and
that would mean more time (and space) than for what I'm
timing.

(I've no clue if the build attempt that you mentioned
involved building a bootstrap compiler, a bootstrap
linker, or both.)

[Timings added after much of the other text had been
typed in already.]

World build completed on Wed Apr  5 17:52:47 PDT 2023
World built in 26009 seconds, ncpu: 4, make -j4

So, for world, 26009sec*(1min/60sec)*(1hr/60min) == 7.2247_2222... hr < 7.3 hr.

Kernel build for GENERIC-NODBG-CA72 completed on Wed Apr  5 18:27:29 PDT 2023
Kernel(s)  GENERIC-NODBG-CA72 built in 2082 seconds, ncpu: 4, make -j4

So, for kernel, 2082sec*(1min/60sec)*(1hr/60min) == 0.578_3333... hr < 0.6 hr.

So, for total, somewhat under 8 hr.
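The conversions above can be checked with a small shell snippet (the seconds figures are taken from the timing lines above):

```shell
#!/bin/sh
# Convert the reported build times from seconds to hours.
world_secs=26009
kernel_secs=2082

# awk does the floating-point division (POSIX sh has only integers).
world_hr=$(awk "BEGIN { printf \"%.2f\", $world_secs / 3600 }")
kernel_hr=$(awk "BEGIN { printf \"%.2f\", $kernel_secs / 3600 }")
total_hr=$(awk "BEGIN { printf \"%.2f\", ($world_secs + $kernel_secs) / 3600 }")

echo "world:  $world_hr hr"
echo "kernel: $kernel_hr hr"
echo "total:  $total_hr hr"
```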

(An example of needing bootstrapping would be
jumping from main being 14.0 to being 15.0. Another
would be jumping from system clang 15 to system clang
16. The additional time would not be trivial.)


Notes . . .

The RPi4B has heatsinks and a case with a fan. The
config.txt has the following added, among other
things:

[pi4]
over_voltage=6
arm_freq=2000
sdram_freq_min=3200
force_turbo=1

(I do not use FreeBSD facilities to manage arm_freq.)

The result has no temperature problems during such
builds. I picked arm_freq=2000 based on it working
across 7 example RPi4B's (a mix of 8 GiByte "B0T"
and "C0T" and older 4 GiByte "B0T" variants). 2100 did
not prove to always work, given the other 3 settings.
I avoid system-specific tailoring in normal
operation and so standardized on what they all use.

The media is a USB3 NVMe drive, not spinning rust,
nor a microsd card. The drive is powered from
just the RPi4B. The media has a UFS file system.
I avoid tmpfs use that competes for RAM. (I've also
got access to ZFS media around but that is not what
I'm testing with in this example.)

The power supply used for the RPi4B has more margin
than is typical: 5.1V, 3.5A.

A serial console is set up.

For an 8 GiByte RAM system I normally have 30 GiBytes
or so of swap space active (not special to buildworld
buildkernel activities, but I'll not get into details
of why-so-much here). However, for this timing I'm
running without swap since I've not tested that on an
8 GiByte RPi4B in a long time. (Most of the potential
swap usage is tied to how I build ports into
packages, not buildworld buildkernel.)

The FreeBSD context is (output line split for
better readability):

# uname -apKU
FreeBSD CA72_UFS 14.0-CURRENT FreeBSD 14.0-CURRENT #90
main-n261544-cee09bda03c8-dirty: Wed Mar 15 20:25:49 PDT 2023
root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
arm64 aarch64 1400082 1400082

The build is building that same version from scratch
(after a "rm -fr" of the build-tree area). I do not
use ccache or the like. So: an example of a possible
upper bound on the required build time for a specific
configuration, with no bootstrap compiler
or linker build involved.

I do this because comparing timings of incremental builds
that need not be doing the same increments is problematic.
(However, configuring to allow incremental builds instead of
only full builds is important for spending less total time
building across builds.)
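On that note, one stock way to make incremental FreeBSD builds track dependencies reliably is META_MODE. A minimal sketch follows; this is a well-known src-env.conf option, but verify the details against your tree's build(7) and src.conf(5) documentation:

```
# META_MODE depends on the filemon kernel module being loaded:
#   kldload filemon
#
# Then, in /etc/src-env.conf:
WITH_META_MODE=yes
```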

I'll list various settings that I use. There are non-
obvious contributions too. For example, I use an Ethernet ssh
session instead of the serial console: the serial console
can lead to waiting for fast-scrolling output to finish.
(This matters more consistently for installworld and
installkernel scrolling output.) I run headless, avoiding
some competition for RAM and such. I do not load the
RPi4B with additional activities, not even nice'd ones.

Note that it is a non-debug system that is running and
it is building a matching non-debug world and kernel.

In /boot/loader.conf I have:

# Delay when persistent low free RAM leads to
# Out Of Memory killing of processes:
vm.pageout_oom_seq=120
#
# For plenty of swap/paging space (will not
# run out), avoid pageout delays leading to
# Out Of Memory killing of processes:
vm.pfault_oom_attempts=-1
#
# For possibly insufficient swap/paging space
# (might run out), increase the pageout delay
# that leads to Out Of Memory killing of
# processes (showing defaults at the time):
#vm.pfault_oom_attempts= 3
#vm.pfault_oom_wait= 10
# (The multiplication gives the total but there
# are other potential tradeoffs in the factors
# multiplied, even for nearly the same total.)
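The multiplication that comment refers to can be made concrete. With the quoted defaults, the rough total pageout delay tolerated before Out Of Memory kills is:

```shell
#!/bin/sh
# Rough total pageout delay before OOM kills: attempts * wait,
# using the default values quoted in the comments above.
attempts=3
wait_secs=10
total=$((attempts * wait_secs))
echo "$total seconds"
```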

(I'd not expected the 8 GiByte build to need
to page out to swap space, so I left my normal
setting for vm.pfault_oom_attempts in place.)

In /etc/sysctl.conf I have:

# Together this pair avoids swapping out the process kernel stacks.
# This keeps processes used for interacting with the system from
# being hung up by such swap-outs.
vm.swap_enabled=0
vm.swap_idle_enabled=0

(But, absent any active swap space, such would not
happen. However, the lack of active swap space is
not my normal context and the above is what I have
in place for normal use as well.)

Part of the below indicates that I avoid building
MIPS, POWERPC, RISCV, and X86 targeting materials
because I do not intend to target anything but
aarch64 and armv7 from aarch64 systems. This is not
the default. Going in the other direction, I build
CLANG_EXTRAS that builds more than what is default.
This combination makes my build timings ball-park
figures relative to your context.

An oddity is that I avoid much of the stripping, so
my builds are somewhat bigger than normal for the
materials produced. (I like the somewhat better
backtraces from leaving symbols in place, even if
the build is optimized and avoids full debug
information.)

I use:

TO_TYPE=aarch64
#
KERNCONF=GENERIC-NODBG-CA72
TARGET=arm64
.if ${.MAKE.LEVEL} == 0
TARGET_ARCH=${TO_TYPE}
.export TARGET_ARCH
.endif
#
WITH_SYSTEM_COMPILER=
WITH_SYSTEM_LINKER=
#
WITH_ELFTOOLCHAIN_BOOTSTRAP=
#Disables avoiding bootstrap: WITHOUT_LLVM_TARGET_ALL=
WITH_LLVM_TARGET_AARCH64=
WITH_LLVM_TARGET_ARM=
WITHOUT_LLVM_TARGET_MIPS=
WITHOUT_LLVM_TARGET_POWERPC=
WITHOUT_LLVM_TARGET_RISCV=
WITHOUT_LLVM_TARGET_X86=
WITH_CLANG=
WITH_CLANG_IS_CC=
WITH_CLANG_FULL=
WITH_CLANG_EXTRAS=
WITH_LLD=
WITH_LLD_IS_LD=
WITH_LLDB=
#
WITH_BOOT=
#
#
WITHOUT_WERROR=
#WERROR=
MALLOC_PRODUCTION=
WITH_MALLOC_PRODUCTION=
WITHOUT_ASSERT_DEBUG=
WITHOUT_LLVM_ASSERTIONS=
#
# Avoid stripping but do not control host -g status as well:
DEBUG_FLAGS+=
#
WITH_REPRODUCIBLE_BUILD=
WITH_DEBUG_FILES=
#
# Use of the .clang suffixes here avoids
# interfering with other C<?>FLAGS
# usage, such as ?= usage.
CFLAGS.clang+= -mcpu=cortex-a72
CXXFLAGS.clang+= -mcpu=cortex-a72
CPPFLAGS.clang+= -mcpu=cortex-a72
ACFLAGS.arm64cpuid.S+=  -mcpu=cortex-a72+crypto
ACFLAGS.aesv8-armx.S+=  -mcpu=cortex-a72+crypto
ACFLAGS.ghashv8-armx.S+=        -mcpu=cortex-a72+crypto

Those last 6 lines lead to the code generation being
tuned for Cortex-A72's. (The code still works on
Cortex-A53's.) I expect such lines are rarely used,
but I happen to use them.

I'll note that avoiding WITHOUT_LLVM_TARGET_ALL is
tied to old observed behavior that I've not
revalidated.

In the past, I've had examples where an RPi4B -j3 built
in less time than -j4 for such full-build timing tests.
On an RPi4B, I've never had -j5 or higher build in less
time. (Some of this is down to the RPi4B RAM/RAM-cache
subsystem properties: RAM access is easier than normal
to saturate and the caches are small.
Another contribution may be the USB3 NVMe media
latency being small. Spinning rust might have
different tradeoffs, for example.) I've also never
had -j2 or less take less time for full builds.

(Folks that do not use vm.pageout_oom_seq to avoid
kills may use -j2 or such to better avoid having
parts of some build attempts killed.)

Unfortunately, I forgot to set up monitoring of
MaxObsActive, MaxObsWired, and MaxObs(Act+Wir+Lndry).
("MaxObs" is short for "Maximum Observed".) So I
do not have such figures to report. (I use a
modified top to get such figures.)

The build-tree size:

# du -xsm /usr/obj/BUILDs/main-CA72-nodbg-clang/usr/
13122 /usr/obj/BUILDs/main-CA72-nodbg-clang/usr/

But such is based on details of what I build vs.
what I do not, as well as the lack of stripping. So, in
very round numbers, 20 GiBytes would be able to hold
a build. You might want notable margin, in part because,
as FreeBSD and the toolchain progress, things have
tended to get bigger over time. Plus the figure is a
final size; whether the peak size is larger, I do not know.
A debug build would take more space than my non-debug
build. Also, the 13122 does not include the build
materials for a bootstrap compiler or a bootstrap
linker (or both). Thus my rounding to 20 GiBytes as a
possibility for illustration.
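The provisioning arithmetic above can be sketched out directly (figures from the du output above; the margin choice is my own judgment, not a measured peak):

```shell
#!/bin/sh
# Rough provisioning arithmetic: the observed build tree was
# 13122 MiB; convert to GiB (integer division rounds down).
build_mib=13122
build_gib=$((build_mib / 1024))
echo "observed: about $build_gib GiB"

# Roughly 50% margin covers growth over time, debug builds, and
# possible bootstrap toolchain builds; 20 GiB is a round figure.
suggested_gib=20
echo "suggested: $suggested_gib GiB"
```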

Again: no ccache-like use. Otherwise there would be
more space someplace to consider overall.

===
Mark Millard
marklmi at yahoo.com