Re: Call for Foundation-supported Project Ideas (buildworld buildkernel time issue)

From: Mark Millard via freebsd-hackers <freebsd-hackers_at_freebsd.org>
Date: Tue, 30 Nov 2021 03:04:41 UTC
From: Steve Kargl <sgk_at_troutmask.apl.washington.edu> wrote on
Date: Sun, 28 Nov 2021 14:07:32 -0800

> 1.) Replace clang with something/anything that is more performant.
> Going on day 3 of "make buildworld".  Still in the lib/clang/libclang
> directory.

Just an FYI for comparison:

An appropriately configured 8 GiByte RPi4B completes such a
build in much less time than that: under 10 hours. The
measured example below includes building the system llvm
materials, but not a bootstrap compiler or linker. (I give
this type of build as the example because it was also
handy for something I wanted to do.)

I'd not call a well-configured 8 GiByte RPi4B high-end these
days. But neither is it low-end as far as small board computers
go. (Hardware like the MACCHIATObin Double Shot [4 Cortex-A72
cores, 16 GiBytes of RAM installed] and the old OverDrive
1000 [4 Cortex-A57 cores, 8 GiBytes of RAM installed]
are/were not SBCs and take/took noticeably less time, mostly
because of a more performant RAM + RAM-caching implementation
from what I've seen. Despite its slower clock rate and older
Cortex variant, the OverDrive 1000 historically took the least
time of the 3, again mostly for RAM + RAM-caching related
performance reasons from what I saw.)


The following is for a from-scratch debug build of main
[so: 14] done on a non-debug system that was built from
the same source. Because the build is from scratch, the
WITH_META_MODE= that is in use adds some overhead to this
specific build. It is an example where the system compiler
and linker are built only once: bootstrapping copies are
not built. Building those would add some time but is not
needed often. (I've no clue whether your 2+ day build
built a bootstrap compiler and/or linker or not.)


--- buildworld ---
make[1]: "/usr/main-src/Makefile.inc1" line 340: SYSTEM_COMPILER: Determined that CC=cc matches the source tree.  Not bootstrapping a cross-compiler.
make[1]: "/usr/main-src/Makefile.inc1" line 345: SYSTEM_LINKER: Determined that LD=ld matches the source tree.  Not bootstrapping a cross-linker.

It is a -j4 build (there are 4 cores in the RPi4B).


buildworld time:

World build completed on Mon Nov 29 18:12:55 PST 2021
World built in 23919 seconds, ncpu: 4, make -j4

So: somewhat under 6.7 hours.


buildkernel time:

Kernel build for GENERIC-DBG-CA72 completed on Mon Nov 29 18:40:44 PST 2021
Kernel(s)  GENERIC-DBG-CA72 built in 1669 seconds, ncpu: 4, make -j4


So: somewhat under 0.5 hours.


Total time:

23919 sec + 1669 sec == 25588 sec

So: somewhat under 7.2 hours, but say under 10
hours to allow for some variation in what might
be built and the like.
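The hour figures above can be checked mechanically from the two summary lines. A minimal Python sketch, assuming only the quoted "built in N seconds" lines as input; the regular expression and the helper names (seconds_from, hours) are hypothetical, not anything the FreeBSD build itself provides:

```python
import re

# Summary lines copied verbatim from the build output quoted above.
WORLD_LINE = "World built in 23919 seconds, ncpu: 4, make -j4"
KERNEL_LINE = "Kernel(s)  GENERIC-DBG-CA72 built in 1669 seconds, ncpu: 4, make -j4"

# Both summary lines share the "built in <N> seconds" phrase.
SECONDS_RE = re.compile(r"built in (\d+) seconds")


def seconds_from(line: str) -> int:
    """Return the elapsed seconds reported in a build summary line."""
    match = SECONDS_RE.search(line)
    if match is None:
        raise ValueError(f"no 'built in N seconds' in: {line!r}")
    return int(match.group(1))


def hours(seconds: int) -> float:
    """Convert seconds to (fractional) hours."""
    return seconds / 3600


world = seconds_from(WORLD_LINE)
kernel = seconds_from(KERNEL_LINE)
total = world + kernel

for label, secs in (("buildworld", world), ("buildkernel", kernel), ("total", total)):
    print(f"{label:12s} {secs:6d} sec = {hours(secs):.2f} hours")
```

With these inputs it reports about 6.64, 0.46 and 7.11 hours, matching the "somewhat under 6.7", "under 0.5" and "under 7.2" hour statements.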



For reference, here is the build environment:

# uname -apKU
FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #22 main-n250972-319e9fc642a1-dirty: Tue Nov 23 12:25:36 PST 2021     root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72  arm64 aarch64 1400042 1400042

Even my "nodbg" builds include debug information,
despite being optimized: it is the kernel's debug
features that have been disabled. I give the
src.conf configuration later. The various
WITHOUT_LLVM_TARGET_*'s do save some time, but
not a huge amount relative to the times reported
here. On the other hand, I also use WITH_CLANG_EXTRAS=,
which adds some time.

I run buildworld and buildkernel with -mcpu=cortex-a72
involved, something I only do for lower-end systems,
not for something like a Threadripper 1950X.

The build never used the swap space. My patched
top (which tracks and reports various
maximum-observed figures) reported:

. . .
Mem: . . . 2380Mi MaxObsActive, 3866Mi MaxObsWired, 4941Mi MaxObs(Act+Wir+Lndry)
. . .
Swap: 14336Mi Total, 14336Mi Free, 2380Mi MaxObs(Act+Lndry+SwapUsed), 4941Mi MaxObs(Act+Wir+Lndry+SwapUsed)

(UFS tends to produce very different Wired figures,
and so also different values for various other figures.)
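The top figures can be related to the installed RAM with a little arithmetic. A minimal Python sketch, assuming a nominal 8 GiByte (8 * 1024 MiB) of RAM; the amount FreeBSD can actually use on an RPi4B is somewhat less, and the constant names are hypothetical, with values simply copied from the top output above:

```python
# MiB figures copied from the patched top output quoted above.
MAX_OBS_ACTIVE = 2380
MAX_OBS_WIRED = 3866
MAX_OBS_ACT_WIR_LNDRY = 4941
SWAP_TOTAL = 14336
SWAP_FREE = 14336

# Assumption: nominal 8 GiByte of RAM. The memory actually available
# to the OS is somewhat smaller than this.
NOMINAL_RAM_MIB = 8 * 1024

swap_used = SWAP_TOTAL - SWAP_FREE
peak_fraction = MAX_OBS_ACT_WIR_LNDRY / NOMINAL_RAM_MIB
headroom_mib = NOMINAL_RAM_MIB - MAX_OBS_ACT_WIR_LNDRY

# Each MaxObs figure is a separate running maximum, so the individual
# peaks need not occur at the same moment: adding MaxObsActive and
# MaxObsWired can exceed the observed maximum of the combined sum.
sum_of_separate_peaks = MAX_OBS_ACTIVE + MAX_OBS_WIRED

print(f"swap used: {swap_used} MiB")
print(f"peak Act+Wir+Lndry: {peak_fraction:.0%} of nominal RAM")
print(f"headroom at that peak: {headroom_mib} MiB")
print(f"MaxObsActive + MaxObsWired: {sum_of_separate_peaks} MiB")
```

With these inputs it reports zero swap use, a combined peak of about 60% of the nominal 8 GiB (roughly 3251 MiB of headroom), and shows that the separate Active and Wired peaks (6246 MiB summed) did not coincide.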

The 8 GiByte RPi4B is using USB3 portable SSD media
(a T7 Touch). The media that I used is set up with
root-on-ZFS (no UFS use), but historically root-on-UFS
(no ZFS use) has not made a large difference. I could
time a build on the UFS-based media (also a T7 Touch)
if that is of interest.

The RPi4B has heat sinks and a case with a fan. I use
a CanaKit 5.1V 3.5A power supply. I have:

over_voltage=6 
arm_freq=2000 
sdram_freq_min=3200 
force_turbo=1 

in the RPi4B's config.txt. These settings have worked
well, with some margin, on every RPi4B that I've used.
(All have had heat sinks, a case with a fan, and a 5.1V
3.5A power supply, so I've not tested other contexts.)

The src.conf-style material looks like:

# more ~/src.configs/src.conf.CA72-dbg-clang.aarch64-host
TO_TYPE=aarch64
TOOLS_TO_TYPE=${TO_TYPE}
#
KERNCONF=GENERIC-DBG-CA72
TARGET=arm64
.if ${.MAKE.LEVEL} == 0
TARGET_ARCH=${TO_TYPE}
.export TARGET_ARCH
.endif
#
#WITH_CROSS_COMPILER=
WITH_SYSTEM_COMPILER=
WITH_SYSTEM_LINKER=
#
#WITH_LLD_BOOTSTRAP=
WITH_ELFTOOLCHAIN_BOOTSTRAP=
#Disables avoiding bootstrap: WITHOUT_LLVM_TARGET_ALL=
WITH_LLVM_TARGET_AARCH64=
WITH_LLVM_TARGET_ARM=
WITHOUT_LLVM_TARGET_MIPS=
WITHOUT_LLVM_TARGET_POWERPC=
WITHOUT_LLVM_TARGET_RISCV=
WITHOUT_LLVM_TARGET_X86=
#WITH_CLANG_BOOTSTRAP=
WITH_CLANG=
WITH_CLANG_IS_CC=
WITH_CLANG_FULL=
WITH_CLANG_EXTRAS=
WITH_LLD=
WITH_LLD_IS_LD=
WITH_LLDB=
#
WITH_BOOT=
#
#
WITHOUT_WERROR=
#WERROR=
#MALLOC_PRODUCTION=
WITHOUT_MALLOC_PRODUCTION=
WITH_ASSERT_DEBUG=
WITH_LLVM_ASSERTIONS=
#
# Avoid stripping but do not control host -g status as well:
DEBUG_FLAGS+=
#
WITH_REPRODUCIBLE_BUILD=
WITH_DEBUG_FILES=
#
XCFLAGS+= -mcpu=cortex-a72
XCXXFLAGS+= -mcpu=cortex-a72
# There is no XCPPFLAGS but XCPP gets XCFLAGS content.
ACFLAGS.arm64cpuid.S+=  -mcpu=cortex-a72+crypto
ACFLAGS.aesv8-armx.S+=  -mcpu=cortex-a72+crypto
ACFLAGS.ghashv8-armx.S+=        -mcpu=cortex-a72+crypto

(Comments explaining that an option is not used because
of some odd consequence once observed may not have been
rechecked in some time. Options commented out without
such a note are just simple choices, not driven by such
oddities.)


One thing that can slow down builds that at times
produce rapid output is serial console handling of
that output. (This is very noticeable for installworld
and installkernel to a directory.) I used an ssh
session to avoid that potential contribution to the
time.

The OverDrive 1000 died some time ago, but I
still have access to the MACCHIATObin Double Shot,
and I could run a timing test on it building the
same sources the same way. (The same Cortex-A72
clock rate would be in use as in the RPi4B test:
2.0 GHz.)

Hmm. The buildkernel got a bunch of:

ERROR: ctfconvert: failed to get mapping for tid ????? <????>

notices. I do not expect the issue changed the
time much, but I note them just in case.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)