Re: Expected native build times on an RPI4?
- In reply to: Mark Millard : "Re: Expected native build times on an RPI4?"
Date: Thu, 06 Apr 2023 20:53:04 UTC
On Apr 5, 2023, at 19:04, Mark Millard <marklmi@yahoo.com> wrote:

> On Apr 3, 2023, at 13:42, Joseph Koshy <jkoshy@freebsd.org> wrote:
>
>> A 'make -j3 buildworld' of a freshly checked out -current tree
>> took over 15+ hours on an RPI4 before eventually running out
>> of space (/usr/obj had reached 7G by then).
>>
>> The CPU(s) ran at a top speed of 1500Mhz during the build,
>> per 'sysctl dev.cpu.0.freq'.
>>
>> Even so, the build hadn't managed to cross the 'building
>> libraries' step.
>>
>> I'm wondering how best to provision for building -current:
>> how long does 'buildworld' take on this device usually, and
>> how much disk space does a build of -current usually need?
>
> I looked and I'd not recorded any buildworld buildkernel timings notes
> since back in very late 2021, so what I had need not be appropriate for
> now. I've finally got around to starting a from-scratch build, on an
> 8 GiByte RAM "C0T" RPi4B. (My normal builds are done on a different
> type of aarch64 system.) This is a from-scratch build, but of note are:
>
> make[1]: "/usr/main-src/Makefile.inc1" line 327: SYSTEM_COMPILER: Determined that CC=cc matches the source tree. Not bootstrapping a cross-compiler.
> make[1]: "/usr/main-src/Makefile.inc1" line 332: SYSTEM_LINKER: Determined that LD=ld matches the source tree. Not bootstrapping a cross-linker.
>
> Sometimes bootstrapping build activity is required and that would mean
> more time (and space) than for what I'm timing.
>
> (I've no clue if the build attempt that you mentioned involved building
> a bootstrap compiler or bootstrap linker or both.)
>
> [Timings added after much of the other text had been typed in already.]
>
> World build completed on Wed Apr 5 17:52:47 PDT 2023
> World built in 26009 seconds, ncpu: 4, make -j4
>
> So, for world, 26009sec*(1min/60sec)*(1hr/60min) == 7.2247_2222... hr < 7.3 hr.
>
> Kernel build for GENERIC-NODBG-CA72 completed on Wed Apr 5 18:27:29 PDT 2023
> Kernel(s) GENERIC-NODBG-CA72 built in 2082 seconds, ncpu: 4, make -j4
>
> So, for kernel, 2082sec*(1min/60sec)*(1hr/60min) == 0.578_3333... hr < 0.6 hr.
>
> So, for total, somewhat under 8 hr.
>
> (An example of needing bootstrapping would happen for jumping from main
> being 14.0 to being 15.0. Another could be jumping from system clang 15
> to system clang 16. The additional time would not be trivial.)
>
>
> Notes . . .
>
> The RPi4B has heatsinks and a case with a fan. The config.txt has the
> following added, among other things:
>
> [pi4]
> over_voltage=6
> arm_freq=2000
> sdram_freq_min=3200
> force_turbo=1
>
> (I do not use FreeBSD facilities to manage arm_freq.)
>
> The result has no temperature problems during such builds. I picked
> arm_freq=2000 based on it working across 7 example RPi4Bs (a mix of
> 8 GiByte "B0T" and "C0T" and older 4 GiByte "B0T" variants). 2100 did
> not prove to always work, given the other 3 settings. I avoided
> system-specific tailoring in normal operation and so standardized what
> they all use.
>
> The media is a USB3 NVMe drive, not spinning rust nor a microSD card.
> The drive is powered from just the RPi4B. The media has a UFS file
> system. I avoid tmpfs use that competes for RAM. (I've also got access
> to ZFS media around but that is not what I'm testing with in this
> example.)
>
> The power supply used for the RPi4B has more margin than is typical:
> 5.1V, 3.5A.
>
> A serial console is set up.
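[Aside added while re-reading: the hour figures here are just the "built in
N seconds" lines from the build output divided by 3600, and the clock speed
FreeBSD is using can be read back the same way Joseph did. For example,
using only base-system tools:

# the CPU frequency FreeBSD reports (Joseph saw 1500 here)
sysctl dev.cpu.0.freq
# convert a reported build time in seconds to hours, e.g. the 26009 sec world build
echo 'scale=2; 26009/3600' | bc

The bc line prints 7.22, agreeing with the 7.2247... hr figure above.]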
>
> For an 8 GiByte RAM system I normally have 30 GiBytes or so of swap
> space active (not special to buildworld buildkernel activities, but
> I'll not get into details of why-so-much here). However, for this
> timing I'm running without swap since I've not tested that on an
> 8 GiByte RPi4B in a long time. (Most of the potential swap usage is
> tied to how I build ports into packages, not buildworld buildkernel.)
>
> The FreeBSD context is (output line split for better readability):
>
> # uname -apKU
> FreeBSD CA72_UFS 14.0-CURRENT FreeBSD 14.0-CURRENT #90
> main-n261544-cee09bda03c8-dirty: Wed Mar 15 20:25:49 PDT 2023
> root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
> arm64 aarch64 1400082 1400082
>
> The build is building that same version from scratch (after a "rm -fr"
> of the build-tree area). I do not use ccache or the like. So: an
> example of a possible upper bound on the required build time for a
> specific configuration that is built, but with no bootstrap compiler
> or linker build involved.
>
> I do this because comparing timings of incremental builds that need
> not be doing the same increments is problematic. (However, configuring
> to allow incremental instead of only full builds is important to
> spending less total time building across builds.)
>
> I'll list various settings that I use. There are non-obvious
> contributions too. For example, I use an Ethernet ssh session instead
> of the serial console: the serial console can lead to waiting for fast
> scrolling output to finish. (Matters more consistently for
> installworld and installkernel scrolling output.) I run headless,
> avoiding some competition for RAM and such. I do not load the RPi4B
> with additional activities, not even nice'd ones.
>
> Note that it is a non-debug system that is running and it is building
> a matching non-debug world and kernel.
>
> In /boot/loader.conf I have:
>
> # Delay when persistent low free RAM leads to
> # Out Of Memory killing of processes:
> vm.pageout_oom_seq=120
> #
> # For plenty of swap/paging space (will not
> # run out), avoid pageout delays leading to
> # Out Of Memory killing of processes:
> vm.pfault_oom_attempts=-1
> #
> # For possibly insufficient swap/paging space
> # (might run out), increase the pageout delay
> # that leads to Out Of Memory killing of
> # processes (showing defaults at the time):
> #vm.pfault_oom_attempts= 3
> #vm.pfault_oom_wait= 10
> # (The multiplication is the total but there
> # are other potential tradeoffs in the factors
> # multiplied, even for nearly the same total.)
>
> (I'd not expected the 8 GiByte build to need to page out to swap
> space, so I left in place my normal setting for
> vm.pfault_oom_attempts.)
>
> In /etc/sysctl.conf I have:
>
> # Together this pair avoids swapping out the process kernel stacks.
> # This avoids processes for interacting with the system from being
> # hung up by such.
> vm.swap_enabled=0
> vm.swap_idle_enabled=0
>
> (But, absent any active swap space, such would not happen. However,
> the lack of active swap space is not my normal context and the above
> is what I have in place for normal use as well.)
>
> Part of the below indicates that I avoid building MIPS, POWERPC,
> RISCV, and X86 targeting materials because I do not intend to target
> anything but aarch64 and armv7 from aarch64 systems. This is not the
> default. Going in the other direction, I build CLANG_EXTRAS, which
> builds more than what is default.
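[Aside: after a reboot it is cheap to confirm that the loader.conf tunables
and sysctl.conf settings above actually took effect, for example by just
reading them back:

# read back the OOM-related tunables and the swap-out settings listed above
sysctl vm.pageout_oom_seq vm.pfault_oom_attempts
sysctl vm.swap_enabled vm.swap_idle_enabled

Catching a typo this way is better than finding out many hours into a
build.]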
> This combination makes my build timings ball-park figures relative to
> your context.
>
> An oddity is that I avoid much of the stripping, so my builds are
> somewhat bigger than normal for the materials produced. (I like the
> somewhat better backtraces from leaving symbols in place, even if the
> build is optimized and avoids full debug information.)
>
> I use:
>
> TO_TYPE=aarch64
> #
> KERNCONF=GENERIC-NODBG-CA72
> TARGET=arm64
> .if ${.MAKE.LEVEL} == 0
> TARGET_ARCH=${TO_TYPE}
> .export TARGET_ARCH
> .endif
> #
> WITH_SYSTEM_COMPILER=
> WITH_SYSTEM_LINKER=
> #
> WITH_ELFTOOLCHAIN_BOOTSTRAP=
> #Disables avoiding bootstrap: WITHOUT_LLVM_TARGET_ALL=
> WITH_LLVM_TARGET_AARCH64=
> WITH_LLVM_TARGET_ARM=
> WITHOUT_LLVM_TARGET_MIPS=
> WITHOUT_LLVM_TARGET_POWERPC=
> WITHOUT_LLVM_TARGET_RISCV=
> WITHOUT_LLVM_TARGET_X86=
> WITH_CLANG=
> WITH_CLANG_IS_CC=
> WITH_CLANG_FULL=
> WITH_CLANG_EXTRAS=
> WITH_LLD=
> WITH_LLD_IS_LD=
> WITH_LLDB=
> #
> WITH_BOOT=
> #
> #
> WITHOUT_WERROR=
> #WERROR=
> MALLOC_PRODUCTION=
> WITH_MALLOC_PRODUCTION=
> WITHOUT_ASSERT_DEBUG=
> WITHOUT_LLVM_ASSERTIONS=
> #
> # Avoid stripping but do not control host -g status as well:
> DEBUG_FLAGS+=
> #
> WITH_REPRODUCIBLE_BUILD=
> WITH_DEBUG_FILES=
> #
> # Use of the .clang 's here avoids
> # interfering with other C<?>FLAGS
> # usage, such as ?= usage.
> CFLAGS.clang+= -mcpu=cortex-a72
> CXXFLAGS.clang+= -mcpu=cortex-a72
> CPPFLAGS.clang+= -mcpu=cortex-a72
> ACFLAGS.arm64cpuid.S+= -mcpu=cortex-a72+crypto
> ACFLAGS.aesv8-armx.S+= -mcpu=cortex-a72+crypto
> ACFLAGS.ghashv8-armx.S+= -mcpu=cortex-a72+crypto
>
> Those last 6 lines lead to the code generation being tuned for
> Cortex-A72's. (The code still works on Cortex-A53's.) I expect such
> lines are rarely used, but I happen to use them.
>
> I'll note that avoiding WITHOUT_LLVM_TARGET_ALL is tied to old
> observed behavior that I've not revalidated.
>
> In the past, I've had examples where an RPi4B -j3 built in less time
> than -j4 for such full-build timing tests. On an RPi4B, I've never had
> -j5 or higher build in less time. (Some of this is the RPi4B
> RAM/RAM-cache subsystem properties: easier than normal to saturate the
> RAM access and the caching is small. Another contribution may be the
> USB3 NVMe media latency being small. Spinning rust might have
> different tradeoffs, for example.) I've also never had -j2 or less
> take less time for full builds.
>
> (Folks that do not use vm.pageout_oom_seq to avoid kills from
> happening may use -j2 or such to better avoid having parts of some
> build attempts killed sometimes.)
>
> Unfortunately, I forgot to set up monitoring of MaxObsActive,
> MaxObsWired, and MaxObs(Act+Wir+Lndry). ("MaxObs" is short for
> "Maximum Observed".) So I do not have such figures to report. (I use a
> modified top to get such figures.)
>
> The build-tree size:
>
> # du -xsm /usr/obj/BUILDs/main-CA72-nodbg-clang/usr/
> 13122 /usr/obj/BUILDs/main-CA72-nodbg-clang/usr/
>
> But such is based on details of what I build vs. what I do not, as
> well as the lack of stripping. So, in very round numbers, 20 GiBytes
> would be able to hold a build. You might want notable margin, in part
> because as FreeBSD and the toolchain progress, things have tended to
> get bigger over time. Plus the figure is a final size. If the peak
> size is larger, I do not know. A debug build would take more space
> than my non-debug build. Also, the 13122 does not include the build
> materials for a bootstrap compiler or a bootstrap linker (or both).
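[Two asides on provisioning, since the figures here reflect my particular
configuration. First, it is worth checking free space where the build tree
will live before starting, for example:

# show available space on the file system holding /usr/obj
df -h /usr/obj

Second, the modified top that I use for MaxObs figures is not something
others will have. A rough stock-tool substitute is to sample the vm
counters periodically during a build and note the peaks, for example:

# sample active, wired, and laundry page counts once a minute
# (multiply the counts by hw.pagesize to get bytes)
while true; do
    sysctl vm.stats.vm.v_active_count vm.stats.vm.v_wire_count \
        vm.stats.vm.v_laundry_count
    sleep 60
done

Sampling can miss short peaks, so treat such figures as ball-park only.]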
> Thus my rounding to 20 GiBytes as a possibility for illustration.
>
> Again: no ccache-like use. Otherwise there would be more space
> someplace to consider overall.

I repeated the "rm -fr", rebooted, and did a -j3 buildworld buildkernel.
The result was:

World build completed on Thu Apr 6 03:31:43 PDT 2023
World built in 28858 seconds, ncpu: 4, make -j3

So, for world, 28858sec*(1min/60sec)*(1hr/60min) == 8.016_1111... hr < 8.1 hr.

Kernel build for GENERIC-NODBG-CA72 completed on Thu Apr 6 04:10:26 PDT 2023
Kernel(s) GENERIC-NODBG-CA72 built in 2323 seconds, ncpu: 4, make -j3

So, for kernel, 2323sec*(1min/60sec)*(1hr/60min) == 0.6452_7777... hr < 0.7 hr.

So, for total, somewhat under 8.8 hr.

So 31181sec/28091sec ~= 1.11 times what -j4 took.

I did remember to get MaxObs figures for this:

load averages: . . . MaxObs: 3.59, 3.21, 3.09
1404Mi MaxObsActive, 1155Mi MaxObsWired, 2383Mi MaxObs(Act+Wir+Lndry)

(Note: Laundry did end up non-zero, despite the lack of swap space.)

So this combination looks like it would not need swap space for a
4 GiByte RPi4B but likely would need such for a 2 GiByte RPi4B. Looks
like the same could be true of a -j4 build.

After that I repeated the "rm -fr", rebooted, and did a -j5 buildworld
buildkernel. The result was:

World build completed on Thu Apr 6 12:42:04 PDT 2023
World built in 25940 seconds, ncpu: 4, make -j5

So, for world, 25940sec*(1min/60sec)*(1hr/60min) == 7.20_5555... hr < 7.3 hr.

Kernel build for GENERIC-NODBG-CA72 completed on Thu Apr 6 13:16:50 PDT 2023
Kernel(s) GENERIC-NODBG-CA72 built in 2086 seconds, ncpu: 4, make -j5

So, for kernel, 2086sec*(1min/60sec)*(1hr/60min) == 0.579_4444... hr < 0.6 hr.

So, for total, somewhat under 8 hr.

So around 28026sec/28091sec ~= 0.998 times what -j4 took.

Note a small-scale example of a tradeoff that can occur based on the
details of what is being built: buildworld took less time but
buildkernel took more.

I did remember to get MaxObs figures for this:

load averages: . . . MaxObs: 5.57, 5.29, 5.17
1790Mi MaxObsActive, 1157Mi MaxObsWired, 2775Mi MaxObs(Act+Wir+Lndry)

(Note: Laundry did end up non-zero, despite the lack of swap space.)

So this combination looks like it would not need swap space for a
4 GiByte RPi4B but would need such for a 2 GiByte RPi4B.

Incremental build (META_MODE) variability, ccache's avoidance-of-compiles
variability, and media access timing properties could all lead to other
tradeoffs in specific builds for what -jN's work better. ZFS would be a
significant change of context because of its Wired memory handling. (The
ARC leads to a far more widely variable Wired-memory usage pattern, for
example.) I'm not claiming the above indicates some universal answer to
what is optimal across a range of contexts.

One thing that was different for a time for my older timings was that
some Google test build used to take large amounts of RAM and time
compared to the figures I report above. If I remember right, this
stopped when FreeBSD adjusted the specific test's build to generate
unoptimized code, avoiding the bad case in the LLVM toolchain's
optimization handling for generating the test involved.

Note: The ZFS ARC's Wired memory usage makes any "MaxObs" that includes
a Wired memory contribution not readily comparable to the same "MaxObs"
for a UFS context.

===
Mark Millard
marklmi at yahoo.com