Re: Expected native build times on an RPI4?
Date: Thu, 06 Apr 2023 02:04:40 UTC
On Apr 3, 2023, at 13:42, Joseph Koshy <jkoshy@freebsd.org> wrote:

> A 'make -j3 buildworld' of a freshly checked out -current tree
> took over 15+ hours on an RPI4 before eventually running out
> of space (/usr/obj had reached 7G by then).
>
> The CPU(s) ran at a top speed of 1500Mhz during the build,
> per 'sysctl dev.cpu.0.freq'.
>
> Even so, the build hadn't managed to cross the 'building
> libraries' step.
>
> I'm wondering how best to provision for building -current:
> how long does 'buildworld' take on this device usually, and
> how much disk space does a build of -current usually need?

I looked and found that I'd not recorded any buildworld buildkernel
timing notes since back in very late 2021, so what I had need not be
appropriate for now. I've finally got around to starting a
from-scratch build, on an 8 GiByte RAM "C0T" RPi4B. (My normal
builds are done on a different type of aarch64 system.)

This is a from-scratch build, but of note are:

make[1]: "/usr/main-src/Makefile.inc1" line 327: SYSTEM_COMPILER: Determined that CC=cc matches the source tree. Not bootstrapping a cross-compiler.
make[1]: "/usr/main-src/Makefile.inc1" line 332: SYSTEM_LINKER: Determined that LD=ld matches the source tree. Not bootstrapping a cross-linker.

Sometimes bootstrapping build activity is required, and that would
mean more time (and space) than for what I'm timing here. (I've no
clue whether the build attempt that you mentioned involved building
a bootstrap compiler, a bootstrap linker, or both.)

[Timings added after much of the other text had been typed in
already.]

World build completed on Wed Apr  5 17:52:47 PDT 2023
World built in 26009 seconds, ncpu: 4, make -j4

So, for world: 26009 sec * (1 min/60 sec) * (1 hr/60 min) ==
7.2247_2222... hr < 7.3 hr.

Kernel build for GENERIC-NODBG-CA72 completed on Wed Apr  5 18:27:29 PDT 2023
Kernel(s) GENERIC-NODBG-CA72 built in 2082 seconds, ncpu: 4, make -j4

So, for kernel: 2082 sec * (1 min/60 sec) * (1 hr/60 min) ==
0.578_3333... hr < 0.6 hr.
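The arithmetic above can be double-checked with a quick one-liner
(awk used here purely as an illustration; the input figures are the
ones reported above):

```shell
# Convert the reported build durations from seconds to hours.
awk 'BEGIN {
    printf "world:  %.4f hr\n", 26009 / 3600   # world build
    printf "kernel: %.4f hr\n",  2082 / 3600   # kernel build
}'
# world:  7.2247 hr
# kernel: 0.5783 hr
```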
So, for the total: somewhat under 8 hr.

(An example of needing bootstrapping would be jumping from main
being 14.0 to being 15.0. Another could be jumping from system
clang 15 to system clang 16. The additional time would not be
trivial.)

Notes . . .

The RPi4B has heatsinks and a case with a fan. The config.txt has
the following added, among other things:

[pi4]
over_voltage=6
arm_freq=2000
sdram_freq_min=3200
force_turbo=1

(I do not use FreeBSD facilities to manage arm_freq.) The result
has no temperature problems during such builds.

I picked arm_freq=2000 based on it working across 7 example
RPi4B's (a mix of 8 GiByte "B0T" and "C0T" and older 4 GiByte
"B0T" variants). 2100 did not prove to always work, given the
other 3 settings. I avoid system-specific tailoring in normal
operation and so standardized what they all use.

The media is a USB3 NVMe drive, not spinning rust, nor a microsd
card. The drive is powered from just the RPi4B. The media has a
UFS file system. I avoid tmpfs use that competes for RAM. (I've
also got access to ZFS media around, but that is not what I'm
testing with in this example.)

The power supply used for the RPi4B has more margin than is
typical: 5.1V, 3.5A. A serial console is set up.

For an 8 GiByte RAM system I normally have 30 GiBytes or so of
swap space active (not special to buildworld buildkernel
activities, but I'll not get into the details of why-so-much
here). However, for this timing I'm running without swap, since
I've not tested that on an 8 GiByte RPi4B in a long time. (Most
of the potential swap usage is tied to how I build ports into
packages, not buildworld buildkernel.)
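As a rough, back-of-the-envelope tie-in to the 1500 MHz figure in
the quoted report (my arithmetic, not a measurement, and it assumes
an unrealistically purely-CPU-bound build): naive linear frequency
scaling of the 26009 s world build from this 2000 MHz configuration
suggests:

```shell
# Hypothetical estimate: scale the 26009 s world build from
# 2000 MHz down to a stock 1500 MHz, assuming CPU-bound work.
awk 'BEGIN { printf "%.1f hr\n", 26009 * (2000 / 1500) / 3600 }'
# 9.6 hr
```

Real builds are not purely CPU-bound (I/O and RAM-cache effects
matter on the RPi4B), so treat this only as an order-of-magnitude
sanity check, not a prediction.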
The FreeBSD context is (output line split for better readability):

# uname -apKU
FreeBSD CA72_UFS 14.0-CURRENT FreeBSD 14.0-CURRENT #90
main-n261544-cee09bda03c8-dirty: Wed Mar 15 20:25:49 PDT 2023
root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
arm64 aarch64 1400082 1400082

The build is building that same version from scratch (after a
"rm -fr" of the build-tree area). I do not use ccache or the like.
So: an example of a possible upper bound on the build time required
for a specific configuration, but with no bootstrap compiler or
linker build involved.

I do this because comparing timings of incremental builds that need
not be doing the same increments is problematical. (However,
configuring to allow incremental builds instead of only full builds
is important to spending less total time building across builds.)

I'll list various settings that I use. There are non-obvious
contributions too. For example, I use an Ethernet ssh session
instead of the serial console: the serial console can lead to
waiting for fast-scrolling output to finish. (This matters more
consistently for installworld and installkernel scrolling output.)
I run headless, avoiding some competition for RAM and such. I do
not load the RPi4B with additional activities, not even nice'd
ones.

Note that it is a non-debug system that is running, and it is
building a matching non-debug world and kernel.
In /boot/loader.conf I have:

# Delay when persistent low free RAM leads to
# Out Of Memory killing of processes:
vm.pageout_oom_seq=120
#
# For plenty of swap/paging space (will not
# run out), avoid pageout delays leading to
# Out Of Memory killing of processes:
vm.pfault_oom_attempts=-1
#
# For possibly insufficient swap/paging space
# (might run out), increase the pageout delay
# that leads to Out Of Memory killing of
# processes (showing defaults at the time):
#vm.pfault_oom_attempts= 3
#vm.pfault_oom_wait= 10
# (The multiplication is the total, but there
# are other potential tradeoffs in the factors
# multiplied, even for nearly the same total.)

(I'd not expected the 8 GiByte build to need to page out to swap
space, so I left my normal setting for vm.pfault_oom_attempts in
place.)

In /etc/sysctl.conf I have:

# Together this pair avoids swapping out the process kernel stacks.
# This avoids processes for interacting with the system from being
# hung-up by such.
vm.swap_enabled=0
vm.swap_idle_enabled=0

(But, absent any active swap space, such would not happen anyway.
However, the lack of active swap space is not my normal context,
and the above is what I have in place for normal use as well.)

Part of the below indicates that I avoid building MIPS, POWERPC,
RISCV, and X86 targeting materials, because I do not intend to
target anything but aarch64 and armv7 from aarch64 systems. This
is not the default. Going in the other direction, I build
CLANG_EXTRAS, which builds more than what is default. This
combination makes my build timings ball-park figures relative to
your context.

An oddity is that I avoid much of the stripping, so my builds are
somewhat bigger than normal for the materials produced. (I like
the somewhat better backtraces from leaving symbols in place, even
if the build is optimized and avoids full debug information.)
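As a small illustration of the multiplication mentioned in the
loader.conf comments (using the default values shown there; the
variable names below are just for the illustration):

```shell
# Total pageout delay before an Out Of Memory kill, per the
# comment above: attempts multiplied by the per-attempt wait.
attempts=3    # default vm.pfault_oom_attempts shown above
wait_s=10     # default vm.pfault_oom_wait shown above
total=$((attempts * wait_s))
echo "total delay before OOM kill: ${total} s"
# total delay before OOM kill: 30 s
```

Raising either factor lengthens the tolerated pageout delay, but,
as the comment notes, different factor combinations with nearly
the same product can still behave somewhat differently.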
I use:

TO_TYPE=aarch64
#
KERNCONF=GENERIC-NODBG-CA72
TARGET=arm64
.if ${.MAKE.LEVEL} == 0
TARGET_ARCH=${TO_TYPE}
.export TARGET_ARCH
.endif
#
WITH_SYSTEM_COMPILER=
WITH_SYSTEM_LINKER=
#
WITH_ELFTOOLCHAIN_BOOTSTRAP=
#Disables avoiding bootstrap: WITHOUT_LLVM_TARGET_ALL=
WITH_LLVM_TARGET_AARCH64=
WITH_LLVM_TARGET_ARM=
WITHOUT_LLVM_TARGET_MIPS=
WITHOUT_LLVM_TARGET_POWERPC=
WITHOUT_LLVM_TARGET_RISCV=
WITHOUT_LLVM_TARGET_X86=
WITH_CLANG=
WITH_CLANG_IS_CC=
WITH_CLANG_FULL=
WITH_CLANG_EXTRAS=
WITH_LLD=
WITH_LLD_IS_LD=
WITH_LLDB=
#
WITH_BOOT=
#
WITHOUT_WERROR=
#WERROR=
MALLOC_PRODUCTION=
WITH_MALLOC_PRODUCTION=
WITHOUT_ASSERT_DEBUG=
WITHOUT_LLVM_ASSERTIONS=
#
# Avoid stripping but do not control host -g status as well:
DEBUG_FLAGS+=
#
WITH_REPRODUCIBLE_BUILD=
WITH_DEBUG_FILES=
#
# Use of the .clang 's here avoids
# interfering with other C<?>FLAGS
# usage, such as ?= usage.
CFLAGS.clang+= -mcpu=cortex-a72
CXXFLAGS.clang+= -mcpu=cortex-a72
CPPFLAGS.clang+= -mcpu=cortex-a72
ACFLAGS.arm64cpuid.S+= -mcpu=cortex-a72+crypto
ACFLAGS.aesv8-armx.S+= -mcpu=cortex-a72+crypto
ACFLAGS.ghashv8-armx.S+= -mcpu=cortex-a72+crypto

Those last 6 lines lead to the code generation being tuned for
Cortex-A72's. (The code still works on Cortex-A53's.) I expect
such lines are rarely used, but I happen to use them.

I'll note that avoiding WITHOUT_LLVM_TARGET_ALL is tied to old
observed behavior that I've not revalidated.

In the past, I've had examples where an RPi4B -j3 built in less
time than -j4 for such full-build timing tests. On an RPi4B, I've
never had -j5 or higher build in less time. (Some of this is down
to the RPi4B RAM/RAM-cache subsystem properties: it is easier
than normal to saturate RAM access, and the caching is small.
Another contribution may be the USB3 NVMe media latency being
small. Spinning rust might have different tradeoffs, for example.)
I've also never had -j2 or less take less time for full builds.
(Folks that do not use vm.pageout_oom_seq to avoid kills may use
-j2 or such to better avoid having parts of some build attempts
killed sometimes.)

Unfortunately, I forgot to set up monitoring of MaxObsActive,
MaxObsWired, and MaxObs(Act+Wir+Lndry). ("MaxObs" is short for
"Maximum Observed".) So I do not have such figures to report. (I
use a modified top to get such figures.)

The build-tree size:

# du -xsm /usr/obj/BUILDs/main-CA72-nodbg-clang/usr/
13122   /usr/obj/BUILDs/main-CA72-nodbg-clang/usr/

But such is based on the details of what I build vs. what I do
not, as well as the lack of stripping. So, in very round numbers,
20 GiBytes would be able to hold a build. You might want notable
margin, in part because, as FreeBSD and the toolchain progress,
things have tended to get bigger over time.

Plus, the 13122 MiByte figure is a final size. Whether the peak
size is larger, I do not know. A debug build would take more
space than my non-debug build. Also, the 13122 does not include
the build materials for a bootstrap compiler or a bootstrap
linker (or both). Thus my rounding to 20 GiBytes as a possibility
for illustration.

Again: no ccache-like use. Otherwise there would be more space
someplace to consider overall.

===
Mark Millard
marklmi at yahoo.com