Re: RPi3B -j4 vs. -j3 main [so: 14] from-scratch buildworld times for my context; buildkernel too; swap space usage and such
Date: Thu, 25 May 2023 04:04:43 UTC
On May 24, 2023, at 18:51, bob prohaska <fbsd@www.zefox.net> wrote:

> On Wed, May 24, 2023 at 05:46:11PM -0700, Mark Millard wrote:
>> RPi3B -j4 vs. -j3 buildworld times for my context:
>>
>> World built in 120764 seconds, ncpu: 4, make -j4 [So a little under 33 hr 35 min]
>> World built in 115635 seconds, ncpu: 4, make -j3 [So a little under 32 hr 10 min]
>> [A delta of a little under 1 hr 30 min]
>>
>> So: -j4 buildworld spent more time waiting on its thrashing of
>> the swap space than it gained from having use of a 4th core.
>> The thrashing is mostly during the building of libllvm, libclang,
>> and liblldb. The RPi3B RAM subsystem can also limit the gain from
>> having more cores active.
>>
>> By contrast . . .
>>
>> RPi3B -j4 vs. -j3 buildkernel times for my context:
>>
>> Kernel(s) GENERIC-NODBG-CA53 built in 7836 seconds, ncpu: 4, make -j4 [So a little under 2 hr 15 min]
>> Kernel(s) GENERIC-NODBG-CA53 built in 8723 seconds, ncpu: 4, make -j3 [So a little under 2 hr 30 min]
>> [A delta of a little under 15 min]
>>
>> So: -j4 buildkernel spent less time waiting on its thrashing of
>> the swap space than it gained from having use of a 4th core.
>> (Not much thrashing occurred.)
>>
>> And mem/swap usage info for buildworld+buildkernel . . .
>>
>> Overall -j4 vs. -j3 buildworld buildkernel info for my context:
>>
>> -j4 Mem: . . ., 677688Ki MaxObsActive, 249652Ki MaxObsWired, 950032Ki MaxObs(Act+Wir+Lndry)
>> -j3 Mem: . . ., 683416Ki MaxObsActive, 315140Ki MaxObsWired, 927424Ki MaxObs(Act+Wir+Lndry)
>>
>> -j4 Swap: . . ., 1495Mi MaxObsUsed, 2117Mi MaxObs(Act+Lndry+SwapUsed), 2358Mi MaxObs(Act+Wir+Lndry+SwapUsed)
>> -j3 Swap: . . ., 1178Mi MaxObsUsed, 1811Mi MaxObs(Act+Lndry+SwapUsed), 2049Mi MaxObs(Act+Wir+Lndry+SwapUsed)
>>
>> FYI for the context:
>> make[1]: "/usr/main-src/Makefile.inc1" line 326: SYSTEM_COMPILER: Determined that CC=cc matches the source tree. Not bootstrapping a cross-compiler.
>> make[1]: "/usr/main-src/Makefile.inc1" line 331: SYSTEM_LINKER: Determined that LD=ld matches the source tree. Not bootstrapping a cross-linker.
>>
>> Notes:
>>
>> Incremental buildworlds would depend on how much rebuilding of
>> libllvm, libclang, and liblldb happened to occur.
>>
>> A system with 2 GiBytes of RAM would have far less thrashing of
>> the swap space. A system with 4 GiBytes of RAM would not thrash
>> the swap space at all. The closest comparison I could make with
>> 4 GiBytes of RAM would be the Rock64 doing a from-scratch build.
>> It is also cortex-a53 based. As I remember, its RAM subsystem
>> does not limit multiple cores as easily/much. I've no access to
>> an analogous 2 GiByte context.
>
> Would a DRAM-backed USB "drive" used only for swap help any? I don't
> think it's practical, but I'm curious in principle. Long ago I think
> folks actually made hardware consisting of dynamic RAM coupled to a
> disk interface, probably SCSI, to get around physical RAM limits
> on older computers. It's kinda silly for a Pi, and expensive.

(These notes presume such a device is well implemented. They also
ignore uses, such as crash dumps, that may need the data to survive
power loss.)

Compared to spinning rust? Sure, especially for the latency
contributions to the time taken. Compared to the NVMe USB3 drives I'm
using? I expect so. But the RPi3B limits the data rate when it is
actually transferring data, and DRAM would not change that. Compared
to a 960 GB U.2 Optane used via a USB3 adapter? I'm not sure at what
point the USB2 interface ends up being the primary bottleneck, such
that detecting differences among various low-latency media becomes
difficult.

But, what capacity? And what would it cost vs. an aarch64 system with
that capacity in its RAM, for capacities like 4 GiByte or 8 GiByte
or . . . ? Having a 4+ GiByte aarch64 system is probably more
effective. Of course, there are ports whose builds are bigger than a
FreeBSD buildworld buildkernel, so the problem just moves around. But
the extra capacity helps all cases.
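As an aside: for anyone wanting maximum-observed swap figures like the
MaxObsUsed values quoted above without patched tools, one rough
approach is to sample swap use periodically and keep the maximum seen.
A sketch (the `max_used` helper name, the canned sample values, and
the sampling command in the comment are illustrative, not from my
actual setup; `swapinfo` is FreeBSD's):

```shell
# Track the maximum of a stream of swap-used samples (KiB), one per line.
# The awk max-tracking is generic; feed it whatever sampler you like.
max_used() {
    awk '{ if ($1 + 0 > max) max = $1 + 0 } END { print max " KiB MaxObsUsed (approx)" }'
}

# Example with canned sample values; in practice the samples might come
# from something like (field position may vary with swapinfo output):
#   while :; do swapinfo -k | awk 'END { print $3 }'; sleep 30; done | max_used
printf '102400\n1530880\n1206272\n' | max_used
```

A periodic sampler can of course miss short spikes between samples,
which is part of why tracking the maxima inside top itself is more
reliable.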
FYI: The -j4 Rock64 buildworld buildkernel is in progress. The Rock64
is set to a 1200 MHz clock. (The 1296 MHz setting never provided a
reliable context for the hardware I have access to.)

FYI: I've added a "(sample) mean so far of %idle" to my set of top
hacks. Things like thrashing the swap space drive up that mean
compared to not having to wait for the I/O. (Waiting for swap-space
I/O counts as idle time.) Thus, if the sample mean is much greater
than 25% in a 4-core context that is thrashing for -j4, it suggests
that trying -j3 might cut the overall time. Similarly for other %idle
figures and -jN numbers. (I start the top after the build starts and
use figures from during the build, not from a significant time
afterward.)

===
Mark Millard
marklmi at yahoo.com
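P.S. For those without the top hacks, a running sample mean of %idle
can be approximated by piping periodic samples through awk. A sketch
(the `mean_idle` helper name and the canned sample values are mine;
in practice the %idle samples might be pulled from something like
periodic vmstat output on FreeBSD):

```shell
# Running (sample) mean of %idle: one percentage per line on stdin;
# prints the mean of all samples seen so far after each new sample.
mean_idle() {
    awk '{ sum += $1; n++; printf "mean %%idle so far: %.1f\n", sum / n }'
}

# Canned samples. On 4 cores under -j4, a mean well above 25% hints
# that the build is waiting (e.g. on swap I/O) rather than computing,
# so -j3 might cut the overall time.
printf '10\n90\n50\n' | mean_idle
```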