Re: RPi3B -j4 vs. -j3 main [so: 14] from-scratch buildworld times for my context; buildkernel too; swap space usage and such

From: Mark Millard <marklmi_at_yahoo.com>
Date: Thu, 25 May 2023 04:04:43 UTC
On May 24, 2023, at 18:51, bob prohaska <fbsd@www.zefox.net> wrote:

> On Wed, May 24, 2023 at 05:46:11PM -0700, Mark Millard wrote:
>> RPi3B -j4 vs. -j3 buildworld times for my context:
>> 
>> World built in 120764 seconds, ncpu: 4, make -j4 [So a little under 33 hr 35 min]
>> World built in 115635 seconds, ncpu: 4, make -j3 [So a little under 32 hr 10 min]
>> [A delta of a little under 1 hr 30 min]
>> 
>> So: -j4 buildworld spent more time waiting on its thrashing of
>> the swap space than it gained from having use of a 4th
>> core. The thrashing is mostly during the building of libllvm,
>> libclang, and liblldb. The RPi3B RAM subsystem can also limit
>> the gain from having more cores active.
>> 
>> 
>> By contrast . . .
>> 
>> RPi3B -j4 vs. -j3 buildkernel times for my context:
>> 
>> Kernel(s)  GENERIC-NODBG-CA53 built in 7836 seconds, ncpu: 4, make -j4 [So a little under 2 hr 15 min]
>> Kernel(s)  GENERIC-NODBG-CA53 built in 8723 seconds, ncpu: 4, make -j3 [So a little under 2 hr 30 min]
>> [A delta of a little under 15 min]
>> 
>> So: -j4 buildkernel spent less time waiting on its thrashing of
>> the swap space than it gained from having use of a 4th
>> core. (Not much thrashing occurred.)
>> 
>> 
>> And mem/swap usage info for buildworld+buildkernel . . .
>> 
>> Overall -j4 vs -j3 buildworld buildkernel info for my context:
>> 
>> -j4 Mem: . . ., 677688Ki MaxObsActive, 249652Ki MaxObsWired, 950032Ki MaxObs(Act+Wir+Lndry)
>> -j3 Mem: . . ., 683416Ki MaxObsActive, 315140Ki MaxObsWired, 927424Ki MaxObs(Act+Wir+Lndry)
>> 
>> -j4 Swap: . . ., 1495Mi MaxObsUsed, 2117Mi MaxObs(Act+Lndry+SwapUsed), 2358Mi MaxObs(Act+Wir+Lndry+SwapUsed)
>> -j3 Swap: . . ., 1178Mi MaxObsUsed, 1811Mi MaxObs(Act+Lndry+SwapUsed), 2049Mi MaxObs(Act+Wir+Lndry+SwapUsed)
>> 
>> 
>> 
>> FYI for the context:
>> make[1]: "/usr/main-src/Makefile.inc1" line 326: SYSTEM_COMPILER: Determined that CC=cc matches the source tree.  Not bootstrapping a cross-compiler.
>> make[1]: "/usr/main-src/Makefile.inc1" line 331: SYSTEM_LINKER: Determined that LD=ld matches the source tree.  Not bootstrapping a cross-linker.
>> 
>> 
>> Notes:
>> 
>> Incremental buildworlds would depend on how much rebuilding of
>> libllvm, libclang, and liblldb happens to occur.
>> 
>> A system with 2 GiBytes of RAM would have far less thrashing of
>> the swap space. A system with 4 GiBytes of RAM would not thrash
>> the swap space. The closest comparison I could make with 4
>> GiBytes of RAM would be the Rock64 doing a from-scratch build.
>> It is also Cortex-A53 based. As I remember, its RAM subsystem
>> does not limit multiple cores as easily or as much. I've no
>> access to an analogous 2 GiByte context.
>> 
> 
> Would a DRAM-backed USB "drive" used only for swap help any? I don't
> think it's practical, but I'm curious in principle. Long ago I think
> folks actually made hardware consisting of dynamic RAM coupled to a
> disk interface, probably SCSI, to get around physical RAM limits
> on older computers. It's kinda silly for a Pi, and expensive.

(These notes presume a well-implemented device. They also
ignore uses, such as crash dumps, that may need the data
to survive a power loss.)

Compared to spinning rust? Sure, especially for the latency
contribution to the overall time. Compared to the NVMe USB3
drives I'm using? I expect so. But the RPi3B limits the
data rate when data is actually being transferred. DRAM
would not change that.
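
For a rough sense of that ceiling, one can read sequentially from
the drive and watch the rate top out at the USB2 figure no matter
how fast the media behind the adapter is. A minimal sketch, assuming
da0 is the USB-attached drive (substitute the actual device; it is a
read-only test, so safe against a live disk):

  # Sequential read from the (assumed) USB-attached da0. On an
  # RPi3B the USB2 bus caps the reported rate regardless of how
  # fast the media itself is.
  dd if=/dev/da0 of=/dev/null bs=1m count=1024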

Compared to a 960 GB U.2 Optane used via a USB3 adapter?
I'm not sure at what point the USB2 interface ends up
being the primary bottleneck, such that detecting
differences between various low-latency media becomes
difficult.
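
One way to probe where that crossover sits is diskinfo, which
reports both per-command overhead and transfer rates for a device.
A sketch, again assuming da0:

  # -c measures command overhead (latency dominated); -t runs
  # simple transfer-rate tests. Comparing results across media
  # behind the same USB2 interface shows when the bus, rather
  # than the media, is the limiter.
  diskinfo -ct /dev/da0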

But what capacity would such a device need? And how would
its cost compare to an aarch64 system that simply has that
capacity in RAM, say 4 GiBytes or 8 GiBytes or more? Having
a 4+ GiByte aarch64 system is probably more effective.

Of course, there are ports whose builds are bigger than a
FreeBSD buildworld buildkernel. So the problem moves
around. But the extra capacity helps in all cases.


FYI: The -j4 Rock64 buildworld buildkernel is in
progress. The Rock64 is set to 1200 MHz for its
clock. (The 1296 MHz setting never provided a
reliable context for what I have access to.)
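
For reference, one way to pin the clock like that, assuming the
cpufreq driver attaches for the board and exposes the usual
sysctls (it need not on every platform):

  # List the frequency levels the driver reports, then pin to
  # one of them (values are in MHz; 1200 is assumed to be among
  # the listed levels).
  sysctl dev.cpu.0.freq_levels
  sysctl dev.cpu.0.freq=1200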

FYI: I've added a "(sample) mean so far of %idle"
to my set of top hacks. Things like thrashing the
swap space drive up that mean compared to not
having to wait for the I/O. (Waiting for swap
space I/O counts as idle time.) In a 4-core
context each core is 25% of the total, so a
sample mean much greater than 25% while -j4 is
thrashing means more than a whole core's worth
of capacity is going to waste: trying -j3 might
cut the overall time. Similarly for some other
%idle figures and -jN numbers.

(I start the top after the build starts and use
figures from during the build, not from a
significant time afterward.)
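
The hack itself is a local patch to top, but the idea can be
approximated on a stock system. A sketch that keeps a running
(sample) mean of vmstat's system-wide "id" column, sampled every
5 seconds:

  # Skip header lines (non-numeric last field) and the first
  # data line (since-boot averages), then report the running
  # mean of %idle. For 4 cores, a mean well above 25% while
  # -j4 thrashes suggests the 4th core is not paying for itself.
  vmstat 5 | awk '$NF !~ /^[0-9]+$/ { next }
      ++seen == 1 { next }
      { sum += $NF; n++
        printf "samples=%d mean %%idle=%.1f\n", n, sum/n }'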

===
Mark Millard
marklmi at yahoo.com