Ryzen 9 7950X3D bulk -a times: adding an example with SMT disabled (so 16 hardware threads, not 32)
Date: Fri, 10 Nov 2023 01:26:57 UTC
After reading some benchmark results for compilation activity that showed
SMT vs. non-SMT examples, and also using my C++ variant of the old HINT
benchmark, I ended up curious how a non-SMT, from-scratch bulk -a would
end up (ZFS context) compared to my prior SMT-based run.

I use a high-load-average style of bulk -a activity that has USE_TMPFS=all
involved. The system has 96 GiBytes of RAM (total across the 2 DIMMs). The
original under-1.5-day run definitely had significant swap space use
(RAM+SWAP = 96 GiBytes + 364 GiBytes == 460 GiBytes == 471040 MiBytes).
The media was (and is) a PCIe-based Optane 905P 1.5T: ZFS on a single
partition on the single drive, with ZFS used just for bectl reasons, not
for the other typical use-ZFS reasons. I've not controlled the ARC
size-range explicitly. So less swap partition use is part of what
contributes to the results.

The original bulk -a spent a couple of hours at the end just fetching and
building textproc/stardict-quick . I have not cleared out
/usr/ports/distfiles or updated anything, so fetch time is also a
difference here.

SMT (32 hardware threads, original bulk -a):

[33:10:00] [32] [04:37:23] Finished emulators/libretro-mame | libretro-mame-20220124_1: Success
[35:36:51] [23] [03:44:04] Finished textproc/stardict-quick | stardict-quick-2.4.2_9: Success
. . .
[main-amd64-bulk_a-default] [2023-11-01_07h14m50s] [committing:] Queued: 34683 Built: 33826 Failed: 179 Skipped: 358 Ignored: 320 Fetched: 0 Tobuild: 0 Time: 35:37:55

Swap-involved MaxObs (Max Observed) figures:

173310Mi MaxObsUsed
256332Mi MaxObs(Act+Lndry+SwapUsed)
265551Mi MaxObs(Act+Wir+Lndry+SwapUsed)

(So 265551Mi of the 471040Mi RAM+SWAP.)

Just-RAM MaxObs figures:

81066Mi MaxObsActive
(Given the complications of getting usefully comparable wired figures for
ZFS (ARC): omitted.)
94493Mi MaxObs(Act+Wir+Lndry)

Note: MaxObs(A+B+C) <= MaxObs(A)+MaxObs(B)+MaxObs(C), since the individual
maxima need not occur at the same time.

ALLOW_MAKE_JOBS=yes was used, with no explicit restriction on PARALLEL_JOBS
or MAKE_JOBS_NUMBER (or analogous). So 32 builders were allowed, each
allowed 32 make jobs. This explains the high load averages of the bulk -a :

load averages . . . MaxObs: 360.70, 267.63, 210.84

(Those need not all be from the same time frame during the bulk -a .)

As for the ports vintage:

# ~/fbsd-based-on-what-commit.sh -C /usr/ports/
6ec8e3450b29 (HEAD -> main, freebsd/main, freebsd/HEAD) devel/sdts++: Mark DEPRECATED
Author:     Muhammad Moinur Rahman <bofh@FreeBSD.org>
Commit:     Muhammad Moinur Rahman <bofh@FreeBSD.org>
CommitDate: 2023-10-21 19:01:38 +0000
branch: main
merge-base: 6ec8e3450b29462a590d09fb0b07ed214d456bd5
merge-base: CommitDate: 2023-10-21 19:01:38 +0000
n637598 (--first-parent --count for merge-base)

I do have an environment that keeps the various LLVM builds from taking as
long as they otherwise would:

llvm1[3-7] : no MLIR, no FLANG
llvm1[4-7] : use BE_NATIVE
other llvm* : use defaults (so, no avoidance)

I also prevent the builds from using strip on most of the install materials
built (not just toolchain materials).
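For reference, here is a rough sketch of how the knobs mentioned above are
commonly spelled in a poudriere-based setup. The file paths and exact lines
are illustrative only (assumed typical locations and the usual
<origin>_SET / <origin>_UNSET per-port option spelling), not copies of my
actual configuration:

# /usr/local/etc/poudriere.conf (illustrative excerpt)
USE_TMPFS=all
ALLOW_MAKE_JOBS=yes
# PARALLEL_JOBS left unset: the default is one builder per hardware
# thread (32 with SMT enabled, 16 with SMT disabled).

# make.conf used for the builds (illustrative excerpt)
# MAKE_JOBS_NUMBER left unset: each builder may then use up to one make
# job per hardware thread.
devel_llvm13_UNSET+=    MLIR FLANG
devel_llvm14_UNSET+=    MLIR FLANG
devel_llvm15_UNSET+=    MLIR FLANG
devel_llvm16_UNSET+=    MLIR FLANG
devel_llvm17_UNSET+=    MLIR FLANG
devel_llvm14_SET+=      BE_NATIVE
devel_llvm15_SET+=      BE_NATIVE
devel_llvm16_SET+=      BE_NATIVE
devel_llvm17_SET+=      BE_NATIVE

(The arrangement that avoids stripping the installed materials is not shown
above.)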
non-SMT (16 hardware threads):

Note: one builder (math/fricas), the last still present, was stuck, and I
had to kill its processes to have it stop unless I was willing to wait for
my large timeout figures.

The last builder normal-finish was:

[39:48:10] [09] [00:16:23] Finished devel/gcc-msp430-ti-toolchain | gcc-msp430-ti-toolchain-9.3.1.2.20210722_1: Success

So, trying to place some bounds for comparing SMT (32 hw threads) vs.
non-SMT (16 hw threads):

33:10:00 SMT -> 39:48:10 non-SMT would be over 6.5 hrs longer for non-SMT
35:36:51 SMT -> 39:48:10 non-SMT would be over 4 hrs longer for non-SMT

As for SMT vs. non-SMT Maximum Observed figures:

SMT     load averages . . . MaxObs: 360.70, 267.63, 210.84
non-SMT load averages . . . MaxObs: 152.89, 100.94,  76.28

Swap-involved MaxObs figures for SMT (32 hw threads) vs. non-SMT (16):

173310Mi vs.  33003Mi MaxObsUsed
256332Mi vs. 117221Mi MaxObs(Act+Lndry+SwapUsed)
265551Mi vs. 124776Mi MaxObs(Act+Wir+Lndry+SwapUsed)

Just-RAM MaxObs figures for SMT (32 hw threads) vs. non-SMT (16):

81066Mi vs. 69763Mi MaxObsActive
(Given the complications of getting usefully comparable wired figures for
ZFS (ARC): omitted.)
94493Mi vs. 94303Mi MaxObs(Act+Wir+Lndry)

===
Mark Millard
marklmi at yahoo.com