Ryzen 9 7950X3D bulk -a times: adding an example with SMT disabled (so 16 hardware threads, not 32)
Date: Fri, 10 Nov 2023 01:26:57 UTC
After reading some benchmark results for compilation activity that showed
SMT vs. non-SMT examples, and also using my C++ variant of the old HINT
benchmark, I ended up curious how a non-SMT, from-scratch bulk -a would
end up (ZFS context) compared to my prior SMT-based run.

I use a high-load-average style of bulk -a activity that has USE_TMPFS=all
involved. The system has 96 GiBytes of RAM (total across the 2 DIMMs). The
original under-1.5-day run definitely had significant swap space use
(RAM+SWAP = 96 GiBytes + 364 GiBytes == 460 GiBytes == 471040 MiBytes).
The media was (and is) a PCIe-based Optane 905P 1.5T: ZFS on a single
partition on the single drive, with ZFS used just for bectl reasons, not
for the other typical use-ZFS reasons. I've not controlled the ARC
size-range explicitly. So less swap partition use is part of what
contributes to the results.

The original bulk -a spent a couple of hours at the end just fetching and
building textproc/stardict-quick . I have not cleared out
/usr/ports/distfiles or updated anything, so fetch time is also a
difference here.

SMT (32 hardware threads, original bulk -a):

[33:10:00] [32] [04:37:23] Finished emulators/libretro-mame | libretro-mame-20220124_1: Success
[35:36:51] [23] [03:44:04] Finished textproc/stardict-quick | stardict-quick-2.4.2_9: Success
. . .
[main-amd64-bulk_a-default] [2023-11-01_07h14m50s] [committing:] Queued: 34683 Built: 33826 Failed: 179 Skipped: 358 Ignored: 320 Fetched: 0 Tobuild: 0 Time: 35:37:55

Swap-involved MaxObs (Max Observed) figures:

173310Mi MaxObsUsed
256332Mi MaxObs(Act+Lndry+SwapUsed)
265551Mi MaxObs(Act+Wir+Lndry+SwapUsed)

(So 265551Mi of the 471040Mi RAM+SWAP.)

Just-RAM MaxObs figures:

81066Mi MaxObsActive
(Given the complications of getting usefully comparable wired figures for
ZFS (ARC): omitted.)
94493Mi MaxObs(Act+Wir+Lndry)

Note: MaxObs(A+B+C) <= MaxObs(A)+MaxObs(B)+MaxObs(C), since the individual
maxima need not occur at the same time.

ALLOW_MAKE_JOBS=yes was used, with no explicit restriction on PARALLEL_JOBS
or MAKE_JOBS_NUMBER (or analogous). So 32 builders were allowed, each
allowed 32 make jobs. This explains the high load averages of the bulk -a :

load averages . . . MaxObs: 360.70, 267.63, 210.84

(Those need not all be from the same time frame during the bulk -a .)

As for the ports vintage:

# ~/fbsd-based-on-what-commit.sh -C /usr/ports/
6ec8e3450b29 (HEAD -> main, freebsd/main, freebsd/HEAD) devel/sdts++: Mark DEPRECATED
Author:     Muhammad Moinur Rahman <bofh@FreeBSD.org>
Commit:     Muhammad Moinur Rahman <bofh@FreeBSD.org>
CommitDate: 2023-10-21 19:01:38 +0000
branch: main
merge-base: 6ec8e3450b29462a590d09fb0b07ed214d456bd5
merge-base: CommitDate: 2023-10-21 19:01:38 +0000
n637598 (--first-parent --count for merge-base)

I do have an environment that keeps the various LLVM builds from taking as
long as they otherwise would:

llvm1[3-7] : no MLIR, no FLANG
llvm1[4-7] : use BE_NATIVE
other llvm* : use defaults (so, no avoidance)

I also prevent the builds from using strip on most of the install materials
built (not just toolchain materials).
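For reference, here is a rough sketch of how the knobs mentioned above are
commonly spelled in a poudriere-based setup. The file paths and exact lines
are illustrative only (assumed typical locations and the usual
<origin>_SET / <origin>_UNSET per-port option spelling), not copies of my
actual configuration:

# /usr/local/etc/poudriere.conf (illustrative excerpt)
USE_TMPFS=all
ALLOW_MAKE_JOBS=yes
# PARALLEL_JOBS left unset: the default is one builder per hardware
# thread (32 with SMT enabled, 16 with SMT disabled).

# make.conf used for the builds (illustrative excerpt)
# MAKE_JOBS_NUMBER left unset: each builder may then use up to one make
# job per hardware thread.
devel_llvm13_UNSET+=    MLIR FLANG
devel_llvm14_UNSET+=    MLIR FLANG
devel_llvm15_UNSET+=    MLIR FLANG
devel_llvm16_UNSET+=    MLIR FLANG
devel_llvm17_UNSET+=    MLIR FLANG
devel_llvm14_SET+=      BE_NATIVE
devel_llvm15_SET+=      BE_NATIVE
devel_llvm16_SET+=      BE_NATIVE
devel_llvm17_SET+=      BE_NATIVE

(The arrangement that avoids stripping the installed materials is not shown
above.)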
non-SMT (16 hardware threads):

Note: one builder (math/fricas), the last still present, was stuck, and I
had to kill its processes to have it stop unless I was willing to wait for
my large timeout figures.

The last builder normal-finish was:

[39:48:10] [09] [00:16:23] Finished devel/gcc-msp430-ti-toolchain | gcc-msp430-ti-toolchain-9.3.1.2.20210722_1: Success

So, trying to place some bounds for comparing SMT (32 hw threads) vs.
non-SMT (16 hw threads):

33:10:00 SMT -> 39:48:10 non-SMT would be over 6.5 hrs longer for non-SMT
35:36:51 SMT -> 39:48:10 non-SMT would be over 4 hrs longer for non-SMT

As for SMT vs. non-SMT Maximum Observed figures:

SMT     load averages . . . MaxObs: 360.70, 267.63, 210.84
non-SMT load averages . . . MaxObs: 152.89, 100.94,  76.28

Swap-involved MaxObs figures for SMT (32 hw threads) vs. non-SMT (16):

173310Mi vs.  33003Mi MaxObsUsed
256332Mi vs. 117221Mi MaxObs(Act+Lndry+SwapUsed)
265551Mi vs. 124776Mi MaxObs(Act+Wir+Lndry+SwapUsed)

Just-RAM MaxObs figures for SMT (32 hw threads) vs. non-SMT (16):

81066Mi vs. 69763Mi MaxObsActive
(Given the complications of getting usefully comparable wired figures for
ZFS (ARC): omitted.)
94493Mi vs. 94303Mi MaxObs(Act+Wir+Lndry)

===
Mark Millard
marklmi at yahoo.com