Re: Ryzen 9 7950X3D bulk -a times: adding an example with SMT disabled (so 16 hardware threads, not 32)
Date: Thu, 16 Nov 2023 04:50:21 UTC
On Nov 12, 2023, at 18:00, Mark Millard <marklmi@yahoo.com> wrote:

> On Nov 9, 2023, at 17:26, Mark Millard <marklmi@yahoo.com> wrote:
>
>> Reading some benchmark results for compilation activity that showed some
>> SMT vs. not examples, and also using my C++ variant of the old HINT
>> benchmark, I ended up curious how a non-SMT from-scratch bulk -a would
>> end up (ZFS context) compared to my prior SMT based run.
>>
>> I use a high load average style of bulk -a activity that has USE_TMPFS=all
>> involved. The system has 96 GiBytes of RAM (total across the 2 DIMMs).
>> The original under-1.5-day time definitely had significant swap space use
>> (RAM+SWAP = 96 GiBytes + 364 GiBytes == 460 GiBytes == 471040 MiBytes).
>> The media was (and is) a PCIe based Optane 905P 1.5T. ZFS is on a single
>> partition on the single drive, and ZFS is used just for bectl reasons,
>> not for the other typical use-ZFS reasons. I've not controlled the ARC
>> size-range explicitly.
>>
>> So less swap partition use is part of what contributed to the results.
>>
>> The original bulk -a spent a couple of hours at the end where it was
>> just fetching and building textproc/stardict-quick . I have not cleared
>> out /usr/ports/distfiles or updated anything.
>>
>> So fetch time is also a difference here.
>>
>> SMT (32 hardware threads, original bulk -a):
>>
>> [33:10:00] [32] [04:37:23] Finished emulators/libretro-mame | libretro-mame-20220124_1: Success
>> [35:36:51] [23] [03:44:04] Finished textproc/stardict-quick | stardict-quick-2.4.2_9: Success
>> . . .
>> [main-amd64-bulk_a-default] [2023-11-01_07h14m50s] [committing:] Queued: 34683 Built: 33826 Failed: 179 Skipped: 358 Ignored: 320 Fetched: 0 Tobuild: 0 Time: 35:37:55
>>
>> Swap-involved MaxObs (Max Observed) figures:
>>
>> 173310Mi MaxObsUsed
>> 256332Mi MaxObs(Act+Lndry+SwapUsed)
>> 265551Mi MaxObs(Act+Wir+Lndry+SwapUsed)
>> (So 265551Mi of 471040Mi RAM+SWAP.)
>>
>> Just-RAM MaxObs figures:
>>
>> 81066Mi MaxObsActive
>> (Given the complications of getting usefully comparable wired figures
>> for ZFS (ARC): omit.)
>> 94493Mi MaxObs(Act+Wir+Lndry)
>>
>> Note: MaxObs(A+B+C) <= MaxObs(A)+MaxObs(B)+MaxObs(C)
>>
>> ALLOW_MAKE_JOBS=yes was used, with no explicit restriction on
>> PARALLEL_JOBS or MAKE_JOBS_NUMBER (or analogous). So 32 builders were
>> allowed, each allowed 32 make jobs. This explains the high load
>> averages of the bulk -a :
>>
>> load averages . . . MaxObs: 360.70, 267.63, 210.84
>> (Those need not all be from the same time frame during the bulk -a .)
>>
>> As for the ports vintage:
>>
>> # ~/fbsd-based-on-what-commit.sh -C /usr/ports/
>> 6ec8e3450b29 (HEAD -> main, freebsd/main, freebsd/HEAD) devel/sdts++: Mark DEPRECATED
>> Author: Muhammad Moinur Rahman <bofh@FreeBSD.org>
>> Commit: Muhammad Moinur Rahman <bofh@FreeBSD.org>
>> CommitDate: 2023-10-21 19:01:38 +0000
>> branch: main
>> merge-base: 6ec8e3450b29462a590d09fb0b07ed214d456bd5
>> merge-base: CommitDate: 2023-10-21 19:01:38 +0000
>> n637598 (--first-parent --count for merge-base)
>>
>> I do have an environment that avoids various LLVM builds taking
>> as long to build:
>>
>> llvm1[3-7]  : no MLIR, no FLANG
>> llvm1[4-7]  : use BE_NATIVE
>> other llvm* : use defaults (so, no avoidance)
>>
>> I also prevent the builds from using strip on most of the install
>> materials built (not just toolchain materials).
>>
>>
>> non-SMT (16 hardware threads):
>>
>> Note: one builder (math/fricas), the last one still present, was
>> stuck, and I had to kill processes to have it stop unless I was
>> willing to wait for my large timeout figures.
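>>
>> (A minimal sketch of the kind of cleanup that involved, assuming the
>> stuck processes show "kqread" as the wait channel in ps output; the
>> pids belonging to the stuck builder still have to be picked out by
>> eye:
>>
>> # ps -ax -o pid,state,wchan,command | grep kqread
>> # kill -9 <pids of the stuck builder's processes>
>>
>> Nothing about this is specific to math/fricas.)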
>>
>> The last builder normal-finish was:
>>
>> [39:48:10] [09] [00:16:23] Finished devel/gcc-msp430-ti-toolchain | gcc-msp430-ti-toolchain-9.3.1.2.20210722_1: Success
>>
>> So, trying to place some bounds for comparing SMT (32 hw threads)
>> and non-SMT (16 hw threads):
>>
>> 33:10:00 SMT -> 39:48:10 non-SMT would be over 6.5 hrs longer for non-SMT
>> 35:36:51 SMT -> 39:48:10 non-SMT would be over 4 hrs longer for non-SMT
>>
>> As for SMT vs. non-SMT Maximum Observed figures:
>>
>> SMT load averages . . . MaxObs: 360.70, 267.63, 210.84
>> non-SMT load averages . . . MaxObs: 152.89, 100.94, 76.28
>>
>> Swap-involved MaxObs figures for SMT (32 hw threads) vs. not (16):
>>
>> 173310Mi vs.  33003Mi MaxObsUsed
>> 256332Mi vs. 117221Mi MaxObs(Act+Lndry+SwapUsed)
>> 265551Mi vs. 124776Mi MaxObs(Act+Wir+Lndry+SwapUsed)
>>
>> Just-RAM MaxObs figures for SMT (32 hw threads) vs. not (16):
>>
>> 81066Mi vs. 69763Mi MaxObsActive
>> (Given the complications of getting usefully comparable wired figures
>> for ZFS (ARC): omit.)
>> 94493Mi vs. 94303Mi MaxObs(Act+Wir+Lndry)
>
> I've added a section with a plot for the 7950X3D to the end of:
>
> https://github.com/markmi/acpphint/blob/master/Some_acpphint_curves_with_notes.md
>
> It is from a C++ variant of the old HINT benchmark and includes
> showing the RAM caching consequences for the benchmark. The roughly
> 32 MiByte and roughly 96 MiByte cache sizes for the 2 CCDs are
> observable.
>
> I'll also note that, for the devices present (active and not),
> at fully active the 7950X3D seems to use 225 Watts .. 235 Watts
> at the power cable for FreeBSD. Idle FreeBSD: more like 96
> Watts.
>
> (No video card. 2 forms of Optane 905P 1.5TB, one active. One
> Samsung 960 Pro 2TB, inactive. One Samsung 970 EVO Plus 2TB,
> inactive. 96 GiBytes of RAM total across 2 DIMMs. Fans and
> AIO cooling. Keyboard and mouse USB powered. USB3 Ethernet
> dongle. Monitor connection.)
>
>
> ThreadRipper 1950X "bulk -a" test in progress:
>
> I'm running a from-scratch USE_TMPFS=all "bulk -a" on the
> ThreadRipper 1950X (128 GiBytes of RAM). From what I've seen
> so far, it looks likely to take over 72 hr, so 2x+ as long
> as the 7950X3D. (Samsung 960 Pro 1TB system media and
> Optane 900P 480 GB swap space media in use, 447 GiBytes as I
> remember. The ZFS partition on the 960 Pro has ashift=14 .)
> It has a slightly modified copy of the ZFS from the 7950X3D
> as far as starting content goes. It does have openzfs-2.2
> compatibility fully enabled for its pool, including block
> cloning, unlike any other ZFS I have around
> (openzfs-2.1-freebsd).
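
(As an aside: a minimal sketch of enabling that sort of pool
compatibility on an existing pool, with the pool name "zroot" used
purely for illustration:

# zpool set compatibility=openzfs-2.2 zroot
# zpool upgrade zroot

The zpool upgrade step then enables the features that the
openzfs-2.2 compatibility file allows but that are still disabled
on the pool.)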

ThreadRipper 1950X:

. . .
[85:21:50] [27] [02:06:01] Finished databases/mongodb60 | mongodb60-6.0.11: Success
[85:34:00] [28] [03:23:06] Finished biology/ncbi-cxx-toolkit | ncbi-cxx-toolkit-27.0.0_1: Success
[85:46:31] [30] [08:19:30] Finished cad/kicad-library-packages3d | kicad-library-packages3d-7.0.2_2: Success
[87:07:02] [03] [13:00:45] Finished emulators/libretro-mame | libretro-mame-20220124_1: Success

But one port that normally takes little time got stuck (in kqread,
apparently against a <defunct> child process), resulting in (later):

# poudriere status -b
[main-amd64-bulk_a-default] [2023-11-11_17h59m25s] [parallel_build:] Queued: 34683 Built: 33807 Failed: 173 Skipped: 382 Ignored: 320 Fetched: 0 Tobuild: 1 Time: 88:17:59
ID    TOTAL    ORIGIN       PKGNAME          PHASE        PHASE    TMPFS    CPU% MEM%
[05]  17:27:25 ftp/curlie | curlie-1.6.7_15  check-sanity 17:27:15 1.28 GiB
=>> Logs: /usr/local/poudriere/data/logs/bulk/main-amd64-bulk_a-default/2023-11-11_17h59m25s

So it looks like:

Ryzen 9 7950X3D,    96 GiBytes RAM (5600 MT/s): 33 hr or so.
ThreadRipper 1950X, 128 GiBytes RAM (2400 MT/s): 87 hr or so.

For reference (both 32 hardware threads):

Ryzen 9 7950X3D:    265551Mi MaxObs(Act+Wir+Lndry+SwapUsed)
ThreadRipper 1950X: 245564Mi MaxObs(Act+Wir+Lndry+SwapUsed)

(The 96 GiByte vs. 128 GiByte RAM size difference makes other
figures messier to compare.)

I have updated the 7950X3D UEFI and am rerunning the from-scratch
bulk -a test in the ZFS context to check on system stability for
such.

===
Mark Millard
marklmi at yahoo.com