Re: HoneyComb first-boot notes [a L3/L2/L1/RAM performance oddity]
Date: Mon, 12 Jul 2021 01:29:51 UTC
On 2021-Jul-11, at 04:03, Mark Millard <marklmi at yahoo.com> wrote:

> On 2021-Jul-10, at 22:09, Mark Millard <marklmi at yahoo.com> wrote:
>
>> On 2021-Jun-24, at 16:25, Mark Millard <marklmi at yahoo.com> wrote:
>>
>>> On 2021-Jun-24, at 16:00, Mark Millard <marklmi at yahoo.com> wrote:
>>>
>>>> On 2021-Jun-24, at 13:39, Mark Millard <marklmi at yahoo.com> wrote:
>>>>
>>>>> Repeating here what I've reported on the solidrun discord:
>>>>>
>>>>> I decided to experiment with monitoring the temperatures reported
>>>>> as things are. For the default heat-sink/fan and the 2 other fans
>>>>> in the case, buildworld with load average 16.? for some time has
>>>>> stayed with tz0 through tz6 reporting between 61.0degC and 66.0degC,
>>>>> say about 20degC for ambient. (tz7 and tz8 report 0.1C.) During
>>>>> stages with lower load averages, the tz0..tz6 temperatures back off
>>>>> some. So it looks like my default context keeps the system
>>>>> sufficiently cool for such use.
>>>>>
>>>>> I'll note that the default heat-sink's fan is not operating at rates
>>>>> that I hear it upstairs. I've heard the noisy mode from there during
>>>>> early parts of booting for Fedora 34 server, for example.
>>>>
>>>> So I updated my stable/13 source and built and installed
>>>> the update, then did a rm -fr of the build directory
>>>> tree context and started a from-scratch build. The
>>>> build had:
>>>>
>>>> SYSTEM_COMPILER: Determined that CC=cc matches the source tree. Not bootstrapping a cross-compiler.
>>>> and:
>>>> SYSTEM_LINKER: Determined that LD=ld matches the source tree. Not bootstrapping a cross-linker.
>>>>
>>>> as is my standard context for doing such "how long does
>>>> it take" buildworld buildkernel testing.
>>>>
>>>> On aarch64 I do not build for targeting non-arm architectures.
>>>> This does save some time on the builds.
>>>
>>> I should have mentioned that my builds are based on tuning
>>> for the cortex-a72 via -mcpu=cortex-a72 being used. This
>>> was also true of the live system that was running, kernel
>>> and world.
>>>
>>>> The results for the HoneyComb configuration I'm using:
>>>>
>>>> World build completed on Thu Jun 24 15:30:11 PDT 2021
>>>> World built in 3173 seconds, ncpu: 16, make -j16
>>>> Kernel build for GENERIC-NODBG-CA72 completed on Thu Jun 24 15:34:45 PDT 2021
>>>> Kernel(s) GENERIC-NODBG-CA72 built in 274 seconds, ncpu: 16, make -j16
>>>>
>>>> So World+Kernel took a little under 1 hr to build (-j16).
>>>>
>>>>
>>>>
>>>> Comparison/contrast to prior aarch64 systems that I've used
>>>> for buildworld buildkernel . . .
>>>>
>>>>
>>>> By contrast, the (now failed) OverDrive 1000's last timing
>>>> was (building releng/13 instead of stable/13):
>>>>
>>>> World build completed on Tue Apr 27 02:50:52 PDT 2021
>>>> World built in 12402 seconds, ncpu: 4, make -j4
>>>> Kernel build for GENERIC-NODBG-CA72 completed on Tue Apr 27 03:08:04 PDT 2021
>>>> Kernel(s) GENERIC-NODBG-CA72 built in 1033 seconds, ncpu: 4, make -j4
>>>>
>>>> So World+Kernel took a little under 3.75 hrs to build (-j4).
>>>>
>>>>
>>>> The MACCHIATObin Double Shot's last timing was
>>>> (building a 13-CURRENT):
>>>>
>>>> World build completed on Tue Jan 19 03:44:59 PST 2021
>>>> World built in 14902 seconds, ncpu: 4, make -j4
>>>> Kernel build for GENERIC-NODBG completed on Tue Jan 19 04:04:25 PST 2021
>>>> Kernel(s) GENERIC-NODBG built in 1166 seconds, ncpu: 4, make -j4
>>>>
>>>> So World+Kernel took a little under 4.5 hrs to build (-j4).
>>>>
>>>>
>>>> The RPi4B 8GiByte's last timing was
>>>> ( arm_freq=2000, sdram_freq_min=3200, force_turbo=1, USB3 SSD
>>>> building releng/13 ):
>>>>
>>>> World build completed on Tue Apr 20 14:34:38 PDT 2021
>>>> World built in 22104 seconds, ncpu: 4, make -j4
>>>> Kernel build for GENERIC-NODBG completed on Tue Apr 20 15:03:24 PDT 2021
>>>> Kernel(s) GENERIC-NODBG built in 1726 seconds, ncpu: 4, make -j4
>>>>
>>>> So World+Kernel took somewhat under 6 hrs 40 min to build.
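Each "So World+Kernel took ..." summary is just the sum of the World and Kernel seconds in the timing lines. A small Python sketch of that arithmetic, using only the seconds reported above, also gives each system's total relative to the HoneyComb run. The names TIMINGS, total_seconds and summarize are illustrative, not from any FreeBSD tool. Keep in mind the runs differ in source branch (stable/13, releng/13, 13-CURRENT), kernel config and -j level, so the ratios are rough comparisons only.

```python
# Build times in seconds (World, Kernel), copied from the timing lines above.
TIMINGS = {
    "HoneyComb (-j16)": (3173, 274),
    "OverDrive 1000 (-j4)": (12402, 1033),
    "MACCHIATObin Double Shot (-j4)": (14902, 1166),
    "RPi4B 8 GiByte (-j4)": (22104, 1726),
}


def total_seconds(world, kernel):
    """World+Kernel wall time, as summarized in the "So World+Kernel took" lines."""
    return world + kernel


def summarize(timings, baseline="HoneyComb (-j16)"):
    """Return (name, total seconds, total hours, ratio to baseline) per system."""
    base = total_seconds(*timings[baseline])
    rows = []
    for name, (world, kernel) in timings.items():
        total = total_seconds(world, kernel)
        rows.append((name, total, total / 3600, total / base))
    return rows


if __name__ == "__main__":
    for name, total, hours, ratio in summarize(TIMINGS):
        print(f"{name:32} {total:6d} s  {hours:5.2f} h  {ratio:4.2f}x")
```

The totals come out to 3447 s (about 57.5 min), 13435 s (about 3.73 h), 16068 s (about 4.46 h) and 23830 s (about 6 h 37 min), consistent with the "a little under" summaries, with the older boards taking roughly 3.9x, 4.7x and 6.9x as long as the HoneyComb.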
>>>
>>> The -mcpu=cortex-a72 use note also applies to the OverDrive 1000,
>>> MACCHIATObin Double Shot, and RPi4B 8 GiByte contexts.
>>>
>>
>> I've run into an issue where what FreeBSD calls cpu 0 has
>> significantly different L3/L2/L1/RAM subsystem performance
>> than all the other cores (cpu 0 being worse). Similarly when
>> compared to all 4 MACCHIATObin Double Shot cores.
>>
>> A plot with curves showing the issue is at:
>>
>> https://github.com/markmi/acpphint/blob/master/acpphint_example_data/HoneyCombFreeBSDcpu0RAMAccessPerformanceIsOdd.png
>>
>> The dark red curves in the plot show the expected general
>> shape for such and are for cpu 0. The lighter colored
>> curves are the MACCHIATObin curves. The darker ones are
>> the HoneyComb curves, where the L3/L2/L1 is relatively
>> effective (other than cpu 0).
>>
>> My notes on Discord (so far) are . . .
>>
>> The curves are from my C++ variant of the old Hierarchical
>> INTegration benchmark (historically abbreviated HINT). You
>> can read the approximate size of a level of cache from
>> the x-axis for where the curve drops faster. So, right
>> (most obvious) to left (least obvious): L3 8 MiByte, L2 1
>> MiByte (per core pair, as it turns out), L1 32 KiByte.
>>
>> The curves here are for single thread benchmark
>> configurations with cpuset used to control which CPU is
>> used. I first noticed this via odd performance variations
>> in multithreading with more cores allowed than in use (so
>> migrations to a variety of cpus over time).
>>
>> I explored all the CPUs (cores), not just what I plotted.
>> Only the one gets the odd performing memory access
>> structure in its curve.
>>
>> FYI: The FreeBSD boot is UEFI/ACPI based for both systems,
>> not U-Boot based.
>>
>
> Jon Nettleton has replicated the memory access performance
> issue on the one cpu via a different HoneyComb, running
> some Linux kernel, using tinymembench as the benchmark.
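The per-CPU method described above (one benchmark thread, pinned with cpuset, repeated for every CPU) is easy to script. The following is a minimal Python sketch, assuming FreeBSD's cpuset(1) with its -l cpu-list option and a placeholder benchmark command: BENCH_ARGV and ./membench are hypothetical, since acpphint's actual command line and output format are not shown here. odd_cpus flags any CPU whose summary score strays from the median over all CPUs, and find_drops is one simple heuristic (a large fractional fall between adjacent working-set sizes) for locating where a curve drops off, which is how the cache sizes are read off the plot's x-axis. Neither heuristic is claimed to be what acpphint or tinymembench do internally.

```python
import shutil
import subprocess
from statistics import median

# Hypothetical single-threaded benchmark command; substitute the real one.
BENCH_ARGV = ["./membench"]


def pinned_command(cpu, bench_argv):
    """Prefix bench_argv with FreeBSD cpuset(1) so it may only run on `cpu`."""
    return ["cpuset", "-l", str(cpu), *bench_argv]


def run_on_each_cpu(ncpu, bench_argv=BENCH_ARGV):
    """Run the benchmark once per CPU (0..ncpu-1) and collect each run's stdout."""
    if shutil.which("cpuset") is None:
        raise RuntimeError("cpuset(1) not found; this sketch assumes FreeBSD")
    outputs = {}
    for cpu in range(ncpu):
        result = subprocess.run(pinned_command(cpu, bench_argv),
                                capture_output=True, text=True, check=True)
        outputs[cpu] = result.stdout
    return outputs


def odd_cpus(scores, tolerance=0.10):
    """CPUs whose score differs from the all-CPU median by more than `tolerance` (a fraction)."""
    mid = median(scores.values())
    return sorted(cpu for cpu, s in scores.items() if abs(s - mid) > tolerance * mid)


def find_drops(curve, min_drop=0.25):
    """Sizes after which performance falls by more than `min_drop` at the next size.

    `curve` is a list of (working_set_bytes, performance) pairs.
    """
    points = sorted(curve)
    return [s0 for (s0, p0), (_, p1) in zip(points, points[1:])
            if p0 > 0 and (p0 - p1) / p0 > min_drop]
```

For example, run_on_each_cpu(16) would start 16 pinned runs on a 16-core HoneyComb. Turning each run's output into a single score (say, performance at one chosen working-set size) depends on the benchmark's output format and is left out of the sketch.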
>

Jon reports that HoneyCombs older and newer, with EDK2 versions older and newer, all show the behavior on cpu 0. "[I]t may have always existed."

Jon also reports that U-Boot based booting does not show the behavior. (I've never used U-Boot to boot the HoneyComb for any OS media that I've got around. In my U-Boot ignorance, my quick attempts failed for FreeBSD main and Fedora 34 Server media that I've been using with EDK2's UEFI/ACPI.)

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)