Re: HoneyComb first-boot notes [a L3/L2/L1/RAM performance oddity: fix identified]
- In reply to: Mark Millard via freebsd-arm : "Re: HoneyComb first-boot notes [a L3/L2/L1/RAM performance oddity]"
Date: Thu, 15 Jul 2021 20:48:11 UTC
On 2021-Jul-11, at 18:29, Mark Millard <marklmi at yahoo.com> wrote:
>>>> . . .
>>>
>>> I've run into an issue where what FreeBSD calls cpu 0 has
>>> significantly different L3/L2/L1/RAM subsystem performance
>>> than all the other cores (cpu 0 being worse). Similarly for
>>> compared/contrasted to all 4 MACCHIATObin Double Shot cores.
>>>
>>> A plot with curves showing the issue is at:
>>>
>>> https://github.com/markmi/acpphint/blob/master/acpphint_example_data/HoneyCombFreeBSDcpu0RAMAccessPerformanceIsOdd.png
>>>
>>> The dark red curves in the plot show the expected general
>>> shape for such and are for cpu 0. The lighter colored
>>> curves are the MACCHIATObin curves. The darker ones are
>>> the HoneyComb curves, where the L3/L2/L1 is relatively
>>> effective (other than cpu 0).
>>>
>>> My notes on Discord (so far) are . . .
>>>
>>> The curves are from my C++ variant of the old Hierarchical
>>> INTegration benchmark (historically abbreviated HINT). You
>>> can read the approximate size of a level of cache from
>>> the x-axis for where the curve drops faster. So, right
>>> (most obvious) to left (least obvious): L3 8 MiByte, L2 1
>>> MiByte (per core pair, as it turns out), L1 32 KiByte.
>>>
>>> The curves here are for single thread benchmark
>>> configurations with cpuset used to control which CPU is
>>> used. I first noticed this via odd performance variations
>>> in multithreading with more cores allowed than in use (so
>>> migrations to a variety of cpus over time).
>>>
>>> I explored all the CPUs (cores), not just what I plotted.
>>> Only the one gets the odd performing memory access
>>> structure in its curve.
>>>
>>> FYI: The FreeBSD boot is UEFI/ACPI based for both systems,
>>> not U-Boot based.
>>>
>>
>> Jon Nettleton has replicated the memory access performance
>> issue on the one cpu via a different HoneyComb, running
>> some Linux kernel, using tinymembench as the benchmark.
>>
>
> Jon reports that for HoneyCombs older and newer, EDK2's older
> and newer: All show the behavior on cpu 0. "[I]t may have
> always existed."
>
> Jon also reports that U-Boot based booting does not get the
> behavior.
>
> (I've never used U-Boot to boot the HoneyComb for any OS
> media that I've got around. In my U-Boot ignorance, my
> quick attempts failed for FreeBSD main and Fedora 34
> Server media that I've been using with EDK2's UEFI/ACPI.)

The problem in the:

lx2160a_uefi/build/arm-trusted-firmware/plat/nxp/soc-lx2160a/soc.c

code has been identified and my testing of the proposed
fix indicates things are working.

Some very early code setting up the L1 Data prefetch
configuration was depending on not-well-initialized memory
and an initialization routine needed to be used a little
earlier in the sequencing to avoid that.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in
early 2018-Mar)
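
For anyone wanting a quick per-cpu cross-check without the
full benchmark: the following is only a rough sketch of a
pointer-chasing probe. It is not the HINT variant that produced
the plotted curves and it is not tinymembench. The function
names (make_cycle, ns_per_access, print_sweep) and the seed
value are made up for illustration. The idea is that the time
per dependent load should step up as the working set outgrows
each cache level.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Build a single random cycle through n slots (Sattolo's algorithm),
// so each load depends on the previous one and the access order is
// hard for hardware prefetchers to predict.
std::vector<std::size_t> make_cycle(std::size_t n, unsigned seed) {
    assert(n > 0);
    std::vector<std::size_t> next(n);
    std::iota(next.begin(), next.end(), std::size_t{0});
    std::mt19937_64 rng(seed);
    for (std::size_t i = n - 1; i > 0; --i) {
        std::uniform_int_distribution<std::size_t> pick(0, i - 1);
        std::swap(next[i], next[pick(rng)]);
    }
    return next;
}

// Average nanoseconds per dependent load over `steps` hops.
double ns_per_access(const std::vector<std::size_t>& next, std::size_t steps) {
    std::size_t idx = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t s = 0; s < steps; ++s) idx = next[idx];
    auto t1 = std::chrono::steady_clock::now();
    volatile std::size_t sink = idx;  // keep the loop from being optimized away
    (void)sink;
    return std::chrono::duration<double, std::nano>(t1 - t0).count()
           / static_cast<double>(steps);
}

// Double the working set from min_bytes to max_bytes and print one
// line per size. min_bytes must be at least sizeof(std::size_t).
void print_sweep(std::size_t min_bytes, std::size_t max_bytes, std::size_t steps) {
    for (std::size_t bytes = min_bytes; bytes <= max_bytes; bytes *= 2) {
        auto next = make_cycle(bytes / sizeof(std::size_t), 12345u);
        std::printf("%10zu bytes  %8.2f ns/access\n",
                    bytes, ns_per_access(next, steps));
    }
}
```

Calling something like print_sweep(4096, 64u << 20, 1u << 22)
from main() and running the result once under "cpuset -l 0"
and once under, say, "cpuset -l 1" should give two tables to
compare. I'd expect only the general shape to be comparable to
the plotted curves, not the absolute numbers.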