-mcpu= selections and the Windows Dev Kit 2023: example from-scratch buildkernel times (after kernel-toolchain)
Date: Sat, 13 May 2023 08:28:18 UTC
While the selections were guided by some benchmark-like explorations, the results for the Windows Dev Kit 2023 (WDK23 abbreviation) go like:

-mcpu=cortex-a72 code generation produced a (non-debug) kernel/world that, in turn, got (from scratch buildkernel after kernel-toolchain):

Kernel(s)  GENERIC-NODBG-CA72 built in 597 seconds, ncpu: 8, make -j8

(The rest of the aarch64 that I've access to is nearly-all cortex-a72 based, the others being cortex-a53 these days. So I was seeing how code tailored for the cortex-a72 context performed on the WDK23. cortex-a72 was my starting point with the WDK23.)

-mcpu=cortex-x1c+flagm code generation produced a (non-debug) kernel/world that, in turn, got (from scratch buildkernel after kernel-toolchain):

Kernel(s)  GENERIC-NODBG-CA78C built in 584 seconds, ncpu: 8, make -j8

NOTE: "+flagm" is because various clang/gcc toolchains have an inaccurate default feature set that omits flagm --and I'm making sure I've got it enabled. -mcpu=cortex-a78c is even worse: some toolchains enable +fp16fml by default for it --but neither of the 2 types of core supports that. (The cortex-x1c and cortex-a78c actually have matching features for code generation purposes, at least for all that I looked at. Toolchain mismatches for default features are sufficient evidence of an error in at least one case as far as I can tell.)

This context is implicitly +lse+rcpc . At the time I was not being explicit when defaults matched.

Notes:
"lse" is the large system extension atomics, disabled below.
"rcpc" is the extension adding load-acquire instructions with RCpc (release consistent, processor consistent) ordering. (rcpc I was explicit about below, despite the default matching.)

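The -mcpu= values in this message combine a core name with "+feature" and "+nofeature" modifiers. As a minimal sketch of how such a string breaks down, the following POSIX sh function splits one into its parts. The name describe_mcpu is hypothetical (it is not part of clang, gcc, or the FreeBSD build system), and the function only parses the text: it does not ask any toolchain what the core's default features are.

```sh
#!/bin/sh
# Hypothetical helper (not part of clang, gcc, or FreeBSD): split an
# -mcpu= value into its core name and its +feature / +nofeature modifiers.
describe_mcpu() {
    value=${1#-mcpu=}              # drop a leading "-mcpu=" if present
    core=${value%%+*}              # text before the first "+"
    printf 'core: %s\n' "$core"
    case $value in
        *+*) rest=${value#*+} ;;   # modifiers that follow the core name
        *)   rest= ;;
    esac
    # Walk the "+"-separated modifiers one at a time.
    while [ -n "$rest" ]; do
        feat=${rest%%+*}
        case $rest in
            *+*) rest=${rest#*+} ;;
            *)   rest= ;;
        esac
        case $feat in
            no*) printf 'disabled: %s\n' "${feat#no}" ;;
            *)   printf 'enabled: %s\n' "$feat" ;;
        esac
    done
}

describe_mcpu -mcpu=cortex-x1c+flagm+nolse+rcpc
```

For the string above this lists cortex-x1c as the core, flagm and rcpc as explicitly enabled, and lse as explicitly disabled. Features the toolchain enables by default for the core are not shown, which is exactly where the toolchain mismatches noted above can hide.
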
-mcpu=cortex-x1c+flagm+nolse+rcpc code generation produced a (non-debug) kernel/world that, in turn, got (from scratch buildkernel after kernel-toolchain):

Kernel(s)  GENERIC-NODBG-CA78CnoLSE built in 415 seconds, ncpu: 8, make -j

Note: My explorations so far have tried the world combinations of lse and rcpc status but with a kernel that was based on -mcpu=cortex-x1c+flagm . I then updated the kernel to match the -mcpu=cortex-x1c+flagm+nolse+rcpc and used it to produce the above. So there is more exploring that I've not done yet. But I'm not expecting decreases to notably below the 415 sec.

The benchmark-like activity had shown that +lse+rcpc for the world/benchmark builds led to notable negative consequences for cpus 0..3 compared to the other 3 combinations of status. For cpus 4..7, it showed that +nolse+rcpc for the world/benchmark builds had a noticeable gain compared to the other 3 combinations. This guided the buildkernel testing selections done so far. The buildkernel tests were, in part, to be sure that the apparent consequences were not just oddities in the time measurements that could undermine the usefulness of the benchmark result comparisons.

For comparison to a standard FreeBSD non-debug build, I used a snapshot download of:

http://ftp3.freebsd.org/pub/FreeBSD/snapshots/ISO-IMAGES/13.2/FreeBSD-13.2-STABLE-arm64-aarch64-ROCK64-20230504-7dea7445ba44-255298.img.xz

and dd'd it to media, replaced the EFI/*/* with ones that work for the Windows Dev Kit 2023, booted the WDK23 with the media, copied over my /usr/*-src/ to the media, did a "make -j8 kernel-toolchain" from the /usr/main-src/ copy, and finally did a "make -j8 buildkernel" (so, from-scratch, given the toolchain materials are already in place):

Kernel(s)  GENERIC built in 505 seconds, ncpu: 8, make -j8

( /usr/main-src/ has the source that the other buildkernel timings were based on.)

Looks like -mcpu=cortex-a72 and -mcpu=cortex-x1c+flagm are far from a good fit for buildkernel workloads to run under on the WDK23. FreeBSD defaults and -mcpu=cortex-x1c+flagm+nolse+rcpc seem to be better fits for such use.

Note: This testing was in a ZFS context, using bectl to advantage, in case that somehow matters.

For reference:

# grep mcpu= /usr/main-src/sys/arm64/conf/GENERIC-NODBG-CA78C
makeoptions CONF_CFLAGS="-mcpu=cortex-x1c+flagm+nolse+rcpc"

# grep mcpu= ~/src.configs/*CA78C-nodbg*
XCFLAGS+= -mcpu=cortex-x1c+flagm+nolse+rcpc
XCXXFLAGS+= -mcpu=cortex-x1c+flagm+nolse+rcpc
ACFLAGS.arm64cpuid.S+= -mcpu=cortex-x1c
ACFLAGS.aesv8-armx.S+= -mcpu=cortex-x1c
ACFLAGS.ghashv8-armx.S+= -mcpu=cortex-x1c

# more /usr/local/etc/poudriere.d/main-CA78C-make.conf
CFLAGS+= -mcpu=cortex-x1c+flagm+nolse+rcpc
CXXFLAGS+= -mcpu=cortex-x1c+flagm+nolse+rcpc
CPPFLAGS+= -mcpu=cortex-x1c+flagm+nolse+rcpc
RUSTFLAGS_CPU_FEATURES= -C target-cpu=cortex-x1c -C target-feature=+x1c,+flagm,-lse,+rcpc

diff --git a/secure/lib/libcrypto/Makefile b/secure/lib/libcrypto/Makefile
index 8fde4f19d046..e13227d6450b 100644
--- a/secure/lib/libcrypto/Makefile
+++ b/secure/lib/libcrypto/Makefile
@@ -22,7 +22,7 @@ SRCS+= mem.c mem_dbg.c mem_sec.c o_dir.c o_fips.c o_fopen.c o_init.c
 SRCS+= o_str.c o_time.c threads_pthread.c uid.c
 .if defined(ASM_aarch64)
 SRCS+= arm64cpuid.S armcap.c
-ACFLAGS.arm64cpuid.S= -march=armv8-a+crypto
+ACFLAGS.arm64cpuid.S+= -march=armv8-a+crypto
 .elif defined(ASM_amd64)
 SRCS+= x86_64cpuid.S
 .elif defined(ASM_arm)
@@ -43,7 +43,7 @@ SRCS+= mem_clr.c
 SRCS+= aes_cbc.c aes_cfb.c aes_ecb.c aes_ige.c aes_misc.c aes_ofb.c aes_wrap.c
 .if defined(ASM_aarch64)
 SRCS+= aes_core.c aesv8-armx.S vpaes-armv8.S
-ACFLAGS.aesv8-armx.S= -march=armv8-a+crypto
+ACFLAGS.aesv8-armx.S+= -march=armv8-a+crypto
 .elif defined(ASM_amd64)
 SRCS+= aes_core.c aesni-mb-x86_64.S aesni-sha1-x86_64.S aesni-sha256-x86_64.S
 SRCS+= aesni-x86_64.S vpaes-x86_64.S
@@ -278,7 +278,7 @@ SRCS+= cbc128.c ccm128.c cfb128.c ctr128.c cts128.c gcm128.c ocb128.c
 SRCS+= ofb128.c wrap128.c xts128.c
 .if defined(ASM_aarch64)
 SRCS+= ghashv8-armx.S
-ACFLAGS.ghashv8-armx.S= -march=armv8-a+crypto
+ACFLAGS.ghashv8-armx.S+= -march=armv8-a+crypto

===
Mark Millard
marklmi at yahoo.com
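
For a quick side-by-side of the four buildkernel timings reported in this message, here is a minimal sh/awk sketch that expresses each time as a percentage difference from the stock GENERIC result (505 seconds). The function name compare_to_baseline is made up for this illustration. The inputs are single runs taken in somewhat different environments, so the percentages are only a rough guide.

```sh
#!/bin/sh
# Hypothetical helper: read "label seconds" pairs on stdin and print each
# with its percentage difference from a baseline given as the first argument.
compare_to_baseline() {
    awk -v base="$1" '
        { printf "%s %d %+.1f%%\n", $1, $2, ($2 - base) * 100 / base }
    '
}

# Timings as reported above; 505 s is the stock GENERIC comparison build.
compare_to_baseline 505 <<'EOF'
GENERIC-NODBG-CA72 597
GENERIC-NODBG-CA78C 584
GENERIC-NODBG-CA78CnoLSE 415
GENERIC 505
EOF
```

On these numbers the +nolse+rcpc kernel build takes roughly 18% less time than the GENERIC comparison, while the cortex-a72 and cortex-x1c+flagm builds take roughly 18% and 16% more.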