From: Mark Millard
To: freebsd-arm
Subject: Re: -mcpu= selections and the Windows Dev Kit 2023: example from-scratch buildkernel times (after kernel-toolchain)
Date: Sat, 13 May 2023 01:50:15 -0700
Message-Id: <6196193E-4A75-464C-AB0B-AE2C3BC00D66@yahoo.com>
In-Reply-To: <3B5EB0DD-E9CB-41BD-9BCC-6549BBF0C0DA@yahoo.com>
List-Id: Porting FreeBSD to ARM processors
List-Archive: https://lists.freebsd.org/archives/freebsd-arm

On May 13, 2023, at 01:28, Mark Millard wrote:

> While the selections were guided by some benchmark-like explorations,
> the results for the Windows Dev Kit 2023
> (WDK23 abbreviation) go like:
>
>
> -mcpu=cortex-a72 code generation produced a (non-debug)
> kernel/world that, in turn, got (from scratch buildkernel after
> kernel-toolchain):
>
> Kernel(s) GENERIC-NODBG-CA72 built in 597 seconds, ncpu: 8, make -j8
>
> (The rest of the aarch64 that I've access to is nearly all cortex-a72
> based, the others being cortex-a53 these days. So I was seeing how
> code tailored for the cortex-a72 context performed on the WDK23.
> cortex-a72 was my starting point with the WDK23.)
>
>
> -mcpu=cortex-x1c+flagm code generation produced a (non-debug)
> kernel/world that, in turn, got (from scratch buildkernel after
> kernel-toolchain):
>
> Kernel(s) GENERIC-NODBG-CA78C built in 584 seconds, ncpu: 8, make -j8
>
> NOTE: "+flagm" is because of various clang/gcc having an inaccurate
> set of features that omit flagm --and I'm making sure I've got it
> enabled. -mcpu=cortex-a78c is even worse: it has examples of +fp16fml
> by default in some toolchains --but neither of the 2 types of core has
> support for such. (The cortex-x1c and cortex-a78c actually have matching
> features for code generation purposes, at least for all that I looked
> at. Toolchain mismatches for default features are sufficient evidence
> of an error in at least one case as far as I can tell.)
>
> This context is implicitly +lse+rcpc . At the time I was not being
> explicit when defaults matched.
>
> Notes:
> "lse" is the Large System Extensions atomics, disabled below.
> "rcpc" is the extension having load acquire and store release
> instructions. (rcpc I was explicit about below, despite the
> default matching.)
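
As a rough illustration of how the -mcpu= strings in this thread are put together (a base CPU name, then +feature to enable or +nofeature to disable), here is a small sh sketch that only splits such a string apart. The function name split_mcpu is hypothetical, and the script does not consult any toolchain, so it says nothing about what a given clang or gcc actually enables by default:

```sh
#!/bin/sh
# Split an -mcpu= value such as "cortex-x1c+flagm+nolse+rcpc" into the
# base CPU name and its +feature / +nofeature modifiers.
# split_mcpu is a hypothetical helper, for illustration only.
# Assumption: a leading "no" on a modifier always means "disable".
split_mcpu() {
    value=$1
    cpu=${value%%+*}          # text before the first '+'
    rest=${value#"$cpu"}      # remaining "+a+b+noc" part (may be empty)
    enabled=""
    disabled=""
    old_ifs=$IFS
    IFS=+                     # split the modifiers on '+'
    for feature in $rest; do
        case $feature in
            "")  ;;                                      # empty field before the first '+'
            no*) disabled="$disabled ${feature#no}" ;;   # "+nolse" -> lse disabled
            *)   enabled="$enabled $feature" ;;          # "+flagm" -> flagm enabled
        esac
    done
    IFS=$old_ifs
    echo "cpu=$cpu enabled=${enabled# } disabled=${disabled# }"
}

split_mcpu "cortex-x1c+flagm+nolse+rcpc"
```

For the string shown, this prints "cpu=cortex-x1c enabled=flagm rcpc disabled=lse". Features that match the toolchain default simply do not appear in the string, which is why the earlier -mcpu=cortex-x1c+flagm case is described above as implicitly +lse+rcpc.
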
>
>
> -mcpu=cortex-x1c+flagm+nolse+rcpc code generation produced a
> (non-debug) kernel/world that, in turn, got (from scratch buildkernel
> after kernel-toolchain):
>
> Kernel(s) GENERIC-NODBG-CA78CnoLSE built in 415 seconds, ncpu: 8, make -j
>
> Note: My explorations so far have tried the world combinations of
> lse and rcpc status but with a kernel that was based on
> -mcpu=cortex-x1c+flagm . I then updated the kernel to match the
> -mcpu=cortex-x1c+flagm+nolse+rcpc and used it to produce the above.
> So there is more exploring that I've not done yet. But I'm not
> expecting the time to drop notably below the 415 sec.
>
> The benchmark-like activity had shown that +lse+rcpc for the
> world/benchmark builds led to notable negative consequences for
> cpus 0..3 compared to the other 3 combinations of status. For
> cpus 4..7, it showed that +nolse+rcpc for the world/benchmark
> builds had a noticeable gain compared to the other 3 combinations.
> This guided the buildkernel testing selections done so far. The
> buildkernel tests were, in part, to be sure that the apparent
> consequences were not just artifacts of the time measurements
> that could undermine the usefulness of benchmark result comparisons.
>
>
> For comparison to a standard FreeBSD non-debug build, I used a
> snapshot download of:
>
> http://ftp3.freebsd.org/pub/FreeBSD/snapshots/ISO-IMAGES/13.2/FreeBSD-13.2-STABLE-arm64-aarch64-ROCK64-20230504-7dea7445ba44-255298.img.xz
>
> and dd'd it to media, replaced the EFI/*/* with ones that
> work for the Windows Dev Kit 2023, booted the WDK23 with the media,
> copied over my /usr/*-src/ to the media, did a "make -j8 kernel-toolchain"
> from the /usr/main-src/ copy and finally did a "make -j8 buildkernel"
> (so, from-scratch, given the toolchain materials are already in place):
>
> Kernel(s) GENERIC built in 505 seconds, ncpu: 8, make -j8
>
> ( /usr/main-src/ has the source that the other buildkernel timings
> were based on. )
>
>
> Looks like -mcpu=cortex-a72 and -mcpu=cortex-x1c+flagm are far from
> a good fit for buildkernel workloads on the WDK23. FreeBSD
> defaults and -mcpu=cortex-x1c+flagm+nolse+rcpc seem to be better fits
> for such use.
>
>
> Note: This testing was in a ZFS context, using bectl to advantage, in
> case that somehow matters.
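
To put the four buildkernel times side by side, here is a small sh/awk sketch that expresses each one relative to the stock GENERIC result (505 seconds). The times are the ones reported above; the function name compare_times and the short labels are hypothetical shorthand for the code generation used, not kernel configuration names:

```sh
#!/bin/sh
# Express each reported buildkernel time relative to the stock GENERIC build.
# Times (seconds) are the ones quoted in this message; labels are informal
# shorthand chosen for this sketch.
compare_times() {
    awk -v base=505 '
        # $1 = label, $2 = seconds; print the signed percent difference from base
        { printf "%-28s %4d s  %+6.1f%% vs GENERIC\n", $1, $2, ($2 - base) * 100 / base }
    ' <<'EOF'
cortex-a72                   597
cortex-x1c+flagm             584
cortex-x1c+flagm+nolse+rcpc  415
GENERIC                      505
EOF
}

compare_times
```

With these numbers the +nolse+rcpc build comes out at about -17.8% (faster) and the cortex-a72 build at about +18.2% (slower) relative to GENERIC. These are single timings of one workload, so the percentages are only a way of reading the figures above, not a general claim.
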
>
>
> For reference:
>
> # grep mcpu= /usr/main-src/sys/arm64/conf/GENERIC-NODBG-CA78C
> makeoptions CONF_CFLAGS="-mcpu=cortex-x1c+flagm+nolse+rcpc"
>
> # grep mcpu= ~/src.configs/*CA78C-nodbg*
> XCFLAGS+= -mcpu=cortex-x1c+flagm+nolse+rcpc
> XCXXFLAGS+= -mcpu=cortex-x1c+flagm+nolse+rcpc
> ACFLAGS.arm64cpuid.S+= -mcpu=cortex-x1c
> ACFLAGS.aesv8-armx.S+= -mcpu=cortex-x1c
> ACFLAGS.ghashv8-armx.S+= -mcpu=cortex-x1c
>
> # more /usr/local/etc/poudriere.d/main-CA78C-make.conf
> CFLAGS+= -mcpu=cortex-x1c+flagm+nolse+rcpc
> CXXFLAGS+= -mcpu=cortex-x1c+flagm+nolse+rcpc
> CPPFLAGS+= -mcpu=cortex-x1c+flagm+nolse+rcpc
> RUSTFLAGS_CPU_FEATURES= -C target-cpu=cortex-x1c -C target-feature=+x1c,+flagm,-lse,+rcpc

Note: RUSTFLAGS_CPU_FEATURES is something that I added to my
environment to allow the experiment:

# git -C /usr/ports/ diff Mk/Uses/cargo.mk
diff --git a/Mk/Uses/cargo.mk b/Mk/Uses/cargo.mk
index 50146372fee1..2f21453fd02b 100644
--- a/Mk/Uses/cargo.mk
+++ b/Mk/Uses/cargo.mk
@@ -145,7 +145,9 @@ WITH_LTO= yes
 . endif
 
 # Adjust -C target-cpu if -march/-mcpu is set by bsd.cpu.mk
-. if ${ARCH} == amd64 || ${ARCH} == i386
+. if defined(RUSTFLAGS_CPU_FEATURES)
+RUSTFLAGS+= ${RUSTFLAGS_CPU_FEATURES}
+. elif ${ARCH} == amd64 || ${ARCH} == i386
 RUSTFLAGS+= ${CFLAGS:M-march=*:S/-march=/-C target-cpu=/}
 . elif ${ARCH:Mpowerpc*}
 RUSTFLAGS+= ${CFLAGS:M-mcpu=*:S/-mcpu=/-C target-cpu=/:S/power/pwr/}

> diff --git a/secure/lib/libcrypto/Makefile b/secure/lib/libcrypto/Makefile
> index 8fde4f19d046..e13227d6450b 100644
> --- a/secure/lib/libcrypto/Makefile
> +++ b/secure/lib/libcrypto/Makefile
> @@ -22,7 +22,7 @@ SRCS+= mem.c mem_dbg.c mem_sec.c o_dir.c o_fips.c o_fopen.c o_init.c
>  SRCS+= o_str.c o_time.c threads_pthread.c uid.c
>  .if defined(ASM_aarch64)
>  SRCS+= arm64cpuid.S armcap.c
> -ACFLAGS.arm64cpuid.S= -march=armv8-a+crypto
> +ACFLAGS.arm64cpuid.S+= -march=armv8-a+crypto
>  .elif defined(ASM_amd64)
>  SRCS+= x86_64cpuid.S
>  .elif defined(ASM_arm)
> @@ -43,7 +43,7 @@ SRCS+= mem_clr.c
>  SRCS+= aes_cbc.c aes_cfb.c aes_ecb.c aes_ige.c aes_misc.c aes_ofb.c aes_wrap.c
>  .if defined(ASM_aarch64)
>  SRCS+= aes_core.c aesv8-armx.S vpaes-armv8.S
> -ACFLAGS.aesv8-armx.S= -march=armv8-a+crypto
> +ACFLAGS.aesv8-armx.S+= -march=armv8-a+crypto
>  .elif defined(ASM_amd64)
>  SRCS+= aes_core.c aesni-mb-x86_64.S aesni-sha1-x86_64.S aesni-sha256-x86_64.S
>  SRCS+= aesni-x86_64.S vpaes-x86_64.S
> @@ -278,7 +278,7 @@ SRCS+= cbc128.c ccm128.c cfb128.c ctr128.c cts128.c gcm128.c ocb128.c
>  SRCS+= ofb128.c wrap128.c xts128.c
>  .if defined(ASM_aarch64)
>  SRCS+= ghashv8-armx.S
> -ACFLAGS.ghashv8-armx.S= -march=armv8-a+crypto
> +ACFLAGS.ghashv8-armx.S+= -march=armv8-a+crypto

===
Mark Millard
marklmi at yahoo.com