Subject: Re: git: 32a2fed6e71f - stable/13 - openssl: Fix detection of ARMv7 and ARM64 CPU features
Date: Wed, 24 Nov 2021 16:13:11 -0800
To: allanjude@freebsd.org, "freebsd-arm@freebsd.org"
In-Reply-To: <059B833D-6C04-4171-A3C6-737CD5EFC01A@yahoo.com>
List-Id: Porting FreeBSD to ARM processors
List-Archive: https://lists.freebsd.org/archives/freebsd-arm
From: Mark Millard via arm
Reply-To: marklmi@yahoo.com

On 2021-Nov-24, at 15:25, Mark Millard wrote:

> On 2021-Nov-24, at 13:23, Mark Millard wrote:
>
>> On 2021-Nov-24, at 13:19, Mark Millard wrote:
>>
>>> On 2021-Nov-24, at 01:51, Mark Millard wrote:
>>>
>>>> [Actually, the main [so: 14] equivalent.]
>>>>
>>>> All Cortex-A72 based . . .
>>>>
>>>> First, older system versions (before that update)
>>>> then after the update:
>>>>
>>>>
>>>> RPi4B 8 GiByte (older FreeBSD first, otherwise new),
>>>> Cortex-A72's:
>>>>
>>>> # openssl speed -evp aes-256-gcm
>>>> . . .
>>>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>>> aes-256-gcm     51925.92k    58449.46k    60430.32k    61050.13k    61180.98k    61482.75k
>>>>
>>>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>>> aes-256-gcm     28880.07k    30837.33k    31630.29k    31855.62k    31921.54k    32034.53k
>>>>
>>>> So: slowed down, unlike the other examples below.
>>>>
>>>> # env OPENSSL_armcap=0 openssl speed -evp aes-256-gcm
>>>> . . .
>>>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>>> aes-256-gcm     51894.33k    58540.45k    60815.22k    61534.47k    61906.84k    62042.10k
>>>>
>>>> So: back to the prior speed.
>>>>
>>>> But all these are based on config.txt containing:
>>>>
>>>> over_voltage=6
>>>> arm_freq=2000
>>>> sdram_freq_min=3200
>>>> force_turbo=1
>>>>
>>>> (The RPi4B has a heat-sink and a fan.)
>>>>
>>>> Note: See later about the RPi4B CPU features.
>>>>
>>>>
>>>> MACCHIATObin Double Shot (older first), Cortex-A72's:
>>>>
>>>> # openssl speed -evp aes-256-gcm
>>>> . . .
>>>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>>> aes-256-gcm     50808.49k    58466.08k    60769.11k    61444.92k    61767.94k    61707.61k
>>>>
>>>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>>> aes-256-gcm    163579.14k   456319.27k   786544.01k   940234.41k  1003230.55k  1005671.31k
>>>>
>>>>
>>>> HoneyComb (older first), Cortex-A72's:
>>>>
>>>> # openssl speed -evp aes-256-gcm
>>>> . . .
>>>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>>> aes-256-gcm     57659.60k    64599.05k    67719.81k    68373.74k    68724.24k    68793.80k
>>>>
>>>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>>> aes-256-gcm    177925.57k   502311.65k   866287.95k  1036500.35k  1106598.06k  1106721.91k
>>>>
>>>> Rock64 (older first), Cortex-A53's:
>>>>
>>>> # openssl speed -evp aes-256-gcm
>>>> . . .
>>>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>>> aes-256-gcm     18378.23k    23401.45k    24834.99k    25206.10k    25337.86k    25258.19k
>>>>
>>>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>>> aes-256-gcm     52711.29k   163586.49k   318738.69k   420277.93k   461373.44k   463192.06k
>>>>
>>>>
>>>> OPi+2E (older first), Cortex-A7's (so armv7):
>>>>
>>>> # openssl speed -evp aes-256-gcm
>>>> . . .
>>>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>>> aes-256-gcm      9343.10k    11156.39k    11827.64k    11995.30k    12025.86k    12031.32k
>>>>
>>>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>>> aes-256-gcm     11013.41k    13598.44k    14034.26k    15045.97k    15262.90k    15302.66k
>>>>
>>>>
>>>>
>>>> For reference:
>>>>
>>>> For the RPi4B examples (2 notes added):
>>>>
>>>> CPU 0: ARM Cortex-A72 r0p3 affinity: 0
>>>> Cache Type = <64 byte D-cacheline,64 byte I-cacheline,PIPT ICache,64 byte ERG,64 byte CWG>
>>>> Instruction Set Attributes 0 =
>>>> *** NOTE the lack of ",SHA2,SHA1,AES+PMULL" above ***
>>>> Instruction Set Attributes 1 = <>
>>>> Processor Features 0 =
>>>> Processor Features 1 = <>
>>>> Memory Model Features 0 =
>>>> Memory Model Features 1 = <8bit VMID>
>>>> Memory Model Features 2 = <32bit CCIDX,48bit VA>
>>>> Debug Features 0 =
>>>> Debug Features 1 = <>
>>>> Auxiliary Features 0 = <>
>>>> Auxiliary Features 1 = <>
>>>> AArch32 Instruction Set Attributes 5 =
>>>> *** NOTE the lack of ",SHA2,SHA1,AES+VMULL" above ***
>>>> AArch32 Media and VFP Features 0 =
>>>> AArch32 Media and VFP Features 1 =
>>>>
>>>> For the MACCHIATObin Double Shot examples:
>>>>
>>>> CPU 0: ARM Cortex-A72 r0p1 affinity: 0 0
>>>> Cache Type = <64 byte D-cacheline,64 byte I-cacheline,PIPT ICache,64 byte ERG,64 byte CWG>
>>>> Instruction Set Attributes 0 =
>>>> Instruction Set Attributes 1 = <>
>>>> Processor Features 0 =
>>>> Processor Features 1 = <>
>>>> Memory Model Features 0 =
>>>> Memory Model Features 1 = <8bit VMID>
>>>> Memory Model Features 2 = <32bit CCIDX,48bit VA>
>>>> Debug Features 0 =
>>>> Debug Features 1 = <>
>>>> Auxiliary Features 0 = <>
>>>> Auxiliary Features 1 = <>
>>>> AArch32 Instruction Set Attributes 5 =
>>>> AArch32 Media and VFP Features 0 =
>>>> AArch32 Media and VFP Features 1 =
>>>>
>>>>
>>>> For the HoneyComb examples:
>>>>
>>>> CPU 0: ARM Cortex-A72 r0p3 affinity: 0 0
>>>> Cache Type = <64 byte D-cacheline,64 byte I-cacheline,PIPT ICache,64 byte ERG,64 byte CWG>
>>>> Instruction Set Attributes 0 =
>>>> Instruction Set Attributes 1 = <>
>>>> Processor Features 0 =
>>>> Processor Features 1 = <>
>>>> Memory Model Features 0 =
>>>> Memory Model Features 1 = <8bit VMID>
>>>> Memory Model Features 2 = <32bit CCIDX,48bit VA>
>>>> Debug Features 0 =
>>>> Debug Features 1 = <>
>>>> Auxiliary Features 0 = <>
>>>> Auxiliary Features 1 = <>
>>>> AArch32 Instruction Set Attributes 5 =
>>>> AArch32 Media and VFP Features 0 =
>>>> AArch32 Media and VFP Features 1 =
>>>>
>>>>
>>>>
>>>>
>>>> For the Rock64 examples:
>>>>
>>>> CPU 0: ARM Cortex-A53 r0p4 affinity: 0
>>>> Cache Type = <64 byte D-cacheline,64 byte I-cacheline,VIPT ICache,64 byte ERG,64 byte CWG>
>>>> Instruction Set Attributes 0 =
>>>> Instruction Set Attributes 1 = <>
>>>> Processor Features 0 =
>>>> Processor Features 1 = <>
>>>> Memory Model Features 0 =
>>>> Memory Model Features 1 = <8bit VMID>
>>>> Memory Model Features 2 = <32bit CCIDX,48bit VA>
>>>> Debug Features 0 =
>>>> Debug Features 1 = <>
>>>> Auxiliary Features 0 = <>
>>>> Auxiliary Features 1 = <>
>>>> AArch32 Instruction Set Attributes 5 =
>>>> AArch32 Media and VFP Features 0 =
>>>> AArch32 Media and VFP Features 1 =
>>>> C
>>>>
>>>>
>>>> For the OPi+2E examples:
>>>>
>>>> CPU: ARM Cortex-A7 r0p5 (ECO: 0x00000000)
>>>> CPU Features:
>>>> Multiprocessing, Thumb2, Security, Virtualization, Generic Timer, VMSAv7,
>>>> PXN, LPAE, Coherent Walk
>>>> Optional instructions:
>>>> SDIV/UDIV, UMULL, SMULL, SIMD(ext)
>>>> LoUU:2 LoC:3 LoUIS:2
>>>> Cache level 1:
>>>> 32KB/64B 4-way data cache WB Read-Alloc Write-Alloc
>>>> 32KB/32B 2-way instruction cache Read-Alloc
>>>> Cache level 2:
>>>> 512KB/64B 8-way unified cache WB Read-Alloc Write-Alloc
>>>
>>> Note: as the issue applies to stable/13 and main [so: 14]
>>> (for example), I continue to use the freebsd-arm list
>>> instead of a list that reports commits to stable/* but
>>> not to main.
>>>
>>> Relative to:
>>>
>>> #define HWCAP_FP      0x00000001
>>> #define HWCAP_ASIMD   0x00000002
>>> #define HWCAP_EVTSTRM 0x00000004
>>> #define HWCAP_AES     0x00000008
>>> #define HWCAP_PMULL   0x00000010
>>> #define HWCAP_SHA1    0x00000020
>>> #define HWCAP_SHA2    0x00000040
>>> #define HWCAP_CRC32   0x00000080
>>>
>>> The single-bit enabled OPENSSL_armcap that gets the slow
>>> result is:
>>>
>>> # env OPENSSL_armcap=1 openssl speed -evp aes-256-gcm
>>> . . .
>>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>> aes-256-gcm     28427.04k    30712.32k    31446.00k    31683.40k    31829.10k    31839.55k
>>>
>>> The illegal instruction ones for aes-256-gcm were:
>>>
>>> # env OPENSSL_armcap=4 openssl speed -evp aes-256-gcm
>>> Doing aes-256-gcm for 3s on 16 size blocks: Illegal instruction (core dumped)
>>>
>>> # env OPENSSL_armcap=32 openssl speed -evp aes-256-gcm
>>> Doing aes-256-gcm for 3s on 16 size blocks: Illegal instruction (core dumped)
>>>
>>> (sha256 does not match for what is illegal.)
>>>
>>> Ignoring the illegal-instruction producing bits, HWCAP_FP mixed
>>> with any one of the other bits was also similarly slow.
>>>
>>> As for all the non-illegal-instruction producing bits: also similarly
>>> slow:
>>>
>>> # env OPENSSL_armcap=219 openssl speed -evp aes-256-gcm
>>> . . .
>>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>> aes-256-gcm     28922.63k    30711.51k    31522.15k    31722.15k    31788.97k    31845.03k
>>>
>>> Disabling just HWCAP_FP from that got the fast category of
>>> result:
>>>
>>> # env OPENSSL_armcap=218 openssl speed -evp aes-256-gcm
>>> . . .
>>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>> aes-256-gcm     49543.14k    58068.22k    60236.56k    60724.37k    61216.09k    61212.99k
>>>
>>>
>>> As for sha256 . . .
>>>
>>> # env OPENSSL_armcap=0 openssl speed -evp sha256
>>> . . .
>>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>> sha256          22434.19k    59895.91k   117258.16k   156264.31k   172624.81k   173848.52k
>>>
>>> (I'll not list all the similar performing ones but
>>> will list all illegal-instruction producing ones.)
>>>
>>> # env OPENSSL_armcap=4 openssl speed -evp sha256
>>> Doing sha256 for 3s on 16 size blocks: 4082055 sha256's in 2.99s
>>> Doing sha256 for 3s on 64 size blocks: 2752520 sha256's in 3.02s
>>> Doing sha256 for 3s on 256 size blocks: 1372584 sha256's in 3.03s
>>> Doing sha256 for 3s on 1024 size blocks: 470215 sha256's in 3.11s
>>> Doing sha256 for 3s on 8192 size blocks: 64700 sha256's in 3.07s
>>> Doing sha256 for 3s on 16384 size blocks: 31847 sha256's in 3.00s
>>> Illegal instruction (core dumped)
>>>
>>> # env OPENSSL_armcap=16 openssl speed -evp sha256
>>> Doing sha256 for 3s on 16 size blocks: Illegal instruction (core dumped)
>>>
>>> (16 worked for aes-256-gcm but 32 did not.)
>>>
>>> So: no significantly slower examples of single enabled
>>> bit cases.
>>>
>>> No (non-illegal-instruction) 2-enabled-bits examples were
>>> dissimilar for the speed.
>>
>> Incorrect description of what I tested: I tested only
>> 2-bit combinations involving HWCAP_FP being enabled.
>> (Same as for aes-256-gcm .)
>>
>>> For reference (avoiding illegal-instructions):
>>>
>>> # env OPENSSL_armcap=235 openssl speed -evp sha256
>>> . . .
>>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>> sha256          23185.66k    62689.73k   125814.72k   167981.88k   187833.65k   188968.95k
>>>
>>> So: also similar speed.
>>>
>>> Need any other specific bit combinations?
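
For reference, the bit arithmetic behind the 218, 219 and 235 values
used above can be sketched as below. This is only a sketch: it takes
the HWCAP_* table quoted earlier at face value and assumes
OPENSSL_armcap uses that same bit layout, which is not confirmed here.
The helper name decode_armcap is made up for illustration.

```python
# HWCAP_* values exactly as quoted in this thread. Whether OPENSSL_armcap
# really follows this layout is an assumption carried over from the message.
HWCAP_BITS = {
    "HWCAP_FP": 0x00000001,
    "HWCAP_ASIMD": 0x00000002,
    "HWCAP_EVTSTRM": 0x00000004,
    "HWCAP_AES": 0x00000008,
    "HWCAP_PMULL": 0x00000010,
    "HWCAP_SHA1": 0x00000020,
    "HWCAP_SHA2": 0x00000040,
    "HWCAP_CRC32": 0x00000080,
}


def decode_armcap(value):
    """Return the names of the quoted HWCAP_* bits that are set in value."""
    return [name for name, bit in HWCAP_BITS.items() if value & bit]


if __name__ == "__main__":
    # The multi-bit masks tried in the quoted runs.
    for value in (218, 219, 235):
        print(value, hex(value), ", ".join(decode_armcap(value)))
```

Under that assumption, 219 is every bit except 4 and 32, 218 is the
same mask with bit 1 also cleared, and 235 is every bit except 4 and 16.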
>>
>
>
> chroot'd into an armv7 context on the RPi4B gets different results
> for aes-256-gcm: having HWCAP_FP enabled sped things up.
>
> # env OPENSSL_armcap=0 openssl speed -evp aes-256-gcm
> . . .
> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> aes-256-gcm     35983.70k    41987.64k    44077.00k    44693.54k    44685.68k    44717.40k
>
> # env OPENSSL_armcap=1 openssl speed -evp aes-256-gcm
> . . .
> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> aes-256-gcm     55339.93k    64644.18k    68001.37k    72708.53k    74237.56k    74247.87k
>
> # env OPENSSL_armcap=4 openssl speed -evp aes-256-gcm
> Doing aes-256-gcm for 3s on 16 size blocks: Illegal instruction (core dumped)
>
> # env OPENSSL_armcap=32 openssl speed -evp aes-256-gcm
> Doing aes-256-gcm for 3s on 16 size blocks: Illegal instruction (core dumped)
>
> In general OPENSSL_armcap=2**N was slower and OPENSSL_armcap=(2**N)+1
> was faster in a similar manner. Similarly for 218 vs. 219.
>
> sha256 did not show such a distinction.
>
> The armv7 illegal-instruction generation cases for
> sha256 were:
>
> # env OPENSSL_armcap=4 openssl speed -evp sha256
> Doing sha256 for 3s on 16 size blocks: 3313106 sha256's in 3.02s
> Doing sha256 for 3s on 64 size blocks: 2403376 sha256's in 3.02s
> Doing sha256 for 3s on 256 size blocks: 1289917 sha256's in 3.02s
> Doing sha256 for 3s on 1024 size blocks: 446543 sha256's in 3.00s
> Doing sha256 for 3s on 8192 size blocks: 64123 sha256's in 3.03s
> Doing sha256 for 3s on 16384 size blocks: 32756 sha256's in 3.08s
> Illegal instruction (core dumped)
>
> # env OPENSSL_armcap=16 openssl speed -evp sha256
> Doing sha256 for 3s on 16 size blocks: Illegal instruction (core dumped)
>
>
>
> Note: I focused on large scale differences in general. I was not trying
> to find the optimal combination. For that I'd also have to test out
> repeatability/variability for each OPENSSL_armcap value that was in
> the faster range.

FYI: on the OPi+2E (Cortex-A7) more OPENSSL_armcap values generate
illegal instructions than a chroot to armv7 does on the RPi4B:

# env OPENSSL_armcap=2 openssl speed -evp aes-256-gcm
Doing aes-256-gcm for 3s on 16 size blocks: Illegal instruction (core dumped)

# env OPENSSL_armcap=4 openssl speed -evp aes-256-gcm
Doing aes-256-gcm for 3s on 16 size blocks: Illegal instruction (core dumped)

# env OPENSSL_armcap=32 openssl speed -evp aes-256-gcm
Doing aes-256-gcm for 3s on 16 size blocks: Illegal instruction (core dumped)

# env OPENSSL_armcap=2 openssl speed -evp sha256
Doing sha256 for 3s on 16 size blocks: 579668 sha256's in 3.01s
Doing sha256 for 3s on 64 size blocks: 436508 sha256's in 3.00s
Doing sha256 for 3s on 256 size blocks: 240826 sha256's in 3.03s
Doing sha256 for 3s on 1024 size blocks: 85768 sha256's in 3.04s
Doing sha256 for 3s on 8192 size blocks: 12248 sha256's in 3.04s
Doing sha256 for 3s on 16384 size blocks: 6096 sha256's in 3.00s
Illegal instruction (core dumped)

# env OPENSSL_armcap=4 openssl speed -evp sha256
Doing sha256 for 3s on 16 size blocks: 582757 sha256's in 3.00s
Doing sha256 for 3s on 64 size blocks: 443027 sha256's in 3.04s
Doing sha256 for 3s on 256 size blocks: 241189 sha256's in 3.04s
Doing sha256 for 3s on 1024 size blocks: 85722 sha256's in 3.04s
Doing sha256 for 3s on 8192 size blocks: 12074 sha256's in 3.00s
Doing sha256 for 3s on 16384 size blocks: 6097 sha256's in 3.00s
Illegal instruction (core dumped)

# env OPENSSL_armcap=16 openssl speed -evp sha256
Doing sha256 for 3s on 16 size blocks: Illegal instruction (core dumped)

I should note that my buildworld and buildkernel runs are set up
to use -mcpu=cortex-a72 or -mcpu=cortex-a53 or -mcpu=cortex-a7
as appropriate for the target hardware. The armv7 chroot's
builds used -mcpu=cortex-a7 as well.
(The installs are of the same system build as for the OPi+2E.)

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
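
P.S. As a rough summary of the older-vs-newer aes-256-gcm figures
quoted in this thread, the newer/older ratio at 16384-byte blocks can
be computed as sketched below. The numbers are copied from the quoted
runs (the "k" suffix is taken to mean 1000s of bytes per second, as
openssl speed reports). The dictionary layout and the ratio() helper
are illustrative only, not part of any tool.

```python
# 16384-byte aes-256-gcm throughput, (older, newer), in 1000s of bytes/s,
# copied from the "older first" runs quoted in this thread.
AES_GCM_16384 = {
    "RPi4B": (61482.75, 32034.53),
    "MACCHIATObin Double Shot": (61707.61, 1005671.31),
    "HoneyComb": (68793.80, 1106721.91),
    "Rock64": (25258.19, 463192.06),
    "OPi+2E": (12031.32, 15302.66),
}


def ratio(older, newer):
    """Newer throughput divided by older; values below 1.0 mean a slowdown."""
    return newer / older


if __name__ == "__main__":
    for board, (older, newer) in AES_GCM_16384.items():
        print(f"{board:<26} {ratio(older, newer):6.2f}x")
```

A ratio below 1.0 marks a slowdown, which in these figures is only
the RPi4B; every other board comes out above 1.0.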