From nobody Wed Nov 24 23:25:46 2021
From: Mark Millard via arm
Reply-To: marklmi@yahoo.com
To: allanjude@freebsd.org, "freebsd-arm@freebsd.org"
Subject: Re: git: 32a2fed6e71f - stable/13 - openssl: Fix detection of ARMv7 and ARM64 CPU features
Date: Wed, 24 Nov 2021 15:25:46 -0800
Message-Id: <059B833D-6C04-4171-A3C6-737CD5EFC01A@yahoo.com>
References: <0CEA37B8-CE7F-4BAE-92B7-E71C5FD1BC22@yahoo.com>
List-Id: Porting FreeBSD to ARM processors
List-Archive: https://lists.freebsd.org/archives/freebsd-arm

On 2021-Nov-24, at 13:23, Mark Millard wrote:

> On 2021-Nov-24, at 13:19, Mark Millard wrote:
>
>> On 2021-Nov-24, at 01:51, Mark Millard wrote:
>>
>>> [Actually, the main [so: 14] equivalent.]
>>>
>>> All Cortex-A72 based . . .
>>>
>>> First, older system versions (before that update)
>>> then after the update:
>>>
>>>
>>> RPi4B 8 GiByte (older FreeBSD first, otherwise new),
>>> Cortex-A72's:
>>>
>>> # openssl speed -evp aes-256-gcm
>>> . . .
>>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>> aes-256-gcm     51925.92k    58449.46k    60430.32k    61050.13k    61180.98k    61482.75k
>>>
>>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>> aes-256-gcm     28880.07k    30837.33k    31630.29k    31855.62k    31921.54k    32034.53k
>>>
>>> So: slowed down, unlike the other examples below.
>>>
>>> # env OPENSSL_armcap=0 openssl speed -evp aes-256-gcm
>>> . . .
>>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>> aes-256-gcm     51894.33k    58540.45k    60815.22k    61534.47k    61906.84k    62042.10k
>>>
>>> So: back to the prior speed.
>>>
>>> But all these are based on config.txt containing:
>>>
>>> over_voltage=6
>>> arm_freq=2000
>>> sdram_freq_min=3200
>>> force_turbo=1
>>>
>>> (The RPi4B has a heat-sink and a fan.)
>>>
>>> Note: See later about the RPi4B CPU features.
>>>
>>>
>>> MACCHIATObin Double Shot (older first), Cortex-A72's:
>>>
>>> # openssl speed -evp aes-256-gcm
>>> . . .
>>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>> aes-256-gcm     50808.49k    58466.08k    60769.11k    61444.92k    61767.94k    61707.61k
>>>
>>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>> aes-256-gcm    163579.14k   456319.27k   786544.01k   940234.41k  1003230.55k  1005671.31k
>>>
>>>
>>> HoneyComb (older first), Cortex-A72's:
>>>
>>> # openssl speed -evp aes-256-gcm
>>> . . .
>>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>> aes-256-gcm     57659.60k    64599.05k    67719.81k    68373.74k    68724.24k    68793.80k
>>>
>>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>> aes-256-gcm    177925.57k   502311.65k   866287.95k  1036500.35k  1106598.06k  1106721.91k
>>>
>>> Rock64 (older first), Cortex-A53's:
>>>
>>> # openssl speed -evp aes-256-gcm
>>> . . .
>>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>> aes-256-gcm     18378.23k    23401.45k    24834.99k    25206.10k    25337.86k    25258.19k
>>>
>>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>> aes-256-gcm     52711.29k   163586.49k   318738.69k   420277.93k   461373.44k   463192.06k
>>>
>>>
>>> OPi+2E (older first), Cortex-A7's (so armv7):
>>>
>>> # openssl speed -evp aes-256-gcm
>>> . . .
>>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>> aes-256-gcm      9343.10k    11156.39k    11827.64k    11995.30k    12025.86k    12031.32k
>>>
>>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>> aes-256-gcm     11013.41k    13598.44k    14034.26k    15045.97k    15262.90k    15302.66k
>>>
>>>
>>>
>>> For reference:
>>>
>>> For the RPi4B examples (2 notes added):
>>>
>>> CPU 0: ARM Cortex-A72 r0p3 affinity: 0
>>> Cache Type = <64 byte D-cacheline,64 byte I-cacheline,PIPT ICache,64 byte ERG,64 byte CWG>
>>> Instruction Set Attributes 0 =
>>> *** NOTE the lack of ",SHA2,SHA1,AES+PMULL" above ***
>>> Instruction Set Attributes 1 = <>
>>> Processor Features 0 =
>>> Processor Features 1 = <>
>>> Memory Model Features 0 =
>>> Memory Model Features 1 = <8bit VMID>
>>> Memory Model Features 2 = <32bit CCIDX,48bit VA>
>>> Debug Features 0 =
>>> Debug Features 1 = <>
>>> Auxiliary Features 0 = <>
>>> Auxiliary Features 1 = <>
>>> AArch32 Instruction Set Attributes 5 =
>>> *** NOTE the lack of ",SHA2,SHA1,AES+VMULL" above ***
>>> AArch32 Media and VFP Features 0 =
>>> AArch32 Media and VFP Features 1 =
>>>
>>> For the MACCHIATObin Double Shot examples:
>>>
>>> CPU 0: ARM Cortex-A72 r0p1 affinity: 0 0
>>> Cache Type = <64 byte D-cacheline,64 byte I-cacheline,PIPT ICache,64 byte ERG,64 byte CWG>
>>> Instruction Set Attributes 0 =
>>> Instruction Set Attributes 1 = <>
>>> Processor Features 0 =
>>> Processor Features 1 = <>
>>> Memory Model Features 0 =
>>> Memory Model Features 1 = <8bit VMID>
>>> Memory Model Features 2 = <32bit CCIDX,48bit VA>
>>> Debug Features 0 =
>>> Debug Features 1 = <>
>>> Auxiliary Features 0 = <>
>>> Auxiliary Features 1 = <>
>>> AArch32 Instruction Set Attributes 5 =
>>> AArch32 Media and VFP Features 0 =
>>> AArch32 Media and VFP Features 1 =
>>>
>>>
>>> For the HoneyComb examples:
>>>
>>> CPU 0: ARM Cortex-A72 r0p3 affinity: 0 0
>>> Cache Type = <64 byte D-cacheline,64 byte I-cacheline,PIPT ICache,64 byte ERG,64 byte CWG>
>>> Instruction Set Attributes 0 =
>>> Instruction Set Attributes 1 = <>
>>> Processor Features 0 =
>>> Processor Features 1 = <>
>>> Memory Model Features 0 =
>>> Memory Model Features 1 = <8bit VMID>
>>> Memory Model Features 2 = <32bit CCIDX,48bit VA>
>>> Debug Features 0 =
>>> Debug Features 1 = <>
>>> Auxiliary Features 0 = <>
>>> Auxiliary Features 1 = <>
>>> AArch32 Instruction Set Attributes 5 =
>>> AArch32 Media and VFP Features 0 =
>>> AArch32 Media and VFP Features 1 =
>>>
>>>
>>>
>>>
>>> For the Rock64 examples:
>>>
>>> CPU 0: ARM Cortex-A53 r0p4 affinity: 0
>>> Cache Type = <64 byte D-cacheline,64 byte I-cacheline,VIPT ICache,64 byte ERG,64 byte CWG>
>>> Instruction Set Attributes 0 =
>>> Instruction Set Attributes 1 = <>
>>> Processor Features 0 =
>>> Processor Features 1 = <>
>>> Memory Model Features 0 =
>>> Memory Model Features 1 = <8bit VMID>
>>> Memory Model Features 2 = <32bit CCIDX,48bit VA>
>>> Debug Features 0 =
>>> Debug Features 1 = <>
>>> Auxiliary Features 0 = <>
>>> Auxiliary Features 1 = <>
>>> AArch32 Instruction Set Attributes 5 =
>>> AArch32 Media and VFP Features 0 =
>>> AArch32 Media and VFP Features 1 =
>>>
>>>
>>> For the OPi+2E examples:
>>>
>>> CPU: ARM Cortex-A7 r0p5 (ECO: 0x00000000)
>>> CPU Features:
>>>   Multiprocessing, Thumb2, Security, Virtualization, Generic Timer, VMSAv7,
>>>   PXN, LPAE, Coherent Walk
>>> Optional instructions:
>>>   SDIV/UDIV, UMULL, SMULL, SIMD(ext)
>>> LoUU:2 LoC:3 LoUIS:2
>>> Cache level 1:
>>>  32KB/64B 4-way data cache WB Read-Alloc Write-Alloc
>>>  32KB/32B 2-way instruction cache Read-Alloc
>>> Cache level 2:
>>>  512KB/64B 8-way unified cache WB Read-Alloc Write-Alloc
>>
>> Note: as the issue applies to stable/13 and main [so: 14]
>> (for example), I continue to use the freebsd-arm list
>> instead of a list that reports commits to stable/* but
>> not to main.
>>
>> Relative to:
>>
>> #define HWCAP_FP        0x00000001
>> #define HWCAP_ASIMD     0x00000002
>> #define HWCAP_EVTSTRM   0x00000004
>> #define HWCAP_AES       0x00000008
>> #define HWCAP_PMULL     0x00000010
>> #define HWCAP_SHA1      0x00000020
>> #define HWCAP_SHA2      0x00000040
>> #define HWCAP_CRC32     0x00000080
>>
>> The single-bit enabled OPENSSL_armcap that gets the slow
>> result is:
>>
>> # env OPENSSL_armcap=1 openssl speed -evp aes-256-gcm
>> . . .
>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>> aes-256-gcm     28427.04k    30712.32k    31446.00k    31683.40k    31829.10k    31839.55k
>>
>> The illegal instruction ones for aes-256-gcm were:
>>
>> # env OPENSSL_armcap=4 openssl speed -evp aes-256-gcm
>> Doing aes-256-gcm for 3s on 16 size blocks: Illegal instruction (core dumped)
>>
>> env OPENSSL_armcap=32 openssl speed -evp aes-256-gcm
>> Doing aes-256-gcm for 3s on 16 size blocks: Illegal instruction (core dumped)
>>
>> (sha256 does not match for what is illegal.)
>>
>> Ignoring the illegal-instruction producing bits, HWCAP_FP mixed
>> with any one of the other bits was also similarly slow.
>>
>> As for all the non-illegal-instruction producing bits: also similarly
>> slow:
>>
>> # env OPENSSL_armcap=219 openssl speed -evp aes-256-gcm
>> . . .
>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>> aes-256-gcm     28922.63k    30711.51k    31522.15k    31722.15k    31788.97k    31845.03k
>>
>> Disabling just HWCAP_FP from that got the fast category of
>> result:
>>
>> # env OPENSSL_armcap=218 openssl speed -evp aes-256-gcm
>> . . .
>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>> aes-256-gcm     49543.14k    58068.22k    60236.56k    60724.37k    61216.09k    61212.99k
>>
>>
>> As for sha256 . . .
>>
>> # env OPENSSL_armcap=0 openssl speed -evp sha256
>> . . .
>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>> sha256          22434.19k    59895.91k   117258.16k   156264.31k   172624.81k   173848.52k
>>
>> (I'll not list all the similar performing ones but
>> will list all illegal-instruction producing ones.)
>>
>> # env OPENSSL_armcap=4 openssl speed -evp sha256
>> Doing sha256 for 3s on 16 size blocks: 4082055 sha256's in 2.99s
>> Doing sha256 for 3s on 64 size blocks: 2752520 sha256's in 3.02s
>> Doing sha256 for 3s on 256 size blocks: 1372584 sha256's in 3.03s
>> Doing sha256 for 3s on 1024 size blocks: 470215 sha256's in 3.11s
>> Doing sha256 for 3s on 8192 size blocks: 64700 sha256's in 3.07s
>> Doing sha256 for 3s on 16384 size blocks: 31847 sha256's in 3.00s
>> Illegal instruction (core dumped)
>>
>> # env OPENSSL_armcap=16 openssl speed -evp sha256
>> Doing sha256 for 3s on 16 size blocks: Illegal instruction (core dumped)
>>
>> (16 worked for aes-256-gcm but 32 did not.)
>>
>> So: no significantly slower examples of single enabled
>> bit cases.
>>
>> No (non-illegal-instruction) 2-enabled-bits examples were
>> dissimilar for the speed.
>
> Incorrect description of what I tested: I tested only
> 2-bit combinations involving HWCAP_FP being enabled.
> (Same as for aes-256-gcm.)
>
>> For reference (avoiding illegal-instructions):
>>
>> # env OPENSSL_armcap=235 openssl speed -evp sha256
>> . . .
>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>> sha256          23185.66k    62689.73k   125814.72k   167981.88k   187833.65k   188968.95k
>>
>> So: also similar speed.
>>
>> Need any other specific bit combinations?
>

chroot'd into an armv7 context on the RPi4B gets different
results for aes-256-gcm: having HWCAP_FP enabled
sped things up.

# env OPENSSL_armcap=0 openssl speed -evp aes-256-gcm
. . .
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm     35983.70k    41987.64k    44077.00k    44693.54k    44685.68k    44717.40k

# env OPENSSL_armcap=1 openssl speed -evp aes-256-gcm
. . .
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm     55339.93k    64644.18k    68001.37k    72708.53k    74237.56k    74247.87k

# env OPENSSL_armcap=4 openssl speed -evp aes-256-gcm
Doing aes-256-gcm for 3s on 16 size blocks: Illegal instruction (core dumped)

# env OPENSSL_armcap=32 openssl speed -evp aes-256-gcm
Doing aes-256-gcm for 3s on 16 size blocks: Illegal instruction (core dumped)

In general OPENSSL_armcap=2**N was slower and
OPENSSL_armcap=(2**N)+1 was faster in a similar
manner. Similarly for 218 vs. 219.

sha256 did not show such a distinction.

The armv7 illegal-instruction generation cases for
sha256 were:

# env OPENSSL_armcap=4 openssl speed -evp sha256
Doing sha256 for 3s on 16 size blocks: 3313106 sha256's in 3.02s
Doing sha256 for 3s on 64 size blocks: 2403376 sha256's in 3.02s
Doing sha256 for 3s on 256 size blocks: 1289917 sha256's in 3.02s
Doing sha256 for 3s on 1024 size blocks: 446543 sha256's in 3.00s
Doing sha256 for 3s on 8192 size blocks: 64123 sha256's in 3.03s
Doing sha256 for 3s on 16384 size blocks: 32756 sha256's in 3.08s
Illegal instruction (core dumped)

# env OPENSSL_armcap=16 openssl speed -evp sha256
Doing sha256 for 3s on 16 size blocks: Illegal instruction (core dumped)

Note: I focused on large scale differences in general.
I was not trying to find the optimal combination. For
that I'd also have to test out repeatability/variability
for each OPENSSL_armcap value that was in the faster
range.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in
early 2018-Mar)
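
For reading the OPENSSL_armcap values used in this message (1, 4, 16,
32, 218, 219, 235) against the HWCAP_* table quoted earlier, a minimal
Python sketch is given here. It assumes, as the message does, that the
values are read with that HWCAP_* bit layout; that is an assumption and
is not checked against OpenSSL's own capability-bit definitions. The
names HWCAP_BITS and decode_armcap are made up for illustration.

```python
# Bit values copied from the HWCAP_* #define table quoted in the message.
# Assumption: OPENSSL_armcap is read with this layout, as the message does;
# OpenSSL's own capability-bit definitions are not consulted here.
HWCAP_BITS = {
    0x01: "HWCAP_FP",
    0x02: "HWCAP_ASIMD",
    0x04: "HWCAP_EVTSTRM",
    0x08: "HWCAP_AES",
    0x10: "HWCAP_PMULL",
    0x20: "HWCAP_SHA1",
    0x40: "HWCAP_SHA2",
    0x80: "HWCAP_CRC32",
}


def decode_armcap(value):
    """Return the names of the bits set in value, lowest bit first."""
    return [name for bit, name in sorted(HWCAP_BITS.items()) if value & bit]


if __name__ == "__main__":
    # The OPENSSL_armcap values that appear in the message.
    for value in (1, 4, 16, 32, 218, 219, 235):
        names = ", ".join(decode_armcap(value))
        print(f"{value:3d} (0x{value:02x}): {names}")
```

Under that reading, 219 and 218 differ only in the 0x01 bit, which fits
the "disabling just HWCAP_FP" description, and 235 leaves out the 4 and
16 bits that produced illegal instructions for sha256.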