Re: git: 32a2fed6e71f - stable/13 - openssl: Fix detection of ARMv7 and ARM64 CPU features
Date: Wed, 24 Nov 2021 21:19:16 UTC
On 2021-Nov-24, at 01:51, Mark Millard <marklmi@yahoo.com> wrote:
> [Actually, the main [so: 14] equivalent.]
>
> All Cortex-A72 based . . .
>
> First, older system versions (before that update)
> then after the update:
>
>
> RPi4B 8 GiByte (older FreeBSD first, otherwise new),
> Cortex-A72's:
>
> # openssl speed -evp aes-256-gcm
> . . .
> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> aes-256-gcm     51925.92k    58449.46k    60430.32k    61050.13k    61180.98k    61482.75k
>
> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> aes-256-gcm     28880.07k    30837.33k    31630.29k    31855.62k    31921.54k    32034.53k
>
> So: slowed down, unlike the other examples below.
>
> # env OPENSSL_armcap=0 openssl speed -evp aes-256-gcm
> . . .
> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> aes-256-gcm     51894.33k    58540.45k    60815.22k    61534.47k    61906.84k    62042.10k
>
> So: back to the prior speed.
>
> But all these are based on config.txt containing:
>
> over_voltage=6
> arm_freq=2000
> sdram_freq_min=3200
> force_turbo=1
>
> (The RPi4B has a heat-sink and a fan.)
>
> Note: See later about the RPi4B CPU features.
>
>
> MACCHIATObin Double Shot (older first), Cortex-A72's:
>
> # openssl speed -evp aes-256-gcm
> . . .
> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> aes-256-gcm     50808.49k    58466.08k    60769.11k    61444.92k    61767.94k    61707.61k
>
> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> aes-256-gcm    163579.14k   456319.27k   786544.01k   940234.41k  1003230.55k  1005671.31k
>
>
> HoneyComb (older first), Cortex-A72's:
>
> # openssl speed -evp aes-256-gcm
> . . .
> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> aes-256-gcm     57659.60k    64599.05k    67719.81k    68373.74k    68724.24k    68793.80k
>
> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> aes-256-gcm    177925.57k   502311.65k   866287.95k  1036500.35k  1106598.06k  1106721.91k
>
> Rock64 (older first), Cortex-A53's:
>
> # openssl speed -evp aes-256-gcm
> . . .
> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> aes-256-gcm     18378.23k    23401.45k    24834.99k    25206.10k    25337.86k    25258.19k
>
> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> aes-256-gcm     52711.29k   163586.49k   318738.69k   420277.93k   461373.44k   463192.06k
>
>
> OPi+2E (older first), Cortex-A7's (so armv7):
>
> # openssl speed -evp aes-256-gcm
> . . .
> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> aes-256-gcm      9343.10k    11156.39k    11827.64k    11995.30k    12025.86k    12031.32k
>
> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
> aes-256-gcm     11013.41k    13598.44k    14034.26k    15045.97k    15262.90k    15302.66k
>
>
>
> For reference:
>
> For the RPi4B examples (2 notes added):
>
> CPU 0: ARM Cortex-A72 r0p3 affinity: 0
> Cache Type = <64 byte D-cacheline,64 byte I-cacheline,PIPT ICache,64 byte ERG,64 byte CWG>
> Instruction Set Attributes 0 = <CRC32>
> *** NOTE the lack of ",SHA2,SHA1,AES+PMULL" above ***
> Instruction Set Attributes 1 = <>
> Processor Features 0 = <AdvSIMD,FP,EL3 32,EL2 32,EL1 32,EL0 32>
> Processor Features 1 = <>
> Memory Model Features 0 = <TGran4,TGran64,SNSMem,BigEnd,16bit ASID,16TB PA>
> Memory Model Features 1 = <8bit VMID>
> Memory Model Features 2 = <32bit CCIDX,48bit VA>
> Debug Features 0 = <DoubleLock,2 CTX BKPTs,4 Watchpoints,6 Breakpoints,PMUv3,Debugv8>
> Debug Features 1 = <>
> Auxiliary Features 0 = <>
> Auxiliary Features 1 = <>
> AArch32 Instruction Set Attributes 5 = <CRC32,SEVL>
> *** NOTE the lack of ",SHA2,SHA1,AES+VMULL" above ***
> AArch32 Media and VFP Features 0 = <FPRound,FPSqrt,FPDivide,DP VFPv3+v4,SP VFPv3+v4,AdvSIMD>
> AArch32 Media and VFP Features 1 = <SIMDFMAC,FPHP DP Conv,SIMDHP SP Conv,SIMDSP,SIMDInt,SIMDLS,FPDNaN,FPFtZ>
>
> For the MACCHIATObin Double Shot examples:
>
> CPU 0: ARM Cortex-A72 r0p1 affinity: 0 0
> Cache Type = <64 byte D-cacheline,64 byte I-cacheline,PIPT ICache,64 byte ERG,64 byte CWG>
> Instruction Set Attributes 0 = <CRC32,SHA2,SHA1,AES+PMULL>
> Instruction Set Attributes 1 = <>
> Processor Features 0 = <AdvSIMD,FP,EL3 32,EL2 32,EL1 32,EL0 32>
> Processor Features 1 = <>
> Memory Model Features 0 = <TGran4,TGran64,SNSMem,BigEnd,16bit ASID,16TB PA>
> Memory Model Features 1 = <8bit VMID>
> Memory Model Features 2 = <32bit CCIDX,48bit VA>
> Debug Features 0 = <DoubleLock,2 CTX BKPTs,4 Watchpoints,6 Breakpoints,PMUv3,Debugv8>
> Debug Features 1 = <>
> Auxiliary Features 0 = <>
> Auxiliary Features 1 = <>
> AArch32 Instruction Set Attributes 5 = <CRC32,SHA2,SHA1,AES+VMULL,SEVL>
> AArch32 Media and VFP Features 0 = <FPRound,FPSqrt,FPDivide,DP VFPv3+v4,SP VFPv3+v4,AdvSIMD>
> AArch32 Media and VFP Features 1 = <SIMDFMAC,FPHP DP Conv,SIMDHP SP Conv,SIMDSP,SIMDInt,SIMDLS,FPDNaN,FPFtZ>
>
>
> For the HoneyComb examples:
>
> CPU 0: ARM Cortex-A72 r0p3 affinity: 0 0
> Cache Type = <64 byte D-cacheline,64 byte I-cacheline,PIPT ICache,64 byte ERG,64 byte CWG>
> Instruction Set Attributes 0 = <CRC32,SHA2,SHA1,AES+PMULL>
> Instruction Set Attributes 1 = <>
> Processor Features 0 = <GIC,AdvSIMD,FP,EL3 32,EL2 32,EL1 32,EL0 32>
> Processor Features 1 = <>
> Memory Model Features 0 = <TGran4,TGran64,SNSMem,BigEnd,16bit ASID,16TB PA>
> Memory Model Features 1 = <8bit VMID>
> Memory Model Features 2 = <32bit CCIDX,48bit VA>
> Debug Features 0 = <DoubleLock,2 CTX BKPTs,4 Watchpoints,6 Breakpoints,PMUv3,Debugv8>
> Debug Features 1 = <>
> Auxiliary Features 0 = <>
> Auxiliary Features 1 = <>
> AArch32 Instruction Set Attributes 5 = <CRC32,SHA2,SHA1,AES+VMULL,SEVL>
> AArch32 Media and VFP Features 0 = <FPRound,FPSqrt,FPDivide,DP VFPv3+v4,SP VFPv3+v4,AdvSIMD>
> AArch32 Media and VFP Features 1 = <SIMDFMAC,FPHP DP Conv,SIMDHP SP Conv,SIMDSP,SIMDInt,SIMDLS,FPDNaN,FPFtZ>
>
>
>
>
> For the Rock64 examples:
>
> CPU 0: ARM Cortex-A53 r0p4 affinity: 0
> Cache Type = <64 byte D-cacheline,64 byte I-cacheline,VIPT ICache,64 byte ERG,64 byte CWG>
> Instruction Set Attributes 0 = <CRC32,SHA2,SHA1,AES+PMULL>
> Instruction Set Attributes 1 = <>
> Processor Features 0 = <AdvSIMD,FP,EL3 32,EL2 32,EL1 32,EL0 32>
> Processor Features 1 = <>
> Memory Model Features 0 = <TGran4,TGran64,SNSMem,BigEnd,16bit ASID,1TB PA>
> Memory Model Features 1 = <8bit VMID>
> Memory Model Features 2 = <32bit CCIDX,48bit VA>
> Debug Features 0 = <DoubleLock,2 CTX BKPTs,4 Watchpoints,6 Breakpoints,PMUv3,Debugv8>
> Debug Features 1 = <>
> Auxiliary Features 0 = <>
> Auxiliary Features 1 = <>
> AArch32 Instruction Set Attributes 5 = <CRC32,SHA2,SHA1,AES+VMULL,SEVL>
> AArch32 Media and VFP Features 0 = <FPRound,FPSqrt,FPDivide,DP VFPv3+v4,SP VFPv3+v4,AdvSIMD>
> AArch32 Media and VFP Features 1 = <SIMDFMAC,FPHP DP Conv,SIMDHP SP Conv,SIMDSP,SIMDInt,SIMDLS,FPDNaN,FPFtZ>
>
>
>
> For the OPi+2E examples:
>
> CPU: ARM Cortex-A7 r0p5 (ECO: 0x00000000)
> CPU Features:
>   Multiprocessing, Thumb2, Security, Virtualization, Generic Timer, VMSAv7,
>   PXN, LPAE, Coherent Walk
> Optional instructions:
>   SDIV/UDIV, UMULL, SMULL, SIMD(ext)
> LoUU:2 LoC:3 LoUIS:2
> Cache level 1:
>   32KB/64B 4-way data cache WB Read-Alloc Write-Alloc
>   32KB/32B 2-way instruction cache Read-Alloc
> Cache level 2:
>   512KB/64B 8-way unified cache WB Read-Alloc Write-Alloc

Note: as the issue applies to stable/13 and main [so: 14]
(for example), I continue to use the freebsd-arm list
instead of a list that reports commits to stable/* but
not to main.

Relative to:

#define HWCAP_FP                0x00000001
#define HWCAP_ASIMD             0x00000002
#define HWCAP_EVTSTRM           0x00000004
#define HWCAP_AES               0x00000008
#define HWCAP_PMULL             0x00000010
#define HWCAP_SHA1              0x00000020
#define HWCAP_SHA2              0x00000040
#define HWCAP_CRC32             0x00000080

The single-bit enabled OPENSSL_armcap that gets the slow
result is:

# env OPENSSL_armcap=1 openssl speed -evp aes-256-gcm
. . .
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm     28427.04k    30712.32k    31446.00k    31683.40k    31829.10k    31839.55k

The illegal-instruction ones for aes-256-gcm were:

# env OPENSSL_armcap=4 openssl speed -evp aes-256-gcm
Doing aes-256-gcm for 3s on 16 size blocks: Illegal instruction (core dumped)

# env OPENSSL_armcap=32 openssl speed -evp aes-256-gcm
Doing aes-256-gcm for 3s on 16 size blocks: Illegal instruction (core dumped)

(For sha256, the bits that produce illegal instructions
do not match these.)

Ignoring the illegal-instruction producing bits, HWCAP_FP
mixed with any one of the other bits was also similarly
slow.

With all of the non-illegal-instruction producing bits
enabled, the result was also similarly slow:

# env OPENSSL_armcap=219 openssl speed -evp aes-256-gcm
. . .
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm     28922.63k    30711.51k    31522.15k    31722.15k    31788.97k    31845.03k

Disabling just HWCAP_FP from that got the fast category
of result:

# env OPENSSL_armcap=218 openssl speed -evp aes-256-gcm
. . .
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm     49543.14k    58068.22k    60236.56k    60724.37k    61216.09k    61212.99k

As for sha256 . . .

# env OPENSSL_armcap=0 openssl speed -evp sha256
. . .
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha256          22434.19k    59895.91k   117258.16k   156264.31k   172624.81k   173848.52k

(I'll not list all the similar performing ones but will
list all illegal-instruction producing ones.)

# env OPENSSL_armcap=4 openssl speed -evp sha256
Doing sha256 for 3s on 16 size blocks: 4082055 sha256's in 2.99s
Doing sha256 for 3s on 64 size blocks: 2752520 sha256's in 3.02s
Doing sha256 for 3s on 256 size blocks: 1372584 sha256's in 3.03s
Doing sha256 for 3s on 1024 size blocks: 470215 sha256's in 3.11s
Doing sha256 for 3s on 8192 size blocks: 64700 sha256's in 3.07s
Doing sha256 for 3s on 16384 size blocks: 31847 sha256's in 3.00s
Illegal instruction (core dumped)

# env OPENSSL_armcap=16 openssl speed -evp sha256
Doing sha256 for 3s on 16 size blocks: Illegal instruction (core dumped)

(16 worked for aes-256-gcm but 32 did not.)

So: no significantly slower examples of single enabled bit
cases.

No (non-illegal-instruction) 2-enabled-bits examples were
dissimilar for the speed.

For reference (avoiding illegal-instructions):

# env OPENSSL_armcap=235 openssl speed -evp sha256
. . .
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha256          23185.66k    62689.73k   125814.72k   167981.88k   187833.65k   188968.95k

So: also similar speed.

Need any other specific bit combinations?

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
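
For anyone wanting to try other combinations, here is a minimal sketch that splits an OPENSSL_armcap number into the HWCAP_* names listed in the message. The function names decode_armcap and cleared_bits are hypothetical helpers, not part of OpenSSL or FreeBSD. It only relabels the numbers with the names quoted above. It assumes, as the message does, that the bits line up with HWCAP_*; OpenSSL keeps its own ARMV7_*/ARMV8_* capability bits in arm_arch.h, and those may not carry the same meanings, so treat the labels as tentative.

```python
# Bit values copied from the HWCAP_* list quoted in the message.
# Assumption: OPENSSL_armcap is read against these names, as the message does.
HWCAP_BITS = {
    "HWCAP_FP": 0x00000001,
    "HWCAP_ASIMD": 0x00000002,
    "HWCAP_EVTSTRM": 0x00000004,
    "HWCAP_AES": 0x00000008,
    "HWCAP_PMULL": 0x00000010,
    "HWCAP_SHA1": 0x00000020,
    "HWCAP_SHA2": 0x00000040,
    "HWCAP_CRC32": 0x00000080,
}


def decode_armcap(value):
    """Return the listed names whose bit is set in value, lowest bit first."""
    return [name for name, bit in HWCAP_BITS.items() if value & bit]


def cleared_bits(value):
    """Return the listed names whose bit is NOT set in value, lowest bit first."""
    return [name for name, bit in HWCAP_BITS.items() if not value & bit]


if __name__ == "__main__":
    # Values that appear in the test runs reported in the message.
    for value in (1, 4, 16, 32, 218, 219, 235):
        print(f"{value:3d} (0x{value:02x}) set:     {', '.join(decode_armcap(value))}")
        print(f"           cleared: {', '.join(cleared_bits(value))}")
```

Under that labeling, 219 is every listed bit except 4 and 32 (the two aes-256-gcm illegal-instruction values), 218 is the same with bit 1 also cleared, and 235 is every listed bit except 4 and 16 (the two sha256 illegal-instruction values).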