Re: git: 32a2fed6e71f - stable/13 - openssl: Fix detection of ARMv7 and ARM64 CPU features

From: Mark Millard via arm <arm_at_freebsd.org>
Date: Wed, 24 Nov 2021 09:51:52 UTC
[Actually, the main [so: 14] equivalent.]

All Cortex-A72 based . . .

First, older system versions (before that update)
then after the update:


RPi4B 8 GiByte (older FreeBSD first, otherwise new),
Cortex-A72's:

# openssl speed -evp aes-256-gcm
. . .
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm      51925.92k    58449.46k    60430.32k    61050.13k    61180.98k    61482.75k

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm      28880.07k    30837.33k    31630.29k    31855.62k    31921.54k    32034.53k

So: slowed down, unlike the other examples below.

# env OPENSSL_armcap=0 openssl speed -evp aes-256-gcm
. . .
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm      51894.33k    58540.45k    60815.22k    61534.47k    61906.84k    62042.10k

So: back to the prior speed.

But all these are based on config.txt containing:

over_voltage=6 
arm_freq=2000 
sdram_freq_min=3200 
force_turbo=1

(The RPi4B has a heat-sink and a fan.)

Note: See later about the RPi4B CPU features.


MACCHIATObin Double Shot (older first), Cortex-A72's:

# openssl speed -evp aes-256-gcm
. . .
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm      50808.49k    58466.08k    60769.11k    61444.92k    61767.94k    61707.61k

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm     163579.14k   456319.27k   786544.01k   940234.41k  1003230.55k  1005671.31k


HoneyComb (older first), Cortex-A782's:

# openssl speed -evp aes-256-gcm
. . .
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm      57659.60k    64599.05k    67719.81k    68373.74k    68724.24k    68793.80k

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm     177925.57k   502311.65k   866287.95k  1036500.35k  1106598.06k  1106721.91k

Rock64 (older first), Cortex-A53's:

# openssl speed -evp aes-256-gcm
. . .
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm      18378.23k    23401.45k    24834.99k    25206.10k    25337.86k    25258.19k

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm      52711.29k   163586.49k   318738.69k   420277.93k   461373.44k   463192.06k


OPi+2E (older first), Cortex-A7's (so armv7):

# openssl speed -evp aes-256-gcm
. . .
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm       9343.10k    11156.39k    11827.64k    11995.30k    12025.86k    12031.32k

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm      11013.41k    13598.44k    14034.26k    15045.97k    15262.90k    15302.66k



For reference:

For the RPi4B examples (2 notes added):

CPU  0: ARM Cortex-A72 r0p3 affinity:  0
                   Cache Type = <64 byte D-cacheline,64 byte I-cacheline,PIPT ICache,64 byte ERG,64 byte CWG>
 Instruction Set Attributes 0 = <CRC32>
*** NOTE the lack of ",SHA2,SHA1,AES+PMULL" above ***
 Instruction Set Attributes 1 = <>
         Processor Features 0 = <AdvSIMD,FP,EL3 32,EL2 32,EL1 32,EL0 32>
         Processor Features 1 = <>
      Memory Model Features 0 = <TGran4,TGran64,SNSMem,BigEnd,16bit ASID,16TB PA>
      Memory Model Features 1 = <8bit VMID>
      Memory Model Features 2 = <32bit CCIDX,48bit VA>
             Debug Features 0 = <DoubleLock,2 CTX BKPTs,4 Watchpoints,6 Breakpoints,PMUv3,Debugv8>
             Debug Features 1 = <>
         Auxiliary Features 0 = <>
         Auxiliary Features 1 = <>
AArch32 Instruction Set Attributes 5 = <CRC32,SEVL>
*** NOTE the lack of ",SHA2,SHA1,AES+VMULL" above ***
AArch32 Media and VFP Features 0 = <FPRound,FPSqrt,FPDivide,DP VFPv3+v4,SP VFPv3+v4,AdvSIMD>
AArch32 Media and VFP Features 1 = <SIMDFMAC,FPHP DP Conv,SIMDHP SP Conv,SIMDSP,SIMDInt,SIMDLS,FPDNaN,FPFtZ>

For the MACCHIATObin Double Shot examples:

CPU  0: ARM Cortex-A72 r0p1 affinity:  0  0
                   Cache Type = <64 byte D-cacheline,64 byte I-cacheline,PIPT ICache,64 byte ERG,64 byte CWG>
 Instruction Set Attributes 0 = <CRC32,SHA2,SHA1,AES+PMULL>
 Instruction Set Attributes 1 = <>
         Processor Features 0 = <AdvSIMD,FP,EL3 32,EL2 32,EL1 32,EL0 32>
         Processor Features 1 = <>
      Memory Model Features 0 = <TGran4,TGran64,SNSMem,BigEnd,16bit ASID,16TB PA>
      Memory Model Features 1 = <8bit VMID>
      Memory Model Features 2 = <32bit CCIDX,48bit VA>
             Debug Features 0 = <DoubleLock,2 CTX BKPTs,4 Watchpoints,6 Breakpoints,PMUv3,Debugv8>
             Debug Features 1 = <>
         Auxiliary Features 0 = <>
         Auxiliary Features 1 = <>
AArch32 Instruction Set Attributes 5 = <CRC32,SHA2,SHA1,AES+VMULL,SEVL>
AArch32 Media and VFP Features 0 = <FPRound,FPSqrt,FPDivide,DP VFPv3+v4,SP VFPv3+v4,AdvSIMD>
AArch32 Media and VFP Features 1 = <SIMDFMAC,FPHP DP Conv,SIMDHP SP Conv,SIMDSP,SIMDInt,SIMDLS,FPDNaN,FPFtZ>


For the HoneyComb examples:

CPU  0: ARM Cortex-A72 r0p3 affinity:  0  0
                   Cache Type = <64 byte D-cacheline,64 byte I-cacheline,PIPT ICache,64 byte ERG,64 byte CWG>
 Instruction Set Attributes 0 = <CRC32,SHA2,SHA1,AES+PMULL>
 Instruction Set Attributes 1 = <>
         Processor Features 0 = <GIC,AdvSIMD,FP,EL3 32,EL2 32,EL1 32,EL0 32>
         Processor Features 1 = <>
      Memory Model Features 0 = <TGran4,TGran64,SNSMem,BigEnd,16bit ASID,16TB PA>
      Memory Model Features 1 = <8bit VMID>
      Memory Model Features 2 = <32bit CCIDX,48bit VA>
             Debug Features 0 = <DoubleLock,2 CTX BKPTs,4 Watchpoints,6 Breakpoints,PMUv3,Debugv8>
             Debug Features 1 = <>
         Auxiliary Features 0 = <>
         Auxiliary Features 1 = <>
AArch32 Instruction Set Attributes 5 = <CRC32,SHA2,SHA1,AES+VMULL,SEVL>
AArch32 Media and VFP Features 0 = <FPRound,FPSqrt,FPDivide,DP VFPv3+v4,SP VFPv3+v4,AdvSIMD>
AArch32 Media and VFP Features 1 = <SIMDFMAC,FPHP DP Conv,SIMDHP SP Conv,SIMDSP,SIMDInt,SIMDLS,FPDNaN,FPFtZ>




For the Rock64 examples:

CPU  0: ARM Cortex-A53 r0p4 affinity:  0
                   Cache Type = <64 byte D-cacheline,64 byte I-cacheline,VIPT ICache,64 byte ERG,64 byte CWG>
 Instruction Set Attributes 0 = <CRC32,SHA2,SHA1,AES+PMULL>
 Instruction Set Attributes 1 = <>
         Processor Features 0 = <AdvSIMD,FP,EL3 32,EL2 32,EL1 32,EL0 32>
         Processor Features 1 = <>
      Memory Model Features 0 = <TGran4,TGran64,SNSMem,BigEnd,16bit ASID,1TB PA>
      Memory Model Features 1 = <8bit VMID>
      Memory Model Features 2 = <32bit CCIDX,48bit VA>
             Debug Features 0 = <DoubleLock,2 CTX BKPTs,4 Watchpoints,6 Breakpoints,PMUv3,Debugv8>
             Debug Features 1 = <>
         Auxiliary Features 0 = <>
         Auxiliary Features 1 = <>
AArch32 Instruction Set Attributes 5 = <CRC32,SHA2,SHA1,AES+VMULL,SEVL>
AArch32 Media and VFP Features 0 = <FPRound,FPSqrt,FPDivide,DP VFPv3+v4,SP VFPv3+v4,AdvSIMD>
AArch32 Media and VFP Features 1 = <SIMDFMAC,FPHP DP Conv,SIMDHP SP Conv,SIMDSP,SIMDInt,SIMDLS,FPDNaN,FPFtZ>
C


For the OPi+2E examples:

CPU: ARM Cortex-A7 r0p5 (ECO: 0x00000000)
CPU Features: 
  Multiprocessing, Thumb2, Security, Virtualization, Generic Timer, VMSAv7,
  PXN, LPAE, Coherent Walk
Optional instructions: 
  SDIV/UDIV, UMULL, SMULL, SIMD(ext)
LoUU:2 LoC:3 LoUIS:2 
Cache level 1:
 32KB/64B 4-way data cache WB Read-Alloc Write-Alloc
 32KB/32B 2-way instruction cache Read-Alloc
Cache level 2:
 512KB/64B 8-way unified cache WB Read-Alloc Write-Alloc

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)