Re: git: 32a2fed6e71f - stable/13 - openssl: Fix detection of ARMv7 and ARM64 CPU features
Date: Thu, 25 Nov 2021 15:09:25 UTC
On 11/25/2021 2:38 AM, Helge Oldach wrote:
> Hi,
>
> Allan Jude wrote on Wed, 24 Nov 2021 19:02:47 +0100 (CET):
>> On 11/24/2021 3:30 AM, Emmanuel Vadot wrote:
>>> On Tue, 23 Nov 2021 20:36:40 +0100 (CET)
>>> freebsd@oldach.net (Helge Oldach) wrote:
>>>
>>>> Allan Jude wrote on Tue, 23 Nov 2021 20:14:53 +0100 (CET):
>>>>> On 11/23/2021 5:00 AM, Helge Oldach wrote:
>>>>>> Allan Jude wrote on Mon, 22 Nov 2021 19:14:13 +0100 (CET):
>>>>>> Hmmm. On a RPi4/8G:
>>>>>>
>>>>>> Before (FreeBSD 13.0-STABLE (GENERIC) #366 stable/13-n248173-d16fbc488e6):
>>>>>> | type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>>>>> | aes-256-gcm      35791.98k    38533.57k    39986.77k    41397.59k    39840.43k    39638.36k
>>>>>>
>>>>>> After (FreeBSD 13.0-STABLE (GENERIC) #367 stable/13-n248176-f085bb0e621)
>>>>>>
>>>>>> | type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>>>>> | aes-256-gcm      21277.62k    23226.64k    23613.90k    23687.51k    23892.93k    23947.95k
>>>>>>
>>>>>> It seems that AES throughput is actually cut by almost half?
>>>>>
>>>>> Do you know which of the CPU optimizations your RPi4 supports?
>>>>
>>>> Is this what you need?
>>>>
>>>> Instruction Set Attributes 0 = <CRC32>
>>>
>>> So there is no AES+PMULL instruction set on RPI4, I guess that openssl
>>> uses them for aes-gcm.
>>>
>>> I wonder what it uses before that make it have this boost.
>>>
>>> On my rockpro64 I do see the improvement btw :
>>> root@generic:~ # cpuset -l 4,5 openssl speed -evp aes-256-gcm
>>> ...
>>> aes-256-gcm     122861.59k   337938.39k   565408.44k   661223.09k   709175.19k   712327.25k
>>> root@generic:~ # cpuset -l 4,5 env OPENSSL_armcap=0 openssl speed -evp aes-256-gcm
>>> ...
>>> aes-256-gcm      34068.11k    38068.62k    39435.24k    39818.75k    39905.34k    39922.35k
>>>
>>> Running on the big cores at max freq.
>>>
>>>> Instruction Set Attributes 1 = <>
>>>> Processor Features 0 = <AdvSIMD,FP,EL3 32,EL2 32,EL1 32,EL0 32>
>>>> Processor Features 1 = <>
>>>> Memory Model Features 0 = <TGran4,TGran64,SNSMem,BigEnd,16bit ASID,16TB PA>
>>>> Memory Model Features 1 = <8bit VMID>
>>>> Memory Model Features 2 = <32bit CCIDX,48bit VA>
>>>> Debug Features 0 = <DoubleLock,2 CTX BKPTs,4 Watchpoints,6 Breakpoints,PMUv3,Debugv8>
>>>> Debug Features 1 = <>
>>>> Auxiliary Features 0 = <>
>>>> Auxiliary Features 1 = <>
>>>> AArch32 Instruction Set Attributes 5 = <CRC32,SEVL>
>>>> AArch32 Media and VFP Features 0 = <FPRound,FPSqrt,FPDivide,DP VFPv3+v4,SP VFPv3+v4,AdvSIMD>
>>>> AArch32 Media and VFP Features 1 = <SIMDFMAC,FPHP DP Conv,SIMDHP SP Conv,SIMDSP,SIMDInt,SIMDLS,FPDNaN,FPFtZ>
>>>>
>>>>> You can set the environment variable OPENSSL_armcap to override
>>>>> OpenSSL's detection.
>>>>>
>>>>> Try: env OPENSSL_armcap=0 openssl speed -evp aes-256-gcm
>>>>
>>>> On FreeBSD 13.0-STABLE (GENERIC) #367 stable/13-n248176-f085bb0e621 again (i.e. after this commit):
>>>>
>>>> hmo@p48 ~ $ env OPENSSL_armcap=0 openssl speed -evp aes-256-gcm
>>>> Doing aes-256-gcm for 3s on 16 size blocks: 6445704 aes-256-gcm's in 3.08s
>>>> Doing aes-256-gcm for 3s on 64 size blocks: 1861149 aes-256-gcm's in 3.00s
>>>> Doing aes-256-gcm for 3s on 256 size blocks: 479664 aes-256-gcm's in 3.01s
>>>> Doing aes-256-gcm for 3s on 1024 size blocks: 122853 aes-256-gcm's in 3.04s
>>>> Doing aes-256-gcm for 3s on 8192 size blocks: 15181 aes-256-gcm's in 3.00s
>>>> Doing aes-256-gcm for 3s on 16384 size blocks: 7796 aes-256-gcm's in 3.07s
>>>> OpenSSL 1.1.1l-freebsd 24 Aug 2021
>>>> built on: reproducible build, date unspecified
>>>> options:bn(64,64) rc4(int) des(int) aes(partial) idea(int) blowfish(ptr)
>>>> compiler: clang
>>>> The 'numbers' are in 1000s of bytes per second processed.
>>>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>>> aes-256-gcm      33504.57k    39704.51k    40825.01k    41394.83k    41454.25k    41601.52k
>>>> hmo@p48 ~ $ openssl speed -evp aes-256-gcm
>>>> Doing aes-256-gcm for 3s on 16 size blocks: 4066201 aes-256-gcm's in 3.00s
>>>> Doing aes-256-gcm for 3s on 64 size blocks: 1087387 aes-256-gcm's in 3.00s
>>>> Doing aes-256-gcm for 3s on 256 size blocks: 280110 aes-256-gcm's in 3.03s
>>>> Doing aes-256-gcm for 3s on 1024 size blocks: 70412 aes-256-gcm's in 3.04s
>>>> Doing aes-256-gcm for 3s on 8192 size blocks: 8762 aes-256-gcm's in 3.00s
>>>> Doing aes-256-gcm for 3s on 16384 size blocks: 4402 aes-256-gcm's in 3.02s
>>>> OpenSSL 1.1.1l-freebsd 24 Aug 2021
>>>> built on: reproducible build, date unspecified
>>>> options:bn(64,64) rc4(int) des(int) aes(partial) idea(int) blowfish(ptr)
>>>> compiler: clang
>>>> The 'numbers' are in 1000s of bytes per second processed.
>>>> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
>>>> aes-256-gcm      21686.41k    23197.59k    23656.30k    23725.04k    23926.10k    23916.23k
>>>> hmo@p48 ~ $
>>>>
>>>> Kind regards,
>>>> Helge
>>>
>>>
>>
>> So based on results from Manu, and Mark Millard, it seems almost every
>> ARM platform is faster when it takes advantage of the CPU features,
>> except the RPi4(B).
>>
>> As Manu pointed out, it doesn't appear to have the AES+PMULL feature,
>> which means it must be something else that is slowing it down.
>>
>> What might help, is to try each feature in turn, and figure out which
>> one is causing slower results.
>>
>> #define HWCAP_FP        0x00000001
>> #define HWCAP_ASIMD     0x00000002
>> #define HWCAP_EVTSTRM   0x00000004
>> #define HWCAP_AES       0x00000008
>> #define HWCAP_PMULL     0x00000010
>> #define HWCAP_SHA1      0x00000020
>> #define HWCAP_SHA2      0x00000040
>> #define HWCAP_CRC32     0x00000080
>>
>> So try:
>> env OPENSSL_armcap=1 openssl speed -evp aes-256-gcm
>> as well as with armcap=2, 3 (both FP and ASIMD), 8 (just AES) etc.
>
> hmo@p48 ~ $ for f in 0 1 2 3 8 16 32 64 128 ; do echo -n $f:; env OPENSSL_armcap=$f openssl speed -evp aes-256-gcm 2>&1 | tail -1 | cut -wf7; done
> 0:42295.15k
> 1:23891.19k
> 2:42208.57k
> 3:23970.56k
> 8:42354.98k
> 16:42199.06k
> 32:size
> Illegal instruction (core dumped)
> 64:42322.42k
> 128:42275.00k
> hmo@p48 ~ $
>
> So I guess HWCAP_FP is the culprit? Maybe related to hard/soft floating
> point math which indeed is kind of special on the Pi?
>
>> For ones where the CPU lacks the feature, it will crash with 'Illegal
>> instruction'
>>
>> Separately, it might also be interesting to see the results of `openssl
>> speed -evp sha256` before/after/with the different OPENSSL_armcap values
>
> Please let me know in case you still require this.
>
> Kind regards
> Helge
>

So yeah, the issue seems to be that the floating-point code path on the
RPi4 is slower than the fallback, but OpenSSL now (properly) detects that
the CPU advertises support for it, so it uses that path.

As seen elsewhere in the thread, most other ARM platforms get a very
significant speed boost.

--
Allan Jude
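
For reading the per-value results above, here is a minimal sh sketch that turns a numeric mask into the HWCAP_* names from the table quoted in this thread. The function name decode_hwcap is made up for illustration. It assumes the mask uses the bit values from that table. OpenSSL defines its own capability bits for OPENSSL_armcap in crypto/arm_arch.h (ARMV7_NEON and friends), so these names may not match what OpenSSL actually enables for a given value.

```shell
#!/bin/sh
# Hypothetical helper: decode a numeric mask into the HWCAP_* names
# listed in the thread. Assumption: the mask follows that table's bit
# values; OpenSSL's own OPENSSL_armcap layout may differ.
decode_hwcap() {
    mask=$1
    names=""
    for pair in FP:1 ASIMD:2 EVTSTRM:4 AES:8 PMULL:16 SHA1:32 SHA2:64 CRC32:128; do
        name=${pair%%:*}    # text before the colon
        bit=${pair##*:}     # text after the colon
        if [ $(( mask & bit )) -ne 0 ]; then
            # Append with a separating space only when names is non-empty.
            names="${names:+$names }$name"
        fi
    done
    echo "${names:-none}"
}

# Decode the same values that were tested in the loop above.
for m in 0 1 2 3 8 16 32 64 128; do
    printf '%s: %s\n' "$m" "$(decode_hwcap "$m")"
done
```

Read against that table, the two slow runs (1 and 3) are the only tested values that decode to include FP, which is the reasoning behind the HWCAP_FP guess.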