Re: Implementing in-kernel AES crypto acceleration on ppc (POWER8+)
Date: Fri, 04 Aug 2023 16:13:42 UTC
On 8/4/23 9:55 AM, John Baldwin wrote:
> On 8/4/23 6:36 AM, Justin Hibbits wrote:
>> Hello,
>>
>> Good to see this! I'll answer inline.
>>
>> On Thu, 3 Aug 2023 12:51:57 -0500
>> Shawn Anastasio <sanastasio@raptorengineering.com> wrote:
>>
>>> Hello all,
>>>
>>> Raptor Engineering is interested in adding support for in-kernel AES
>>> acceleration on ppc64 via the VMX crypto instructions added in ISA
>>> 2.07B, and I wanted to reach out to the community with a few
>>> questions.
>>
>> I would love to see this added.
>>
>>>
>>> 1. As I understand it, FreeBSD already has support for in-kernel
>>> crypto acceleration on x86 and ARM via the aesni and armv8_crypto
>>> drivers respectively that each implement the cryptodev interface. Am I
>>> correct in understanding that adding AES acceleration for Power
>>> would just involve creating another driver here, or are there other
>>> pieces of the puzzle that I've missed?
>>
>> John Baldwin can probably answer this better, but I think your
>> understanding is correct. There might be some plumbing needed as well,
>> but that should be minimal.
>
> Recently for accelerated software crypto we have been using the existing
> assembly routines from OpenSSL (which has ppc routines IIRC) in the
> ossl(4) driver. For powerpc you would need to provide any ppc-specific
> things the OpenSSL assembly routines need (e.g. on x86 they use an array
> of words holding feature bits corresponding to output from cpuid), and
> mostly just add build glue. This driver also requires fpu_kern_*, but
> in general ossl(4) is preferred going forward and will eventually
> replace aesni and armv8crypto entirely.

Thank you for pointing this out. It seems that sys/crypto/openssl
conveniently already has the relevant ppc assembly files imported from
the openssl source tree, so it looks like I'll just have to implement
the ppc-specific openssl glue, in addition to the fpu_enter/leave
functions.

>>> 2. I see that both the aesni and armv8 drivers make use of the
>>> fpu_kern_enter/fpu_kern_leave functions to guard access to vector
>>> registers, but it appears that these functions aren't implemented
>>> on ppc. Is that correct, or does an in-kernel facility for safely
>>> accessing vector registers on ppc already exist?
>>
>> Nope, ppc doesn't have these facilities yet. It shouldn't be hard to
>> implement, we just haven't done it yet. If you're interested in
>> implementing them, you should be able to model it after arm64, largely.
>
> Yes, this is a prerequisite. If you implement this you can also enable
> assembly for ZFS on powerpc as well.

Makes sense, thank you both for the confirmation. I'll get started on
mocking up an implementation of these functions.

>>> 3. For the accelerated AES implementation itself, I've noticed that
>>> cryptogams[*] contains an implementation that is both widely
>>> deployed (and thus tested and likely to be correct) and also BSD
>>> licensed. Would it be acceptable to import the relevant routines to
>>> the FreeBSD kernel and have the new cryptodev driver simply call into
>>> them, or are there other considerations involved?
>>
>> I think the right way to do that would be to import the code as-is as
>> third party code, and call into the routines that you need. You can
>> #ifdef out the unneeded bits, but try to keep it as intact as possible
>> from upstream.
>
> As mentioned above, I would prefer using the OpenSSL sources already
> in the tree via ossl(4).
>

Sounds good.

>>> 4. Is there a userspace test framework for the cryptodev API that
>>> could be used to validate and benchmark the new implementation, or
>>> would I have to write that myself? It appears that OpenSSL had
>>> support for /dev/crypto at one point, but I'm not sure that is the
>>> case any longer.
>>
>> John Baldwin might have some ideas here, too.
>
> There is a cryptocheck tool in tools/tools/crypto that I tend to use.
> It generates "random" but deterministic (since it uses a fixed seed
> for libc's PRNG) tests of most of the algorithms supported by the
> cryptodev API with various key sizes, nonce sizes, payload, and AAD
> sizes. It performs each test once with OpenSSL's userland crypto
> and a second time with /dev/crypto and compares the results reporting
> any mismatches. There isn't a manpage, but there are some comments
> in the source.
>
> There are also some tests in the test suite that make use of the
> NIST known answer test vectors (which have to be installed via a nist-kat
> package).

Perfect, this seems like exactly what I was looking for.

I'll begin work on the fpu_enter/leave functions and keep you all
updated. Thank you both for the prompt and detailed responses!

- Shawn