using SSE2 in kernel C code (improving AES-NI module)

Konstantin Belousov kostikbel at gmail.com
Sat Oct 20 18:18:33 UTC 2012


On Sat, Oct 20, 2012 at 11:10:37AM -0700, Peter Wemm wrote:
> On Sat, Oct 20, 2012 at 10:11 AM, John-Mark Gurney <jmg at funkthat.com> wrote:
> > Konstantin Belousov wrote this message on Sat, Oct 20, 2012 at 08:48 +0300:
> >> On Fri, Oct 19, 2012 at 04:38:33PM -0700, John-Mark Gurney wrote:
> >> > So, the AES-NI module already uses SSE2 instructions, but it does so
> >> > only in assembly.  I have improved the performance of the AES-NI
> >> > module's implementation, but this involves using additional SSE2
> >> > instructions.
> >> >
> >> > In order to keep my sanity, I did part of the new code in C using
> >> > gcc native types and xmmintrin.h, but we do not support this header in
> >> > the kernel.  This means we cannot simply add the new code to the
> >> > kernel...
> >> >
> >> > Any good ideas on how to integrate this code into the kernel build?
> >
> > [...]
> >
> >>
> >> The current structure of the aes-ni driver is partly enforced by the
> >> issue you noted. We cannot use SSE intrinsics in the kernel, and
> >> huge inline assembler fragments are hard to write.
> >>
> >> I prefer to have separate .S files with the optimized code,
> >> hand-written. If needed, I can offer you help with the transition.
> >> I would need a full patch to rewrite the code.
> >
> > Are you sure you want to do this?  It'll involve writing around 500
> > lines of assembly besides the constants... And it isn't simple like
> > the aesni_enc where we have a single loop for the rounds...  I've
> > posted a tar.gz to overlay onto sys/crypto/aesni at:
> > https://www.funkthat.com/~jmg/aesni.repfile.tar.gz
> 
> Rather than go straight to assembler, why not use the __builtins?
> 
> static inline __m128i
> xts_crank_lfsr(__m128i inp)
> {
>         const __m128i alphamask = _mm_set_epi32(1, 1, 1, AES_XTS_ALPHA);
>         __m128i xtweak, ret;
> 
>         /* set up xor mask */
>         xtweak = _mm_shuffle_epi32(inp, 0x93);
>         xtweak = _mm_srai_epi32(xtweak, 31);
>         xtweak &= alphamask;
> 
>         /* next term */
>         ret = _mm_slli_epi32(inp, 1);
>         ret ^= xtweak;
> 
>         return ret;
> }
> 
> -->
> 
> static inline __m128i
> xts_crank_lfsr(__m128i inp)
> {
>         const __m128i alphamask = (magic casts){ 1, 1, 1, AES_XTS_ALPHA };
>         __m128i xtweak, ret;
> 
>         /* set up xor mask */
>         xtweak = __builtin_ia32_pshufd (inp, 0x93);
>         xtweak = __builtin_ia32_psradi128(xtweak, 31);
>         xtweak &= alphamask;
> 
>         /* next term */
>         ret = __builtin_ia32_pslldi128(inp, 1);
>         ret ^= xtweak;
> 
>         return ret;
> }
> 
> I know I skipped the details like data types, but most of the meat of
> those functions collapses to a simple wrapper around a __builtin.
Are the builtins available when compiling with -mno-sse?

I think we can try to reimplement the builtins we need with inline
assembly.
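
A minimal sketch of what that could look like for xts_crank_lfsr(),
untested in the kernel.  Everything here is an assumption for
illustration: the name xts_crank_lfsr_asm is hypothetical, AES_XTS_ALPHA
is assumed to be 0x87, and the tweak is passed through memory instead of
as an __m128i so that the C code never needs a vector type.  Whether the
compiler accepts the xmm clobbers under -mno-sse would have to be
checked, and the caller still has to own the FPU state (fpu_kern_enter()
or equivalent) around the call.

```c
#include <stdint.h>

#define AES_XTS_ALPHA 0x87	/* assumed value of the XTS reduction constant */

/*
 * Hypothetical helper: advance a 16-byte XTS tweak by one step, in place.
 * Same operation sequence as the intrinsic version quoted above, written
 * as a single inline assembly fragment.
 */
static inline void
xts_crank_lfsr_asm(uint8_t tweak[16])
{
	/* Memory image of _mm_set_epi32(1, 1, 1, AES_XTS_ALPHA). */
	static const uint32_t alphamask[4] __attribute__((aligned(16))) =
	    { AES_XTS_ALPHA, 1, 1, 1 };

	__asm__ __volatile__(
	    "movdqu	(%0), %%xmm0\n\t"		/* xmm0 = inp */
	    "pshufd	$0x93, %%xmm0, %%xmm1\n\t"	/* rotate dwords up by one */
	    "psrad	$31, %%xmm1\n\t"		/* spread each top bit */
	    "pand	(%1), %%xmm1\n\t"		/* xtweak &= alphamask */
	    "pslld	$1, %%xmm0\n\t"			/* ret = inp << 1 per dword */
	    "pxor	%%xmm1, %%xmm0\n\t"		/* ret ^= xtweak */
	    "movdqu	%%xmm0, (%0)"			/* store the result */
	    :
	    : "r" (tweak), "r" (alphamask)
	    : "xmm0", "xmm1", "memory");
}
```

The fragment only clobbers xmm0 and xmm1 and takes plain pointers, so
the surrounding C can stay free of SSE types.  The cost is a load and a
store per call, which the intrinsic version can avoid by keeping the
tweak in a register.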
> 
> Or, another option.. do something like genassym or the many other
> kernel build tools.  aicasm builds and runs a userland tool to
> generate something to build into the kernel.  With sufficient
> cross-contamination safeguards I wonder if something similar might be
> able to be done here.
> 
> -- 
> Peter Wemm - peter at wemm.org; peter at FreeBSD.org; peter at yahoo-inc.com; KI6FJV
> "All of this is for nothing if we don't go to the stars" - JMS/B5
> "If Java had true garbage collection, most programs would delete
> themselves upon execution." -- Robert Sewell