ChaCha8/12/20 and GEOM ELI tests
Alexey Ivanov
savetherbtz at gmail.com
Tue Jan 13 03:40:23 UTC 2015
Just curious: why does a stream cipher use a mode of operation (e.g., XTS)?
> On Jan 12, 2015, at 3:34 PM, John-Mark Gurney <jmg at funkthat.com> wrote:
>
> rozhuk.im at gmail.com wrote this message on Mon, Jan 12, 2015 at 23:40 +0300:
>>>> ChaCha patch:
>>>>
>>>> http://netlab.linkpc.net/download/software/FreeBSD/patches/chacha.patch
>>>
>>> What's the difference between CHACHA and XCHACHA?
>>
>> Same as between SALSA and XSALSA.
>>
>> XChaCha20 uses a 256-bit key as well as the first 128 bits of the nonce in
>> order to compute a subkey. This subkey, as well as the remaining 64 bits of
>> the nonce, are the parameters of the ChaCha20 function used to actually
>> generate the stream.
>>
>> But with XChaCha20's longer nonce, it is safe to generate nonces using
>> randombytes_buf() for every message encrypted with the same key without
>> having to worry about a collision.
>>
>> More details: http://cr.yp.to/snuffle/xsalsa-20081128.pdf
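>>
>> For illustration only (a minimal standalone sketch of the construction,
>> not the patch code; the helper names are made up), the HChaCha20 step
>> that derives the subkey looks roughly like this:
>>
>> #include <stdint.h>
>>
>> #define ROTL32(v, n)	(((v) << (n)) | ((v) >> (32 - (n))))
>> #define QR(a, b, c, d) do {					\
>> 	(a) += (b); (d) ^= (a); (d) = ROTL32((d), 16);		\
>> 	(c) += (d); (b) ^= (c); (b) = ROTL32((b), 12);		\
>> 	(a) += (b); (d) ^= (a); (d) = ROTL32((d), 8);		\
>> 	(c) += (d); (b) ^= (c); (b) = ROTL32((b), 7);		\
>> } while (0)
>>
>> static uint32_t
>> le32(const uint8_t *p)
>> {
>> 	return ((uint32_t)p[0] | (uint32_t)p[1] << 8 |
>> 	    (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24);
>> }
>>
>> /*
>>  * HChaCha20: 256-bit key + first 128 bits of the 192-bit XChaCha nonce
>>  * -> 256-bit subkey.  The subkey and the remaining 64 nonce bits are
>>  * then used as the key/nonce of plain ChaCha20.
>>  */
>> static void
>> hchacha20(uint32_t subkey[8], const uint8_t key[32],
>>     const uint8_t nonce16[16])
>> {
>> 	uint32_t x[16];
>> 	int i;
>>
>> 	x[0] = 0x61707865; x[1] = 0x3320646e;	/* "expand 32-byte k" */
>> 	x[2] = 0x79622d32; x[3] = 0x6b206574;
>> 	for (i = 0; i < 8; i++)
>> 		x[4 + i] = le32(key + 4 * i);
>> 	for (i = 0; i < 4; i++)
>> 		x[12 + i] = le32(nonce16 + 4 * i);
>>
>> 	for (i = 0; i < 10; i++) {	/* 10 double rounds = 20 rounds */
>> 		QR(x[0], x[4], x[8], x[12]);
>> 		QR(x[1], x[5], x[9], x[13]);
>> 		QR(x[2], x[6], x[10], x[14]);
>> 		QR(x[3], x[7], x[11], x[15]);
>> 		QR(x[0], x[5], x[10], x[15]);
>> 		QR(x[1], x[6], x[11], x[12]);
>> 		QR(x[2], x[7], x[8], x[13]);
>> 		QR(x[3], x[4], x[9], x[14]);
>> 	}
>> 	/* Unlike the block function, no feed-forward: take words 0-3 and 12-15. */
>> 	for (i = 0; i < 4; i++) {
>> 		subkey[i] = x[i];
>> 		subkey[4 + i] = x[12 + i];
>> 	}
>> }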
>
> Ahh, thanks..
>
>>> Also, where are the man page diffs? They might have explained the
>>> difference between the two, and explained why two versions of chacha
>>> are needed...
>>
>> No man page diffs.
>
> You need to document the new defines in crypto(9), and document the
> various parameters in crypto(7)... Yes, not all modes are documented
> in crypto(7), but going forward, at a minimum we need to document new
> additions...
>
> I'll admit I didn't document the other algorithms as I'm not as familiar
> w/ those as the ones that I worked on...
>
>> The man pages do not explain the difference between AES-CBC and AES-XTS...
>
> True, but CBC and XTS (which includes a reference to the standard) are
> a lot more searchable/common knowledge than xchacha.. Google thinks you
> mean chacha, and xchacha just turns up a bunch of people on various
> networks... Not until you search on xchacha crypto do you get a relevant
> page... Also, Wikipedia doesn't have an entry for xchacha, nor does
> the chacha (cipher) page list it... So, when documenting xchacha in
> crypto(7), include a link to the description/standard...
>
>>> Is there a reason you decided to write your own ChaCha implementation
>>> instead of using one of the standard ones? Did you run performance
>>> tests between your implementation and others?
>>
>> Reference ChaCha and reference (FreeBSD) XTS (4k sector):
>> ChaCha8-XTS-256 = 199518722 bytes/sec
>> ChaCha12-XTS-256 = 179029849 bytes/sec
>> ChaCha20-XTS-256 = 149447317 bytes/sec
>> XChaCha8-XTS-256 = 195675728 bytes/sec
>> XChaCha12-XTS-256 = 175790196 bytes/sec
>> XChaCha20-XTS-256 = 147939263 bytes/sec
>
> So, you're seeing a 33%-50% improvement, good to hear...
>
> Also, do you publish this implementation somewhere? If so, it'd be
> helpful to include a url to where up to date versions can be obtained...
> If you don't plan on publishing/maintaining it outside of FreeBSD, then
> we need to unifdef out the Windows parts of it for our tree...
>
>> This is the reference version adapted for use in /dev/crypto.
>> chacha_block_unaligned() processes a data block as in the reference
>> version; macros are used for readability.
>> chacha_block_aligned() does the same, but operates on aligned data.
>
> Please use the macro __NO_STRICT_ALIGNMENT to decide if special work
> is necessary to handle the alignment...
>
> What is the CHACHA_X64 macro for? If that is to detect LP64 platforms,
> please use the macro __LP64__ to decide this... Have you done
> performance evaluations on 32bit arches to make sure double rounds aren't
> a benefit there too?
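>
> Something like this (an untested sketch; the block function names and
> signatures are assumed from your description) keeps that decision in the
> preprocessor:
>
> #include <sys/param.h>	/* ALIGNED_POINTER(); __NO_STRICT_ALIGNMENT
> 				   comes in via <machine/param.h> on arches
> 				   that allow unaligned access */
>
> void chacha_block_aligned(void *ctx, uint8_t *blk);	/* assumed prototypes */
> void chacha_block_unaligned(void *ctx, uint8_t *blk);
>
> static void
> chacha_block(void *ctx, uint8_t *blk)
> {
> #ifdef __NO_STRICT_ALIGNMENT
> 	/* Unaligned word access is cheap here; always take the fast path. */
> 	chacha_block_aligned(ctx, blk);
> #else
> 	if (ALIGNED_POINTER(blk, uint64_t))
> 		chacha_block_aligned(ctx, blk);
> 	else
> 		chacha_block_unaligned(ctx, blk);
> #endif
> }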
>
> Use the byteorder(9) macros to encode/decode integers instead of rolling
> your own (U8TO32_LITTLE and U32TO8_LITTLE)... Turns out compilers aren't
> good at optimizing this type of code, and platforms may have assembly
> optimized versions for these...
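>
> e.g. (sketch only, with invented helper names):
>
> #include <sys/types.h>
> #include <sys/endian.h>	/* byteorder(9): le32dec(), le32enc(), ... */
>
> static void
> chacha_load_key(uint32_t dst[8], const uint8_t src[32])
> {
> 	int i;
>
> 	for (i = 0; i < 8; i++)
> 		dst[i] = le32dec(src + 4 * i);	/* instead of U8TO32_LITTLE() */
> }
>
> static void
> chacha_store_block(uint8_t dst[64], const uint32_t src[16])
> {
> 	int i;
>
> 	for (i = 0; i < 16; i++)
> 		le32enc(dst + 4 * i, src[i]);	/* instead of U32TO8_LITTLE() */
> }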
>
>> To increase speed, data is processed 4/8 bytes at a time instead of one
>> byte at a time. The data in the context is 8-byte aligned.
>> To increase security, all data, including temporaries, is kept in the
>> context, which is zeroed when the work is finished.
>
> Please use the function explicit_bzero that is available for all of
> these instead of creating your own..
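>
> i.e. something like this at the end of processing (a sketch; assuming
> the kernel prototype lives in <sys/systm.h>):
>
> #include <sys/param.h>
> #include <sys/systm.h>	/* explicit_bzero() */
>
> static void
> chacha_ctx_clear(void *ctx, size_t len)
> {
> 	/* Unlike a plain bzero() of dead data, this can't be optimized away. */
> 	explicit_bzero(ctx, len);
> }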
>
>>>> HW: Core 2 Duo E8500, 8 GB DDR2-800.
>>>> dd if=/dev/zero of=/dev/md0 bs=1m
>>>> 2148489421 bytes/sec
>>>>
>>>>
>>>> # sector = 512b
>>>> 3DES-CBC-192 = 20773120 bytes/sec
>>>> AES-CBC-128 = 85276853 bytes/sec
>>>> AES-CBC-256 = 68893016 bytes/sec
>>>> AES-XTS-128 = 68194868 bytes/sec
>>>> AES-XTS-256 = 56611573 bytes/sec
>>>> Blowfish-CBC-128 = 11169657 bytes/sec
>>>> Blowfish-CBC-256 = 11185891 bytes/sec
>>>> Camellia-CBC-128 = 78077243 bytes/sec
>>>> Camellia-CBC-256 = 65732219 bytes/sec
>>>> ChaCha8-XTS-256 = 258042765 bytes/sec
>>>> ChaCha12-XTS-256 = 223616967 bytes/sec
>>>> ChaCha20-XTS-256 = 176005366 bytes/sec
>>>> XChaCha8-XTS-256 = 228292624 bytes/sec
>>>> XChaCha12-XTS-256 = 195577624 bytes/sec
>>>> XChaCha20-XTS-256 = 152247267 bytes/sec
>>>> XChaCha20-XTS-128 = 152717737 bytes/sec ! a 128-bit key has the same
>>>> speed as a 256-bit key
>>>>
>>>>
>>>> # sector = 4kb
>>>> 3DES-CBC-192 = 22018189 bytes/sec
>>>> AES-CBC-128 = 104097143 bytes/sec
>>>> AES-CBC-256 = 81983833 bytes/sec
>>>> AES-XTS-128 = 78559346 bytes/sec
>>>> AES-XTS-256 = 66047200 bytes/sec
>>>> Blowfish-CBC-128 = 38635464 bytes/sec
>>>> Blowfish-CBC-256 = 38810555 bytes/sec
>>>> Camellia-CBC-128 = 92814510 bytes/sec
>>>> Camellia-CBC-256 = 75949489 bytes/sec
>>>> ChaCha8-XTS-256 = 337336982 bytes/sec
>>>> ChaCha12-XTS-256 = 284740187 bytes/sec
>>>> ChaCha20-XTS-256 = 217326865 bytes/sec
>>>> XChaCha8-XTS-256 = 328424551 bytes/sec
>>>> XChaCha12-XTS-256 = 278579692 bytes/sec
>>>> XChaCha20-XTS-256 = 211660225 bytes/sec
>>>>
>>>> Optimized AES-XTS - speed like AES-CBC:
>>>> AES-XTS-128 = 102841051 bytes/sec
>>>> AES-XTS-256 = 80813644 bytes/sec
>>>
>>> Is this from a different patch or what? Can you talk more about this?
>>
>> No patch at this moment.
>> After optimizing ChaCha-XTS I applied the same optimizations to AES-XTS
>> and got this result.
>> All the changes are in aes_xts_reinit() and aes_xts_crypt(), plus a
>> slight change to the aes_xts_ctx structure.
>>
>> aes_xts_ctx:
>> u_int8_t tweak[] -> u_int64_t tweak[]
>>
>> aes_xts_reinit -> same as chacha_xts_reinit()
>>
>> aes_xts_crypt -> same as chacha_xts_crypt():
>> block[] temp buf removed;
>> XOR 1 byte at a time -> XOR 8 bytes at once;
>> tweak[i] << 1: the 1-bit shift is done on 64-bit words instead of
>> single bytes;
>> loops unrolled;
>
> Ahh, I thought I had done some similar optimizations, but I only did
> them to the aesni version of the routines... You should use the macro
> above to decide if things are aligned or not...
>
>>
>> Final:
>>
>> struct aes_xts_ctx {
>> 	rijndael_ctx key1;
>> 	rijndael_ctx key2;
>> 	uint64_t tweak[(AES_XTS_BLOCKSIZE / sizeof(uint64_t))];
>> };
>>
>> void
>> aes_xts_reinit(caddr_t key, u_int8_t *iv)
>> {
>> 	struct aes_xts_ctx *ctx = (struct aes_xts_ctx *)key;
>>
>> 	/*
>> 	 * Prepare tweak as E_k2(IV). IV is specified as LE representation
>> 	 * of a 64-bit block number which we allow to be passed in directly.
>> 	 */
>> 	if (ALIGNED_POINTER(iv, uint64_t)) {
>> 		ctx->tweak[0] = (*((uint64_t*)(void*)iv));
>> 	} else {
>> 		bcopy(iv, ctx->tweak, sizeof(uint64_t));
>> 	}
>> 	/* Convert to LE. */
>> 	ctx->tweak[0] = htole64(ctx->tweak[0]);
>
> Hmm... this line bothers me.. I'll need to spend more time reading up
> to decide if it is buggy or not... Is ctx->tweak in host order? or LE
> order? I believe it's supposed to be LE order, as it gets passed
> directly to _encrypt.. I'm also not sure if the original code is BE
> clean, which is part of my problem...
>
>> 	/* Last 64 bits of IV are always zero. */
>> 	ctx->tweak[1] = 0;
>>
>> 	rijndael_encrypt(&ctx->key2, (uint8_t*)ctx->tweak,
>> 	    (uint8_t*)ctx->tweak);
>> }
>>
>> static void
>> aes_xts_crypt(struct aes_xts_ctx *ctx, u_int8_t *data, u_int do_encrypt)
>> {
>> 	size_t i;
>> 	uint64_t crr, tm;
>>
>> 	if (ALIGNED_POINTER(data, uint64_t)) {
>> 		((uint64_t*)(void*)data)[0] ^= ctx->tweak[0];
>> 		((uint64_t*)(void*)data)[1] ^= ctx->tweak[1];
>> 	} else {
>> 		for (i = 0; i < AES_XTS_BLOCKSIZE; i ++)
>> 			data[i] ^= ((uint8_t*)ctx->tweak)[i];
>> 	}
>>
>> 	if (do_encrypt)
>> 		rijndael_encrypt(&ctx->key1, data, data);
>> 	else
>> 		rijndael_decrypt(&ctx->key1, data, data);
>>
>> 	if (ALIGNED_POINTER(data, uint64_t)) {
>> 		((uint64_t*)(void*)data)[0] ^= ctx->tweak[0];
>> 		((uint64_t*)(void*)data)[1] ^= ctx->tweak[1];
>> 	} else {
>> 		for (i = 0; i < AES_XTS_BLOCKSIZE; i ++)
>> 			data[i] ^= ((uint8_t*)ctx->tweak)[i];
>> 	}
>>
>> 	/* Exponentiate tweak. */
>> 	crr = (ctx->tweak[0] >> ((sizeof(uint64_t) * 8) - 1));
>> 	ctx->tweak[0] = (ctx->tweak[0] << 1);
>>
>> 	tm = ctx->tweak[1];
>> 	ctx->tweak[1] = ((tm << 1) | crr);
>> 	crr = (tm >> ((sizeof(uint64_t) * 8) - 1));
>>
>> 	if (crr)
>> 		ctx->tweak[0] ^= 0x87; /* GF(2^128) generator polynomial. */
>> }
>
> Please use the AES_XTS_ALPHA define instead of hardcoding the value..
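>
> i.e. just (reusing the existing define rather than a second copy of the
> constant):
>
> 	if (crr)
> 		ctx->tweak[0] ^= AES_XTS_ALPHA;	/* GF(2^128) generator polynomial */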
>
> Thanks.
>
> --
> John-Mark Gurney Voice: +1 415 225 5579
>
> "All that I will do, has been done, All that I have, has not."