Increasing GELI performance
Tom Evans
tevans.uk at googlemail.com
Tue Jul 31 09:10:12 UTC 2007
On Sat, 2007-07-28 at 14:26 +0100, Dominic Bishop wrote:
> I've just been testing out GELI performance on an underlying RAID using a
> 3ware 9550SXU-12 running RELENG_6 as of yesterday and seem to be hitting a
> performance bottleneck, but I can't see where it is coming from.
>
> Testing with an unencrypted 100GB GPT partition (/dev/da0p1) gives me around
> 200-250MB/s read and write speeds to give an idea of the capability of the
> disk device itself.
>
> Using GELI with a default 128-bit AES key seems to hit a limit at ~50MB/s;
> changing the sector size all the way up to 128KB makes no difference
> whatsoever to the performance. If I use the threads sysctl in loader.conf
> and drop the geli threads to 1 (instead of the usual 3 it spawns on this
> system), the performance still does not change at all. Monitoring during
> writes with systat confirms that it really is spawning 1 or 3 threads
> correctly in these cases.
>
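For reference, the thread count comes from the kern.geom.eli.threads loader
tunable described in geli(8). A minimal sketch of pinning it at boot, with the
value 4 chosen arbitrarily as an example:

# /boot/loader.conf
geom_eli_load="YES"           # skip if the kernel already has 'options GEOM_ELI'
kern.geom.eli.threads=4       # number of g_eli worker threads to spawn

The value currently in effect can be read back with:

sysctl kern.geom.eli.threads
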
> Here is a uname -a from the machine
>
> FreeBSD 004 6.2-STABLE FreeBSD 6.2-STABLE #2: Fri Jul 27 20:10:05 CEST 2007
> dom at 004:/u1/obj/u1/src/sys/004 amd64
>
> Kernel is a copy of GENERIC with GELI option added
>
> Encrypted partition created using: geli init -s 65536 /dev/da0p1
>
> Simple write test done with: dd if=/dev/zero of=/dev/da0p1.eli bs=1m
> count=10000 (the same as I did on the unencrypted device; a full test with
> bonnie++ shows similar speeds)
>
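As a quick sanity check of the per-core AES ceiling, independent of GEOM and
the disk, the base system's OpenSSL benchmark gives a rough ballpark (userland
OpenSSL is not the same code path as GELI's in-kernel crypto, so treat the
figure as indicative only):

openssl speed aes-128-cbc

If that lands anywhere near the ~50MB/s you see through GELI, the limit is
probably raw AES throughput on a single 1.6GHz core rather than anything in
GEOM itself.
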
> Systat output whilst writing, showing 3 threads:
>
>
> /0 /1 /2 /3 /4 /5 /6 /7 /8 /9 /10
> Load Average ||||
>
> /0 /10 /20 /30 /40 /50 /60 /70 /80 /90 /100
> root idle: cpu3 XXXXXXXXX
> root idle: cpu1 XXXXXXXX
> <idle> XXXXXXXX
> root idle: cpu0 XXXXXXX
> root idle: cpu2 XXXXXX
> root g_eli[2] d XXX
> root g_eli[0] d XXX
> root g_eli[1] d X
> root g_up
> root dd
>
> Output from vmstat -w 5
>  procs      memory      page                    disks     faults      cpu
>  r b w     avm     fre  flt  re  pi  po    fr  sr ad4 da0   in   sy   cs us sy id
>  0 1 0   38124 3924428  208   0   1   0  9052   0   0   0 1758  451 6354  1 15 84
>  0 1 0   38124 3924428    0   0   0   0 13642   0   0 411 2613  128 9483  0 22 78
>  0 1 0   38124 3924428    0   0   0   0 13649   0   0 411 2614  130 9483  0 22 78
>  0 1 0   38124 3924428    0   0   0   0 13642   0   0 411 2612  128 9477  0 22 78
>  0 1 0   38124 3924428    0   0   0   0 13642   0   0 411 2611  128 9474  0 23 77
>
> Output from iostat -x 5
> extended device statistics
> device r/s w/s kr/s kw/s wait svc_t %b
> ad4 2.2 0.7 31.6 8.1 0 3.4 1
> da0 0.2 287.8 2.3 36841.5 0 0.4 10
> pass0 0.0 0.0 0.0 0.0 0 0.0 0
> extended device statistics
> device r/s w/s kr/s kw/s wait svc_t %b
> ad4 0.0 0.0 0.0 0.0 0 0.0 0
> da0 0.0 411.1 0.0 52622.1 0 0.4 15
> pass0 0.0 0.0 0.0 0.0 0 0.0 0
> extended device statistics
> device r/s w/s kr/s kw/s wait svc_t %b
> ad4 0.0 0.0 0.0 0.0 0 0.0 0
> da0 0.0 411.1 0.0 52616.2 0 0.4 15
> pass0 0.0 0.0 0.0 0.0 0 0.0 0
>
>
> Looking at these results myself, I cannot see where the bottleneck is. Since
> changing the sector size or the number of geli threads doesn't affect
> performance, I would assume there is some other single-threaded part limiting
> things, but I don't know enough about how it works to say what.
>
> CPU in the machine is a pair of these:
> CPU: Intel(R) Xeon(R) CPU 5110 @ 1.60GHz (1603.92-MHz K8-class CPU)
>
> I've also come across some other strange issues with some other machines
> which have identical arrays but only a pair of 32-bit 3.0GHz Xeons in them
> (also using RELENG_6 as of yesterday, just i386 rather than amd64). On those,
> geli will launch a single thread by default (cores minus one seems to be the
> default), but I cannot force it to launch 2 using the sysctl, although on the
> 4-core machine I can successfully use it to launch 4. It would be nice to be
> able to use both cores on the 32-bit machines for geli, but given the results
> I've shown here I'm not sure it would gain me much at the moment.
>
> Another problem I've found is that if I use a sector size for GELI greater
> than 8192 bytes, then I'm unable to newfs the encrypted partition afterwards;
> it fails immediately with this error:
>
> newfs /dev/da0p1.eli
> increasing block size from 16384 to fragment size (65536)
> /dev/da0p1.eli: 62499.9MB (127999872 sectors) block size 65536, fragment
> size 65536
> using 5 cylinder groups of 14514.56MB, 232233 blks, 58112 inodes.
> newfs: can't read old UFS1 superblock: read error from block device: Invalid
> argument
>
> The underlying device is readable and writable, however, as dd can read from
> and write to it without any errors.
>
> If anyone has any suggestions or thoughts on any of these points it would be
> much appreciated; these machines will be performing backups over a 1Gbit LAN,
> so more speed than I can currently get would be preferable.
>
> I sent this to geom@ and meant to CC it here, as geom@ seems to be a pretty
> quiet list and the message might not get seen there. I forgot the CC, so
> apologies for sending it separately here. I'll also add a few extra bits I
> sent to geom@ in response to a reply:
>
> Trying newfs with the -S option to specify a sector size matching the -s
> option given to geli init:
>
> newfs -S 65536 /dev/da0p1.eli
> increasing block size from 16384 to fragment size (65536)
> /dev/da0p1.eli: 62499.9MB (127999872 sectors) block size 65536, fragment
> size 65536
> using 5 cylinder groups of 14514.56MB, 232233 blks, 58112 inodes.
> newfs: can't read old UFS1 superblock: read error from block device: Invalid
> argument
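One possible workaround, assuming the failure is newfs issuing its usual 8KB
superblock reads, which are smaller than the 64KB GELI sector and so get
rejected by the provider with EINVAL: keep the GELI sector size at 8192 bytes
or below. An untested sketch:

geli init -s 4096 /dev/da0p1      # 4KB sectors stay within newfs's 8KB reads
geli attach /dev/da0p1            # prompts for the passphrase set at init
newfs -U /dev/da0p1.eli           # should then get past the superblock reads
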
>
> diskinfo reports the correct sector size for the GELI layer and 512 bytes
> for the underlying GPT partition:
> diskinfo -v /dev/da0p1
> /dev/da0p1
> 512 # sectorsize
> 65536000000 # mediasize in bytes (61G)
> 128000000 # mediasize in sectors
> 7967 # Cylinders according to firmware.
> 255 # Heads according to firmware.
> 63 # Sectors according to firmware.
>
> diskinfo -v /dev/da0p1.eli
> /dev/da0p1.eli
> 65536 # sectorsize
> 65535934464 # mediasize in bytes (61G)
> 999999 # mediasize in sectors
> 62 # Cylinders according to firmware.
> 255 # Heads according to firmware.
> 63 # Sectors according to firmware.
>
> Testing with a one-time GELI attach of the underlying raw device, to bypass
> the GPT layer, shows very similarly poor results:
>
> dd if=/dev/da0.eli of=/dev/null bs=1m count=1000
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes transferred in 29.739186 secs (35259069 bytes/sec)
>
> dd if=/dev/zero of=/dev/da0.eli bs=1m count=1000
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes transferred in 23.501061 secs (44618241 bytes/sec)
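The onetime attach itself isn't shown above; purely for illustration, it was
presumably something along the lines of the following, which puts a throwaway
random key directly on the raw device (the exact flags used may have differed):

geli onetime -s 65536 /dev/da0    # illustrative only; one-time random key on the raw device
geli detach da0.eli               # tear the test provider down again afterwards
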
>
> For comparison the same test done on the unencrypted raw device:
>
> dd if=/dev/da0 of=/dev/null bs=1m count=1000
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes transferred in 5.802704 secs (180704717 bytes/sec)
>
> dd if=/dev/zero of=/dev/da0 bs=1m count=1000
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes transferred in 4.026869 secs (260394859 bytes/sec)
>
>
> Looking at 'top -S -s1' whilst doing a long read/write using geli shows a
> geli thread for each core, but there only ever seems to be one in a running
> state at any given time; the others sit in a state of 'geli:w'. This would
> explain why performance is identical with 1 geli thread and with 4 geli
> threads.
>
> Regards,
>
> Dominic Bishop
>
A simple solution is just to add some crypto hardware into the mix to
beef things up. Something like a Soekris VPN 1401 would do the trick.
See hifn(4) and http://www.soekris.com/vpn1401.htm
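
If you go that route, a rough sketch of the setup, assuming the card attaches
as hifn0 and that your geli version reports its backend in 'geli list':

# /boot/loader.conf
hifn_load="YES"          # hifn(4) driver for the card
crypto_load="YES"        # kernel crypto framework, if not already in the kernel
cryptodev_load="YES"     # only needed for userland /dev/crypto users; GELI uses crypto(9) in-kernel

# after re-attaching the GELI provider, check which backend it picked up
geli list da0p1.eli | grep -i crypto

GELI should switch from software to hardware crypto automatically when the
provider is attached with a supported accelerator registered in the crypto
framework.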