The minimum amount of memory needed to use ZFS.
Alan Amesbury
amesbury at oitsec.umn.edu
Wed Dec 23 21:57:02 UTC 2015
On Dec 23, 2015, at 11:53, Bob Bishop <rb at gid.co.uk> wrote:
[snip]
> Deduplication seems like a very bad idea unless you have both a lot of duplicated data and a serious shortage of disk. It needs a lot of RAM, increasing over time. Depending on the hardware and the use case, compression (which effectively only costs CPU) might be a better option.
Agreed: deduplication isn't something you want to enable until you're sure you have a workload that's suitable for it. On FreeBSD the memory cost is estimated at 2-5GB of RAM per terabyte of zpool storage[1]. Oracle has published[2] some information on deduplication in ZFS, too, which parallels the FreeBSD wiki, namely using 'zdb' to analyze your data and determine whether deduplication is even worthwhile. Note that this analysis can take a while to run and, at least for me, had problems on at least one of my hosts.
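For completeness, deduplication is a per-dataset property, so if the analysis did point to a suitable workload, enabling and then verifying it would look roughly like the following ('pool/dataset' is just a placeholder name, and only data written after the property is set gets deduplicated):

# zfs set dedup=on pool/dataset
# zfs get dedup pool/dataset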
As for 'zdb -S' itself, its output is pretty straightforward. For example:
# zdb -S pool
Simulated DDT histogram:
bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    4.94M    578G    577G    579G    4.94M    578G    577G    579G
     2     416K   50.5G   50.5G   50.5G     922K    112G    112G    112G
     4    39.6K   4.89G   4.89G   4.89G     175K   21.6G   21.6G   21.6G
     8    3.06K    382M    381M    382M    31.6K   3.85G   3.84G   3.85G
    16      306   34.4M   33.3M   33.4M    5.81K    665M    639M    641M
    32       62   6.13M   4.99M   5.04M    2.77K    281M    230M    232M
    64       41   4.88M   4.88M   4.88M    3.56K    432M    432M    433M
   128       25   3.12M   3.12M   3.12M    4.37K    560M    560M    560M
   256       71   8.88M   8.88M   8.88M    20.4K   2.56G   2.56G   2.56G
   512        2    256K    256K    256K    1.27K    163M    163M    163M
    2K        2    256K    256K    256K    4.19K    536M    536M    536M
  128K        1    128K    128K    128K     148K   18.4G   18.4G   18.4G
 Total    5.39M    634G    633G    634G    6.23M    739G    739G    740G
dedup = 1.17, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.17
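As a rough back-of-the-envelope check (the Oracle article[2] figures roughly 320 bytes of core memory per allocated block in the dedup table; the real per-entry cost varies), the 'Total' line above works out to something like:

    5.39M blocks x ~320 bytes/block ~= 1.7GB of RAM just to hold the DDT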
For this host the histogram suggests deduplication might buy me a small amount of additional space, but I'd rather allocate RAM to the ARC for performance than spend it on such a modest reduction in space usage. For my workloads I tend to get a much bigger boost from compression, as modern CPUs can typically compress pretty close to the speed of rotational media. (SSDs would be a different story.) Here's example 'zdb -S' output from a host using compression:
# zdb -S pool
Simulated DDT histogram:
bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    8.25M   1008G   80.6G   80.6G    8.25M   1008G   80.6G   80.6G
     2      697   76.4M   21.3M   21.3M    1.46K    160M   44.8M   44.8M
     4    1.05K   10.2M   3.34M   3.34M    5.15K   48.6M   15.9M   15.9M
     8       65   1.09M    318K    318K      649   10.8M   3.06M   3.06M
    16       23    904K    300K    300K      558   20.1M   6.55M   6.55M
    32       18   1.78M    681K    681K      770   74.2M   27.7M   27.7M
    64       29   3.27M   1.23M   1.23M    2.61K    305M    115M    115M
   128       15   1.41M    536K    536K    2.38K    209M   77.3M   77.3M
 Total    8.25M   1008G   80.6G   80.6G    8.26M   1009G   80.9G   80.9G
dedup = 1.00, compress = 12.47, copies = 1.00, dedup * compress / copies = 12.51
The data, primarily textual log files of some kind, compresses pretty well.
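For reference, and purely as a sketch ('pool/dataset' is again a placeholder, and compression only applies to data written after the property is set), enabling LZ4 and checking the resulting ratio looks like:

# zfs set compression=lz4 pool/dataset
# zfs get compression,compressratio pool/dataset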
--
Alan Amesbury
University Information Security
http://umn.edu/lookup/amesbury
[1] - https://wiki.freebsd.org/ZFSTuningGuide#Deduplication
[2] - http://www.oracle.com/technetwork/articles/servers-storage-admin/o11-113-size-zfs-dedup-1354231.html