svn commit: r42810 - projects/zfsupdate-201307/en_US.ISO8859-1/books/handbook/zfs
Benedict Reuschling
bcr at FreeBSD.org
Wed Oct 2 21:00:07 UTC 2013
Author: bcr
Date: Wed Oct 2 21:00:07 2013
New Revision: 42810
URL: http://svnweb.freebsd.org/changeset/doc/42810
Log:
Add a section on deduplication. This needs some more work and warnings about
huge memory requirements for the DDT. But the basic instructions are there on
how to activate it, along with an example that shows the dedup ratio and
a simulation run with zdb.
Modified:
projects/zfsupdate-201307/en_US.ISO8859-1/books/handbook/zfs/chapter.xml
Modified: projects/zfsupdate-201307/en_US.ISO8859-1/books/handbook/zfs/chapter.xml
==============================================================================
--- projects/zfsupdate-201307/en_US.ISO8859-1/books/handbook/zfs/chapter.xml Wed Oct 2 20:02:02 2013 (r42809)
+++ projects/zfsupdate-201307/en_US.ISO8859-1/books/handbook/zfs/chapter.xml Wed Oct 2 21:00:07 2013 (r42810)
@@ -518,7 +518,7 @@ errors: No known data errors</screen>
how much I/O bandwidth is currently utilized for read and
write operations. By default, all pools in the system will be
monitored and displayed. A pool name can be provided to just
- monitor one pool. A basic example is provided below:</para>
+ monitor one pool. A basic example is provided below:</para>
<screen>&prompt.root; <userinput>zpool iostat</userinput>
capacity operations bandwidth
@@ -617,7 +617,7 @@ Filesystem Size Used Avail Cap
<para>It is possible to set user-defined properties in ZFS.
They become part of the dataset configuration and can be used
to provide additional information about the dataset or its
- contents. To distnguish these custom properties from the
+ contents. To distinguish these custom properties from the
ones supplied as part of ZFS, a colon (<literal>:</literal>)
is used to create a custom namespace for the property.</para>
@@ -646,7 +646,7 @@ tank custom:costcenter 1234 local</scr
</sect2>
<sect2 id="zfs-zfs-quota">
- <title>Dataset, User and Group Quotes</title>
+ <title>Dataset, User and Group Quotas</title>
<para>To enforce a dataset quota of 10 GB for
<filename>storage/home/bob</filename>, use the
@@ -786,6 +786,92 @@ tank custom:costcenter 1234 local</scr
<title>Deduplication</title>
<para></para>
+
+ <para>To activate deduplication, set the
+ <literal>dedup</literal> property on the target pool:</para>
+
+ <screen>&prompt.root; <userinput>zfs set dedup=on <replaceable>pool</replaceable></userinput></screen>
+
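+ <para>To confirm that the property took effect, its current
+ value can be queried.  The output should look similar to the
+ following for a pool named <replaceable>pool</replaceable>:</para>
+
+ <screen>&prompt.root; <userinput>zfs get dedup <replaceable>pool</replaceable></userinput>
+NAME  PROPERTY  VALUE  SOURCE
+pool  dedup     on     local</screen>
+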
+ <para>It is important to note that only new data written to
+ the pool after activation will be deduplicated.  Data that is
+ already residing on the pool will not be deduplicated
+ retroactively by activating this option.  As such, a pool with
+ a freshly activated deduplication property will look something
+ like this example:</para>
+
+ <screen>&prompt.root; <userinput>zpool list</userinput>
+NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
+pool 2.84G 2.19M 2.83G 0% 1.00x ONLINE -</screen>
+
+ <para>The <literal>DEDUP</literal> column shows the actual rate
+ of deduplication for that pool.  A value of
+ <literal>1.00x</literal> indicates that no data has been
+ deduplicated yet, because no duplicate blocks have been
+ written.  In the following example, the ports tree is copied
+ three times into different directories on the deduplicated
+ pool above in order to generate duplicate data.</para>
+
+ <screen>&prompt.root; <userinput>for d in dir1 dir2 dir3; do</userinput>
+for> <userinput>mkdir $d &amp;&amp; cp -R /usr/ports $d &amp;</userinput>
+for> <userinput>done</userinput></screen>
+
+ <para>Now that duplicate data has been written, ZFS detects the
+ identical blocks and stores each of them only once, so the
+ copies take up almost no additional space:</para>
+
+ <screen>&prompt.root; <userinput>zpool list</userinput>
+NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
+pool 2.84G 20.9M 2.82G 0% 3.00x ONLINE -</screen>
+
+ <para>The <literal>DEDUP</literal> column now shows the value
+ <literal>3.00x</literal>.  This indicates that ZFS detected the
+ three copies of the ports tree data and stored only a single
+ instance of each block.  The space savings that this yields can
+ be enormous, but only when there is enough memory available to
+ keep track of the deduplicated blocks.</para>
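+
+ <para>The same ratio is also available as the read-only pool
+ property <literal>dedupratio</literal>, which should produce
+ output similar to this:</para>
+
+ <screen>&prompt.root; <userinput>zpool get dedupratio <replaceable>pool</replaceable></userinput>
+NAME  PROPERTY    VALUE  SOURCE
+pool  dedupratio  3.00x  -</screen>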
+
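+ <para>The memory requirements of the deduplication table (DDT)
+ can be estimated roughly: each entry consumes on the order of
+ 320 bytes of memory.  Assuming an average block size of 64 KB,
+ a pool storing 1 TB of unique data holds about 16 million
+ blocks, which translates to roughly 5 GB of RAM for the DDT
+ alone.  If the table no longer fits into memory, performance
+ drops dramatically because parts of it have to be read from
+ disk on every write.</para>
+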
+ <para>Deduplication is not always beneficial, especially when
+ there is little redundant data on a ZFS pool.  To see how much
+ space could be saved by deduplication for a given set of data
+ that is already stored in a pool, ZFS can simulate the effects
+ that deduplication would have.  To do that, the following
+ command can be invoked on the pool:</para>
+
+ <screen>&prompt.root; <userinput>zdb -S <replaceable>pool</replaceable></userinput>
+Simulated DDT histogram:
+
+bucket allocated referenced
+______ ______________________________ ______________________________
+refcnt blocks LSIZE PSIZE DSIZE blocks LSIZE PSIZE DSIZE
+------ ------ ----- ----- ----- ------ ----- ----- -----
+ 1 2.58M 289G 264G 264G 2.58M 289G 264G 264G
+ 2 206K 12.6G 10.4G 10.4G 430K 26.4G 21.6G 21.6G
+ 4 37.6K 692M 276M 276M 170K 3.04G 1.26G 1.26G
+ 8 2.18K 45.2M 19.4M 19.4M 20.0K 425M 176M 176M
+ 16 174 2.83M 1.20M 1.20M 3.33K 48.4M 20.4M 20.4M
+ 32 40 2.17M 222K 222K 1.70K 97.2M 9.91M 9.91M
+ 64 9 56K 10.5K 10.5K 865 4.96M 948K 948K
+ 128 2 9.50K 2K 2K 419 2.11M 438K 438K
+ 256 5 61.5K 12K 12K 1.90K 23.0M 4.47M 4.47M
+ 1K 2 1K 1K 1K 2.98K 1.49M 1.49M 1.49M
+ Total 2.82M 303G 275G 275G 3.20M 319G 287G 287G
+
+dedup = 1.05, compress = 1.11, copies = 1.00, dedup * compress / copies = 1.16</screen>
+
+ <para>After <command>zdb -S</command> finishes analyzing the
+ pool, it outputs a summary showing the ratio that would result
+ from activating deduplication.  In this case,
+ <literal>1.16</literal> is a very poor space saving ratio that
+ is mostly provided by compression; the deduplication component
+ is only <literal>1.05</literal>.  Activating deduplication on
+ this pool would not save any significant amount of space.
+ Keeping the formula <literal>dedup * compress / copies =
+ deduplication ratio</literal> in mind, a system administrator
+ can plan storage allocation and decide whether the workload
+ contains enough duplicate or compressible data to justify the
+ overhead.  As a rule of thumb, compression should be tried
+ before deduplication due to its much lower memory
+ requirements.</para>
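+
+ <para>For example, compression can be activated with a single
+ property, similar to deduplication.  The
+ <literal>lzjb</literal> algorithm shown here is the classic
+ default; other algorithms may be available depending on the
+ ZFS version:</para>
+
+ <screen>&prompt.root; <userinput>zfs set compression=lzjb <replaceable>pool</replaceable></userinput></screen>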
</sect2>
</sect1>