Patch proposal: Speeding up ZFS writes

Martin Matuska mm at FreeBSD.org
Sat Aug 21 14:11:01 UTC 2010


Hi Pawel, Xin and members of zfs-devel@ !

Many of our users today are complaining about slow ZFS writes.
One of the causes of these slow writes is the method used to allocate
new blocks [1]. Solaris 10 and OpenSolaris up to November 2009 used the
following scenario:

- pool has more than 30% free space: use first fit method [2]
- pool has less than 30% free space: use best fit method [2]

This causes a major slowdown and, let's say, unpleasant "fragmentation"
of the writes if we go below 30% of free space.
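
To make the switch described above concrete, here is a minimal sketch in
C. It is not the code from metaslab.c: the enum, the function name
choose_policy() and the df_free_pct variable are hypothetical names used
only for illustration (df_free_pct stands in for metaslab_df_free_pct),
and the real check may be made at a different granularity and with
additional conditions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; not taken from metaslab.c. */
enum alloc_policy { ALLOC_FIRST_FIT, ALLOC_BEST_FIT };

/* Stand-in for metaslab_df_free_pct; 30 is the threshold cited above. */
static int df_free_pct = 30;

/*
 * Pick the allocation method from the amount of free space.
 * Assumes free_bytes <= total_bytes and that free_bytes * 100 does not
 * overflow 64 bits (true for any realistic size in this sketch).
 */
static enum alloc_policy
choose_policy(uint64_t free_bytes, uint64_t total_bytes)
{
	uint64_t free_pct;

	assert(free_bytes <= total_bytes);

	/* Integer percentage of free space, rounded down. */
	free_pct = (total_bytes == 0) ? 0 : free_bytes * 100 / total_bytes;

	/* Below the threshold: fall back to the slower best fit search. */
	if (free_pct < (uint64_t)df_free_pct)
		return (ALLOC_BEST_FIT);
	return (ALLOC_FIRST_FIT);
}
```

With df_free_pct left at 30, choose_policy(29, 100) selects best fit and
choose_policy(30, 100) still selects first fit. Lowering df_free_pct
delays the switch to best fit, which is the effect a tunable would give
users.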

OpenSolaris has changed this and the Oracle Storage Appliances also
included the new code in Q1/2010 [2].

The source [2] states that with this change they achieved a speedup
of: "50% Improved OLTP Performance, 70% Reduced Variability, 200%
Improvement on MS Exchange"

So what can we do to improve this situation?

a) in the very short term, as a workaround solution [3], we could just
make metaslab_df_free_pct from metaslab.c tunable so users can test
lower settings (I personally lean more towards b))

b) on the mid-term, I suggest this patch for head with MFC to stable/8
after some reasonable time (1-2 months):
http://people.freebsd.org/~mm/patches/zfs/zfs_metaslab.patch

c) on the long term, updating to current ZFS code (v28) that integrates
this patch

The patch in b) includes the following OpenSolaris onnv revisions:
10921 (very small part, metaslab.c)
11146 (main patch, applies almost cleanly)
11728 (fix for zdb.c)
12047 (improvement to metaslab.c)

OpenSolaris Bug IDs:
6826241 Sync write IOPS drops dramatically during TXG sync
6869229 zfs should switch to shiny new metaslabs more frequently
6917066 zfs block picking can be improved
6918420 zdb -m has issues printing metaslab statistics

References:
[1] http://blogs.sun.com/bonwick/entry/zfs_block_allocation
[2] http://sun.systemnews.com/articles/147/2/OpenStorage/22963
[3] http://blogs.everycity.co.uk/alasdair/2010/07/zfs-runs-really-slowly-when-free-disk-usage-goes-above-80/

