tuning vfs.zfs.vdev.max_pending and solving the issue of ZFS
writes choking read IO
Dan Nelson
dnelson at allantgroup.com
Wed Mar 24 17:55:49 UTC 2010
In the last episode (Mar 24), Bob Friesenhahn said:
> On Wed, 24 Mar 2010, Dan Naumov wrote:
> > Has anyone done any extensive testing of the effects of tuning
> > vfs.zfs.vdev.max_pending on this issue? Is there some universally
> > recommended value beyond the default 35? Anything else I should be
> > looking at?
>
> The vdev.max_pending value is primarily used to tune for SAN/HW-RAID LUNs
> and is used to dial down LUN service time (svc_t) values by limiting the
> number of pending requests. It is not terribly useful for decreasing
> stalls due to zfs writes. In order to reduce the impact of zfs writes,
> you want to limit the maximum size of a zfs transaction group (TXG). I
> don't know what the FreeBSD tunable is for this, but under Solaris it is
> zfs:zfs_write_limit_override.
There isn't a sysctl for it by default, but the following patch will enable
a vfs.zfs.write_limit_override sysctl:
Index: dsl_pool.c
===================================================================
RCS file: /home/ncvs/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c,v
retrieving revision 1.4.2.1
diff -u -p -r1.4.2.1 dsl_pool.c
--- dsl_pool.c 17 Aug 2009 09:55:58 -0000 1.4.2.1
+++ dsl_pool.c 11 Mar 2010 08:34:27 -0000
@@ -47,6 +47,11 @@ uint64_t zfs_write_limit_inflated = 0;
 uint64_t zfs_write_limit_override = 0;
 extern uint64_t zfs_write_limit_min;
 
+SYSCTL_DECL(_vfs_zfs);
+SYSCTL_QUAD(_vfs_zfs, OID_AUTO, write_limit_override, CTLFLAG_RW,
+    &zfs_write_limit_override, 0,
+    "Force a txg if dirty buffers exceed this value (bytes)");
+
 kmutex_t zfs_write_limit_lock;
 
 static pgcnt_t old_physmem = 0;
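Once a kernel built with the patch above is running, the new OID should be visible through sysctl(8). The following is only a sketch: it assumes the patch applied cleanly, and it falls back to a message if the OID is missing (for example on an unpatched kernel). The variable name `oid` is just for illustration.

```sh
# Check whether the patched-in sysctl exists before trying to use it.
oid=vfs.zfs.write_limit_override

if sysctl -n "$oid" >/dev/null 2>&1; then
    # Show the current value. 0 is the compiled-in default seen in the
    # patch context (zfs_write_limit_override = 0).
    sysctl "$oid"
else
    echo "$oid not present (kernel not patched?)"
fi
```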
> On a large-memory system, a properly working zfs should not saturate
> the write channel for more than 5 seconds. Zfs tries to learn the
> write bandwidth so that it can tune the TXG size up to 5 seconds (max)
> worth of writes. If you have both large memory and fast storage,
> quite a huge amount of data can be written in 5 seconds. On my
> Solaris system, I found that zfs was quite accurate with its rate
> estimation, but it resulted in four gigabytes of data being written
> per TXG.
I had similar problems on a 32GB Solaris server at work. Note that with
compression enabled, the entire system pauses while it compresses the
outgoing block of data. It's just a fraction of a second, but long enough
for end-users to complain about bad performance in X sessions. I had to
throttle the write limit back to 256MB to make the stuttering go away
completely. Write throughput was barely affected.
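For reference, the sysctl added by the patch takes its value in bytes, so a 256MB limit has to be converted first. A minimal sketch, assuming "256MB" means 256 * 1024 * 1024 bytes and that the patched vfs.zfs.write_limit_override OID is present (the variable names are illustrative):

```sh
# Convert the limit from megabytes to bytes for the sysctl.
limit_mb=256
limit_bytes=$((limit_mb * 1024 * 1024))

# Print the setting in name=value form, as sysctl(8) and
# /etc/sysctl.conf expect it.
echo "vfs.zfs.write_limit_override=${limit_bytes}"
# prints vfs.zfs.write_limit_override=268435456

# To apply it at runtime (as root, on a patched kernel):
#   sysctl vfs.zfs.write_limit_override=268435456
```

The right value depends on the pool's write bandwidth and how long a pause users will tolerate, so treat 256MB as a starting point rather than a recommendation.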
--
Dan Nelson
dnelson at allantgroup.com