FreeBSD ZFS file server with SSD HDD

markham breitbach markham_breitbach at
Wed Oct 11 18:08:07 UTC 2017

I ran into some problems with disks choking on heavy IO under VMware.  It
turned out to be an issue with the firmware on the SSDs and the backplane
in a Dell server.
It's probably worth making sure the firmware on those is all up to date.


On 2017-10-11 11:30 AM, David Christensen wrote:
> On 10/11/17 06:05, Kate Dawson wrote:
>> Currently running a FreeBSD NFS server with a zpool comprising
>> 12 x 1TB hard disk drives arranged as mirrored pairs in a stripe
>> set ( RAID 10 )
> That should do 6+ Gb/s.
> bonnie++ should be able to measure that.  (It's been a while, but I
> seem to recall that bonnie++ expects raw drives and nukes your data. 
> So, it could take some effort to use it.)
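
As a back-of-the-envelope check on the figure above, here is a minimal sketch of how a pool of striped two-way mirrors might be sized on paper. The per-drive numbers (125 MB/s sequential, 150 random write IOPS) are assumptions for a generic 7200 rpm drive, not measurements from this server, and the helper name pool_estimate is made up for illustration. It uses the common rule of thumb that write performance scales with the number of mirror vdevs rather than the number of disks.

```python
# Rough capacity estimate for a pool of striped two-way mirrors.
# All per-drive figures passed in are assumptions, not measurements.

def pool_estimate(drives, seq_mb_s_per_drive, write_iops_per_drive):
    """Return (vdevs, sequential write Gb/s, random write IOPS)."""
    vdevs = drives // 2  # each two-way mirror writes like a single drive
    seq_gbit_s = vdevs * seq_mb_s_per_drive * 8 / 1000  # MB/s -> Gb/s
    write_iops = vdevs * write_iops_per_drive
    return vdevs, seq_gbit_s, write_iops


# Hypothetical figures for 12 x 1TB HDDs in six mirrors.
vdevs, seq_gbit_s, write_iops = pool_estimate(12, 125, 150)
print(f"{vdevs} vdevs, ~{seq_gbit_s:.1f} Gb/s sequential, ~{write_iops} write IOPS")
```

Under these assumptions the sequential number is far above the 1 Gb/s network, while the random write figure stays in the hundreds of IOPS, which is the quantity the original question is worried about.
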
>> An additional 2 x 960GB SSDs have been added. These two SSDs are
>> partitioned with a small partition being used for a ZIL log, and a
>> larger partition arranged for L2ARC cache.
> Assuming the ZIL is mirrored, that should do 5+ Gb/s.
> Assuming the L2ARC is striped, that should do 10+ Gb/s.
> I don't know how to test ZIL and L2ARC in isolation, but dbench should
> be able to test what ZFS exposes, both locally and over NFS.
>> Additionally the host has 64GB RAM and 16 CPU cores (AMD Opteron 2Ghz)
> That should do 20+ Gb/s.
> Memtest86+ will be able to measure that.
>> A dataset from the pool is exported via NFS to a number of Debian
>> Gnu/Linux hosts running a xen hypervisor. These run several disk image
>> based virtual machines
>> In general use, the FreeBSD NFS host sees very little read IO, which
>> is to be expected, as the RAM cache and L2ARC are designed to minimise
>> the amount of read load on the disks.
>> However we're starting to see high load ( mostly IO WAIT ) on the Linux
>> virtualisation hosts, and virtual machines - with kernel timeouts
>> occurring resulting in crashes and instability.
>> I believe this may be due to the limited number of random write IOPS
>> available on the zpool NFS export.
>> I can get sequential writes and reads to and from the NFS server at
>> speeds that approach the maximum the network provides ( currently 1Gb/s
>> + Jumbo Frames, and I could increase this by bonding multiple
>> interfaces together. )
>> However day to day usage does not show network utilisation anywhere near
>> this maximum.
>> If I look at the output of `zpool iostat -v tank 1` I see that every
>> five seconds or so, the number of write operations goes to > 2k.
>> I think this shows that I'm hitting the limit that the spinning disks
>> can provide in this workload.
>> As a cost effective way to improve this ( rather than replacing the
>> whole chassis ) I was considering replacing the 1TB HDD with 1TB SSD,
>> for the improved IOPS.
>> I wonder if there were any opinions within the community here, on
>> 1. What metrics can I gather to confirm the disk write IO as bottleneck?
>> 2. If the proposed solution will have the required effect?  That is, a
>> decrease in the IOWAIT on the GNU/Linux virtualization hosts.
> I infer your network to be:
> - 1 host running FreeBSD (freebsd-version? uname -a?) and an NFS
> server (version?).
> - N (how many?) Debian GNU/Linux hosts (/etc/debian-version?  uname
> -a?), each running a Xen hypervisor (version?) and an NFS client.
> - The VM's are configured to see their drives as local devices (e.g.
> the VM's are not running NFS clients connected to the FreeBSD NFS
> server).
> - Gigabit switch (make? model?).
> - 1 Gigabit connection between switch and each host.
> As you have correctly stated, you need visibility on the relevant
> performance metrics to make informed decisions.  In addition to the
> above tools:
> - For networking, I'd try netstat.
> - For drive I/O, I use nmon on Debian.
> - I believe iostat is available on both.
> - For CPU's, RAM, and swap, I use top.
> - You seem to have found at least one ZFS tool.
> As others have stated, you will want to ensure that all the pieces are
> reasonably in tune -- VM, NFS client, Xen, Debian networking, switch,
> FreeBSD networking, NFS server, ZFS, etc.  I'd start by looking for
> errors and/or warnings in the usual places (dmesg, /var/log, etc.).  I
> typically leave the settings at the installer defaults, unless I have
> some compelling reason to make a change (at least one reader made a
> suggestion).  Be sure to keep good notes if you're going to muck with
> the settings.
> As for 'zpool iostat -v tank 1', I suspect ZFS is telling you that it
> is flushing writes to the HDD's every five seconds.  If flushes always
> complete before the next scheduled flush, replacing the HDD's with
> SSD's probably will not help with the VM IO WAIT and kernel timeout
> problems. But, if the flushes are overrunning each other during peak
> usage, you may have found the bottleneck.
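
The check described above could be sketched roughly as follows. This is only an illustration: the sample values are typed in by hand rather than parsed from real `zpool iostat` output, the threshold of 1000 write operations per second is an arbitrary assumption, and the function names are hypothetical. Given one write-operations sample per second, it finds each burst of heavy writes and reports whether any burst lasts as long as the five-second flush interval, which would suggest one flush running into the next.

```python
# Detect write bursts in per-second samples and check whether any burst
# lasts a full flush interval. Sample data and threshold are hypothetical.

def flush_bursts(write_ops, threshold):
    """Return (start_index, length) for each run of samples above threshold."""
    bursts = []
    start = None
    for i, ops in enumerate(write_ops):
        if ops > threshold:
            if start is None:
                start = i  # a new burst begins here
        elif start is not None:
            bursts.append((start, i - start))  # burst ended on previous sample
            start = None
    if start is not None:  # burst still running at the end of the data
        bursts.append((start, len(write_ops) - start))
    return bursts


def overrunning(bursts, interval):
    """True if any burst lasts at least one full flush interval (in samples)."""
    return any(length >= interval for _, length in bursts)


# Hypothetical one-second samples of pool write operations.
samples = [40, 35, 2100, 2300, 50, 30, 45, 2200, 60, 40]
bursts = flush_bursts(samples, threshold=1000)
print(bursts, overrunning(bursts, interval=5))
```

Short bursts separated by quiet seconds would point away from the disks as the bottleneck; bursts that fill the whole interval during peak usage would point toward them.
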
> That said, I suspect that the root cause of the VM IO WAIT and kernel
> timeout problems is that the virtual machines need a low latency
> connection to their system drives, temporary file systems, and/or swap
> devices, and they aren't getting it.  I would not bet on NFS to
> provide this, even with SSD's instead of HDD's.  I would bet on local
> resources.  I suggest:
> 1.  Put 2 mirrored SSD's in each Xen server.
> 2.  Put VM system drives on the local SSD mirror.
> 3.  Put VM /tmp file systems on the local SSD mirror, or on RAM.
> 4.  Put VM swap devices on the local SSD mirror, or on RAM.
> 5.  Put VM data drives on NFS.
> I am unsure if it is better to do the "on RAM" and "on NFS" ideas at
> the Xen level or within each VM.  Performance is one consideration.
> Other considerations are security and accountability -- e.g. do
> customers have root on the VM's?
> To improve NFS performance:
> 1.  Enlarge the pipe between the NFS server and the switch --
> bonding (your idea), upgrade to 10 Gb/s, etc.
> 2.  Enlarge the pipes between the Xen hosts and the switch.
> 3.  Add NIC's to the NFS server, add switches, and divide up the Xen
> hosts across the switches.
> 4.  Add NIC's to the NFS server, one per Xen host, and make direct
> connections between the NFS server and each Xen host.
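
To compare the options in the list above, a simple worst-case estimate may help: when every Xen host is busy at once, each host gets at most the smaller of its own link speed and an equal share of the server's total link capacity. The sketch below assumes four Xen hosts, which is a made-up number since the thread does not say how many there are, and it ignores protocol overhead and how bonded links distribute traffic. The function name per_host_share is hypothetical.

```python
# Worst-case per-host bandwidth when all hosts are active at once.
# The host count is hypothetical; overhead and bonding behaviour are ignored.

def per_host_share(server_gbit_s, host_gbit_s, hosts):
    """Gb/s available to each host: its own link or an equal share of the server."""
    return min(host_gbit_s, server_gbit_s / hosts)


hosts = 4  # assumption, not stated in the thread

print("current, 1 Gb/s server link:", per_host_share(1, 1, hosts))
print("two bonded 1 Gb/s links:    ", per_host_share(2, 1, hosts))
print("10 Gb/s server link:        ", per_host_share(10, 1, hosts))
print("one direct 1 Gb/s per host: ", per_host_share(1 * hosts, 1, hosts))
```

With these assumptions, only the 10 Gb/s upgrade and the direct-connection layout let every host use its full 1 Gb/s link at the same time.
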
> Please let us know how it goes.  :-)
> David