NFS reads vs. writes
Rick Macklem
rmacklem at uoguelph.ca
Mon Jan 4 01:37:37 UTC 2016
Mikhail T. wrote:
> On 03.01.2016 02:16, Karli Sjöberg wrote:
> >
> > The difference between "mount" and "mount -o async" should tell you if
> > you'd benefit from a separate log device in the pool.
> >
> This is not a ZFS problem. The same filesystem is being read in both
> cases. The same data is being read from and written to the same
> filesystems. For some reason, it is much faster to read via NFS than to
> write to it, however.
>
This issue isn't new. It showed up when Sun introduced NFS in 1985.
NFSv3 did change things a little, by allowing UNSTABLE writes.
Here's what an NFSv3 or NFSv4 client does when writing:
- Issues some # of UNSTABLE writes. The server need only have these in server
RAM before replying NFS_OK.
- Then the client does a Commit. At this point the NFS server is required to
store all the data written in the above writes and related metadata on stable
storage before replying NFS_OK.
--> This is where the "sync" vs "async" is a big issue. If you use "sync=disabled"
(I'm not a ZFS guy, but I think that is what the ZFS option looks like) you
*break* the NFS protocol (i.e. violate the RFC) and put your data at some risk,
but you will typically get better (often much better) write performance.
OR
You put a ZIL on a dedicated device with fast write performance, so the data
can go there to satisfy the stable storage requirement. (I know nothing
about them, but SSDs vary dramatically in write performance, so an SSD
to be used for a ZIL must be carefully selected to ensure good write performance.)
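
The write/commit sequence described above can be sketched as a toy model. This
is not real NFS or ZFS code: the ToyServer class, its method names and the
honor_commit flag are all made up for illustration, with honor_commit=False
standing in for what "sync=disabled" effectively does.

```python
# Toy model of the UNSTABLE write / Commit sequence (illustration only,
# every name here is hypothetical and none of it is real NFS server code).

class ToyServer:
    def __init__(self, honor_commit=True):
        self.ram = []      # data acknowledged but held only in memory
        self.stable = []   # data that would survive a crash
        self.honor_commit = honor_commit  # False mimics "sync=disabled"

    def write_unstable(self, data):
        # The server may reply as soon as the data is in RAM.
        self.ram.append(data)
        return "NFS_OK"

    def commit(self):
        # The protocol requires all data written so far to be on stable
        # storage before this reply is sent.
        if self.honor_commit:
            self.stable.extend(self.ram)
            self.ram.clear()
        return "NFS_OK"

    def crash(self):
        # Anything still only in RAM is lost; return what survived.
        self.ram.clear()
        return list(self.stable)


def run(honor_commit):
    server = ToyServer(honor_commit)
    for chunk in (b"a", b"b", b"c"):
        server.write_unstable(chunk)
    server.commit()          # the client now believes its data is safe
    return server.crash()


print(run(True))    # all three chunks survive the crash
print(run(False))   # nothing survives, although Commit replied NFS_OK
```

In the second case the client was told NFS_OK for the Commit and has no reason
to resend anything, which is why skipping the stable-storage step puts data at
risk.
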
How many writes are in "some #" is up to the client. For FreeBSD clients, the "wcommitsize"
mount option can be used to adjust this. Recently the default tuning of this changed
significantly, but you didn't mention how recent your system(s) are, so manual tuning of
it may be useful. (See "man mount_nfs" for more on this.)
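
As a rough illustration of why the amount written per Commit matters, the
arithmetic below counts write and Commit RPCs for one sequential file write.
It assumes the client issues one Commit each time wcommitsize bytes are
pending, which is a simplification (a real client also commits on close or
fsync), and the sizes used are arbitrary examples, not FreeBSD defaults.

```python
import math

def rpc_counts(file_bytes, wsize, wcommitsize):
    """Rough count of (writes, commits) for one sequential file write.

    Simplifying assumption: one Commit per wcommitsize bytes written.
    """
    writes = math.ceil(file_bytes / wsize)
    commits = math.ceil(file_bytes / wcommitsize)
    return writes, commits

KiB = 1024
MiB = 1024 * KiB

# Example values only: a 1 GiB file written with 64K writes.
for wcommitsize in (256 * KiB, 4 * MiB, 64 * MiB):
    writes, commits = rpc_counts(1024 * MiB, 64 * KiB, wcommitsize)
    print(f"wcommitsize={wcommitsize:>9}: {writes} writes, {commits} commits")
```

Each Commit forces the server to wait for stable storage, so fewer, larger
commits mean fewer of those waits for the same amount of data.
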
Also, the NFS server was recently tweaked so that it could handle 128K rsize/wsize,
but the FreeBSD client is limited to MAXBSIZE and this has not been increased
beyond 64K. To do so, you have to change the value of this in the kernel sources
and rebuild your kernel. (The problem is that increasing MAXBSIZE makes the kernel
use more KVM for the buffer cache and if a system isn't doing significant client
side NFS, this is wasted.)
Someday, I should see if MAXBSIZE can be made a TUNABLE, but I haven't done that.
--> As such, unless you use a Linux NFS client, the reads/writes will be 64K, whereas
128K would work better for ZFS.
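
To put numbers on the 64K vs 128K point: assuming the ZFS dataset uses a 128K
record size (an assumption about the pool in question, not something stated in
this thread), the sketch below counts how many NFS transfers it takes to cover
one record. The function name is made up for illustration.

```python
KiB = 1024

def transfers_per_record(recordsize, io_size):
    # Ceiling division: NFS transfers needed to cover one ZFS record.
    return -(-recordsize // io_size)

recordsize = 128 * KiB   # assumed ZFS record size
for io_size in (64 * KiB, 128 * KiB):
    n = transfers_per_record(recordsize, io_size)
    print(f"rsize/wsize={io_size // KiB}K: {n} transfer(s) per record")
```

With 64K transfers every record is touched by two separate RPCs, whereas a
128K transfer lines up with one whole record.
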
Some NAS hardware vendors solve this problem by using non-volatile RAM, but that
isn't available in generic hardware.
> And finally, just to put the matter to rest, both ZFS-pools already have
> a separate zil-device (on an SSD).
>
If this SSD is dedicated to the ZIL and is one known to have good write performance,
it should help, but in your case the SSD seems to be the bottleneck.
rick
> -mi
>