Terrible NFS performance under 9.2-RELEASE?

Rick Macklem rmacklem at uoguelph.ca
Wed Jan 29 00:32:22 UTC 2014


J David wrote:
> On Tue, Jan 28, 2014 at 10:35 AM, Rick Macklem <rmacklem at uoguelph.ca>
> wrote:
> > Since messages are sent quickly and then mbufs released, except for
> > the DRC in the server, I think large allocations for server replies
> > that may be cached are the case to try and avoid. Fortunately
> > the large replies will be for read and readdir and these don't need
> > to be cached by the DRC. As such, a patch that uses 4K clusters in
> > the server for read, readdir and 4K clusters for write requests in
> > the client, should be appropriate, I think?
> 
> m_getm2 appears to consistently produce "right-sized" results.  The
> relevant code is:
> 
>     while (len > 0) {
>         if (len > MCLBYTES)
>             mb = m_getjcl(how, type, (flags & M_PKTHDR),
>                 MJUMPAGESIZE);
>         else if (len >= MINCLSIZE)
>             mb = m_getcl(how, type, (flags & M_PKTHDR));
>         else if (flags & M_PKTHDR)
>             mb = m_gethdr(how, type);
>         else
>             mb = m_get(how, type);
>         /* ... */
>     }
> 
> So it allocates the shortest possible chain and uses the best-fit
> cluster for the last (or only) block in the chain.
> 
> It's probably the use of this function in m_uiotombuf or somewhere
> very similar that prevents tools like iperf from encountering this
> same issue.
> 
> Getting this same logic into the NFS code seems like it would be a
> good thing, in terms of reducing code duplication, increasing
> performance, and leveraging a well-tested code path.
> 
For the server generating read replies, I suspect this is the case and
that is what Garrett Wollman's patch does. However, readdir builds up
the reply in small chunks via NFSM_BUILD() and this will require an extra
argument that says "allocate a big cluster". Since it builds the reply in
small chunks, it cannot use m_getm2().
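
A rough user-space sketch of the "extra argument" idea, for illustration
only. It is not the NFS code and not the attached patch: sk_build(),
struct sk_buf, struct sk_chain and the bigcl flag are hypothetical names,
and the 4096-byte figure assumes 4K pages. It reserves space in small
chunks, the way a readdir reply is built up, and only starts a new buffer
when the current one is full.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SK_MCLBYTES     2048    /* regular cluster size */
#define SK_MJUMPAGESIZE 4096    /* page-sized cluster, assuming 4K pages */

/* Hypothetical stand-in for an mbuf with an attached cluster. */
struct sk_buf {
    struct sk_buf *next;
    size_t size;                /* capacity of data[] */
    size_t used;                /* bytes reserved so far */
    unsigned char data[];       /* C99 flexible array member */
};

struct sk_chain {
    struct sk_buf *head, *tail;
    int nbufs;                  /* buffers allocated for this chain */
};

/*
 * Reserve siz contiguous bytes at the end of the chain and return a
 * pointer to them, or NULL on failure. bigcl selects the cluster size
 * used whenever a new buffer has to be allocated.
 */
static void *
sk_build(struct sk_chain *c, size_t siz, int bigcl)
{
    size_t clsize = bigcl ? SK_MJUMPAGESIZE : SK_MCLBYTES;
    struct sk_buf *b;
    void *p;

    assert(c != NULL);
    if (siz > clsize)
        return (NULL);          /* a chunk must fit in one buffer */
    if (c->tail == NULL || c->tail->used + siz > c->tail->size) {
        b = calloc(1, sizeof(*b) + clsize);
        if (b == NULL)
            return (NULL);
        b->size = clsize;
        if (c->tail != NULL)
            c->tail->next = b;
        else
            c->head = b;
        c->tail = b;
        c->nbufs++;
    }
    p = c->tail->data + c->tail->used;
    c->tail->used += siz;
    return (p);
}

/* Release every buffer in the chain and reset it to empty. */
static void
sk_chain_free(struct sk_chain *c)
{
    struct sk_buf *b, *n;

    for (b = c->head; b != NULL; b = n) {
        n = b->next;
        free(b);
    }
    c->head = c->tail = NULL;
    c->nbufs = 0;
}
```

With this sketch, building 64K out of 64-byte chunks allocates 32
buffers with bigcl == 0 and 16 with bigcl != 0.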

I haven't looked at the client side write yet, so I don't know if m_getm2()
is feasible for it or not.

Hopefully Garrett and/or you will be able to do some testing of it
and report back w.r.t. performance gains, etc. Once we have that,
we can decide if this is an appropriate commit to head.

Since I suspect it will take some time for Garrett to do this, please
try my simple patch in your test environment, mostly to determine if
the fail count goes to 0 (and also count calls to m_collapse() without/with
the patch, since those will impact performance, too).
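
For a rough idea of what to expect before measuring, the sketch below
just does the arithmetic on chain length. It is user-space C, not kernel
code: pl_getm2_bufs() mimics the size selection in the m_getm2() loop
quoted above (ignoring M_PKTHDR), pl_fixed_bufs() models filling a chain
with one fixed cluster size, and the PL_MLEN / PL_MINCLSIZE values are
assumed placeholders since the real ones depend on platform and version.
Whether shorter chains actually reduce the m_collapse() count is the
thing the test is meant to show.

```c
#include <assert.h>
#include <stdio.h>

#define PL_MLEN         224     /* data bytes in a plain mbuf (assumed) */
#define PL_MINCLSIZE    169     /* smallest length given a cluster (assumed) */
#define PL_MCLBYTES     2048    /* regular cluster size */
#define PL_MJUMPAGESIZE 4096    /* page-sized cluster, assuming 4K pages */

/* Buffers needed for len bytes using the m_getm2()-style size choice. */
static int
pl_getm2_bufs(int len)
{
    int n = 0;

    assert(len >= 0);
    while (len > 0) {
        if (len > PL_MCLBYTES)
            len -= PL_MJUMPAGESIZE;     /* page-sized cluster */
        else if (len >= PL_MINCLSIZE)
            len -= PL_MCLBYTES;         /* regular cluster */
        else
            len -= PL_MLEN;             /* plain mbuf, no cluster */
        n++;
    }
    return (n);
}

/* Buffers needed for len bytes when every buffer holds clsize bytes. */
static int
pl_fixed_bufs(int len, int clsize)
{
    assert(len >= 0 && clsize > 0);
    return ((len + clsize - 1) / clsize);
}

/* Print one comparison row for a given payload length. */
static void
pl_print_row(int len)
{
    printf("%6d bytes: 2K-only %2d bufs, m_getm2-style %2d bufs\n",
        len, pl_fixed_bufs(len, PL_MCLBYTES), pl_getm2_bufs(len));
}
```

Calling pl_print_row(65536) compares the two strategies for a 64K
payload; header mbufs are not counted here.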

Thanks in advance for trying the patch, rick
ps: Attached again, just in case you don't already have it.

> It may raise portability concerns, but it does seem likely that other
> OS's to which the NFS code could potentially be ported have similar
> mechanisms these days.  Possibly it would be worthwhile to examine
> whether the NFS code could choose a slightly different point of
> abstraction.  Or, if that's undesirable, maybe asking the hypothetical
> person doing such a port to cross that bridge when they come to it is
> not unreasonable, since that would be the person most likely to be
> intimately familiar with the relevant details of both OS's.
> 
As I mentioned before, I am no longer concerned about portability.
The discussion about portability was meant to explain why the code
was written the way it was and, yes, I did note that "portability is
nice" but did not intend to imply that that should limit modifications
to the code that improve it for FreeBSD.

> Also, looking at GAWollman's patch, an mbuf+cluster allocator that
> kicks back a prewired iovec seems really handy.  Is that something
> that would be useful elsewhere in the kernel, or is NFS just kind of a
> special case because it's just moving data around, not across weird
> boundaries like device drivers and anything user mode-facing does?
> 
> Thanks!
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 4kmcl.patch
Type: text/x-patch
Size: 1802 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-net/attachments/20140128/5237a83a/attachment.bin>