New NFS server stress test hang

Fri Jun 10 21:13:11 UTC 2011

John De wrote:
> ----- Rick Macklem's Original Message -----
> > John De wrote:
> > > ----- Rick Macklem's Original Message -----
> > > > John De wrote:
> > > > > Hi,
> > > > >
> > > > > We've been running some stress tests of the new nfs server.
> > > > > The system is at r222531 (head), 9 clients, two mounts each
> > > > > to the server:
> > > > >
> > > > > mount_nfs -o
> > > > > udp,nfsv3,rsize=32768,wsize=32768,noatime,nolockd,acregmin=1,acregmax=2,acdirmin=1,acdirmax=2,negnametimeo=2
> > > > > ${servera}:/vol/datsrc /c/$servera/vol/datsrc
> > > > > mount_nfs -o
> > > > > udp,nfsv3,rsize=32768,wsize=32768,noatime,nolockd,acregmin=1,acregmax=2,acdirmin=1,acdirmax=2,negnametimeo=0
> > > > > ${servera}:/vol/datgen /c/$servera/vol/datgen
> > > > >
> > > > >
> > > > > The system is still up & responsive, simply no nfs services
> > > > > are working. All (200) threads appear to be active, but not
> > > > > doing anything. The debugger is not compiled into this kernel.
> > > > > We can run any other tracing commands desired. We can also
> > > > > rebuild the kernel with the debugger enabled for any kernel
> > > > > debugging needed.
> > > > >
> > > > > --- long logs deleted ---
> > > >
> > > > How about a:
> > > >  ps axHlww    <-- With the "H" we'll see what the nfsd server
> > > >                   threads are up to
> > > >  procstat -kka
> > > >
> > > > Oh, and a couple of nfsstats a few seconds apart. It's what the
> > > > counts are changing by that might tell us what is going on. (You
> > > > can use "-z" to zero them out, if you have an nfsstat built from
> > > > recent sources.)
> > > >
> > > > Also, does a new NFS mount attempt against the server do anything?
> > > >
> > > > Thanks in advance for help with this, rick
> > >
> > > Hi Rick,
> > >
> > > Here's the output. In general, the nfsd processes appear to be in
> > > either nfsrvd_getcache (35 instances) or nfsrvd_updatecache (164),
> > > sleeping on "nfssrc". The server numbers don't appear to be moving.
> > > A showmount from a client system works, but a mount does not (see
> > > below).
> >
> > Please try the attached patch and let me know if it helps. When I
> > looked, I found several places where the rc_flag variable was being
> > fiddled without the mutex held. I suspect one of these resulted in
> > the RC_LOCKED flag not getting cleared, so all the threads got stuck
> > waiting on it.
> >
> > The patch is at:
> >   http://people.freebsd.org/~rmacklem/cache.patch
> > in case it gets eaten by the list handler.
> > Thanks for digging into this, rick
> 
> Hi Rick,
> 
> Patch applied. The system has been up and running for about
> 16 hours now and so far it's still handling the load quite nicely.
> 
> last pid: 15853; load averages: 5.36, 4.64, 4.48 up 0+16:08:16 08:48:07
> 72 processes: 7 running, 65 sleeping
> CPU: % user, % nice, % system, % interrupt, % idle
> Mem: 22M Active, 3345M Inact, 79G Wired, 9837M Buf, 11G Free
> Swap:
> 
> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
> 2049 root 26 52 0 10052K 1712K CPU3 3 97:21 942.24% nfsd
> 
> I'll follow up again in 24 hours with another status.
> 
> Any performance related numbers/knobs we can provide that might
> be of interest?
> 
Not really anything I can think of. You obviously have hardware that
runs well, or NFS over UDP with 32K rsize/wsize wouldn't work. (I am
not so lucky. My environment drops enough packets that NFS over UDP
is completely unusable.)

It would be interesting, at some point, to see how your UDP mounts above
compare with TCP mounts using the default rsize/wsize (which should be 64K).

And if you really want to try something on the bleeding edge, you could
apply this patch to the server, which enables use of LK_SHARED locked
vnodes for read operations. It has only been lightly tested and I really
doubt it will go into 9.0, but if you could test it, that would be nice. :-)

  http://people.freebsd.org/~rmacklem/lkshared.patch

Thanks for testing this, rick
ps: Hopefully you'll have some insight into how long you need to run with
    the patch before it seems that it fixed your problem? (I know, since it
    is probably an SMP race, you can never be sure. ;-)
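
For anyone following the thread without the patch in front of them, the
kind of race suspected above (a "locked" flag bit being changed without
the mutex held) can be illustrated with a small userland sketch. This is
not the kernel code and not the contents of cache.patch. The struct, the
ENTRY_LOCKED/ENTRY_WANTED bits and the function names are all made up for
illustration, and it uses POSIX threads rather than the kernel's mutex
and sleep primitives. The point is only that every read-modify-write of
the flag word happens with the mutex held, so a clear of the "locked"
bit cannot be lost to a concurrent update.

```c
#include <pthread.h>

/* Hypothetical flag bits, loosely analogous to RC_LOCKED in the thread. */
#define ENTRY_LOCKED 0x1   /* entry is owned by some thread */
#define ENTRY_WANTED 0x2   /* at least one thread is waiting for it */

struct entry {
    pthread_mutex_t mtx;    /* protects 'flags' */
    pthread_cond_t  cv;     /* waiters sleep here */
    int             flags;
    long            counter; /* data guarded by ENTRY_LOCKED */
};

/* Acquire the flag lock; the flag word is only touched under mtx. */
static void entry_lock(struct entry *e)
{
    pthread_mutex_lock(&e->mtx);
    while (e->flags & ENTRY_LOCKED) {
        e->flags |= ENTRY_WANTED;
        pthread_cond_wait(&e->cv, &e->mtx);
    }
    e->flags |= ENTRY_LOCKED;
    pthread_mutex_unlock(&e->mtx);
}

/*
 * Release the flag lock. Doing "e->flags &= ~ENTRY_LOCKED" WITHOUT the
 * mutex is the kind of bug suspected in the thread: a concurrent update
 * of the same word can overwrite the clear, leaving the bit set forever.
 */
static void entry_unlock(struct entry *e)
{
    pthread_mutex_lock(&e->mtx);
    e->flags &= ~ENTRY_LOCKED;
    if (e->flags & ENTRY_WANTED) {
        e->flags &= ~ENTRY_WANTED;
        pthread_cond_broadcast(&e->cv);
    }
    pthread_mutex_unlock(&e->mtx);
}

#define NTHREADS 4
#define NITERS   10000

static void *worker(void *arg)
{
    struct entry *e = arg;
    for (int i = 0; i < NITERS; i++) {
        entry_lock(e);
        e->counter++;       /* safe: we own ENTRY_LOCKED */
        entry_unlock(e);
    }
    return NULL;
}

/* Run the workers and return the final counter value. */
long run_demo(void)
{
    struct entry e;
    pthread_t t[NTHREADS];

    pthread_mutex_init(&e.mtx, NULL);
    pthread_cond_init(&e.cv, NULL);
    e.flags = 0;
    e.counter = 0;

    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, &e);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);

    pthread_cond_destroy(&e.cv);
    pthread_mutex_destroy(&e.mtx);
    return e.counter;
}
```

Calling run_demo() from a main() and building with -pthread should give
NTHREADS * NITERS as the result. If the flag updates in entry_unlock() are
moved outside the mutex, the program may hang with every thread waiting
on a bit that never clears, which looks a lot like the symptom reported
at the start of this thread.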