New NFS server stress test hang
Rick Macklem
rmacklem at uoguelph.ca
Mon Jun 20 12:34:37 UTC 2011
John De wrote:
> ----- Rick Macklem's Original Message -----
> > John De wrote:
> > > ----- Rick Macklem's Original Message -----
> > > > John De wrote:
> > > > > Hi,
> > > > >
> > > > > We've been running some stress tests of the new nfs server.
> > > > > The system is at r222531 (head), 9 clients, two mounts each
> > > > > to the server:
> > > > >
> > > > > mount_nfs -o udp,nfsv3,rsize=32768,wsize=32768,noatime,nolockd,acregmin=1,acregmax=2,acdirmin=1,acdirmax=2,negnametimeo=2 ${servera}:/vol/datsrc /c/$servera/vol/datsrc
> > > > > mount_nfs -o udp,nfsv3,rsize=32768,wsize=32768,noatime,nolockd,acregmin=1,acregmax=2,acdirmin=1,acdirmax=2,negnametimeo=0 ${servera}:/vol/datgen /c/$servera/vol/datgen
> > > > >
> > > > >
> > > > > The system is still up & responsive, simply no nfs services
> > > > > are working. All (200) threads appear to be active, but not
> > > > > doing anything. The debugger is not compiled into this kernel.
> > > > > We can run any other tracing commands desired. We can also
> > > > > rebuild the kernel with the debugger enabled for any kernel
> > > > > debugging needed.
> > > > >
> > > > > --- long logs deleted ---
> > > >
> > > > How about a:
> > > > ps axHlww <-- With the "H" we'll see what the nfsd server threads are up to
> > > > procstat -kka
> > > >
> > > > Oh, and a couple of nfsstats a few seconds apart. It's what the
> > > > counts are changing by that might tell us what is going on. (You can
> > > > use "-z" to zero them out, if you have an nfsstat built from recent
> > > > sources.)
> > > >
> > > > Also, does a new NFS mount attempt against the server do anything?
> > > >
> > > > Thanks in advance for help with this, rick
> > >
> > > Hi Rick,
> > >
> > > Here's the output. In general, the nfsd processes appear to be in
> > > either nfsrvd_getcache (35 instances) or nfsrvd_updatecache (164),
> > > sleeping on "nfssrc". The server numbers don't appear to be moving.
> > > A showmount from a client system works, but a mount does not (see below).
> >
> > Please try the attached patch and let me know if it helps. When I looked
> > I found several places where the rc_flag variable was being fiddled
> > without the mutex held. I suspect one of these resulted in the RC_LOCKED
> > flag not getting cleared, so all the threads got stuck waiting on it.
> >
> > The patch is at:
> > http://people.freebsd.org/~rmacklem/cache.patch
> > in case it gets eaten by the list handler.
> > Thanks for digging into this, rick
>
> Hi Rick,
>
> Patch applied. The system has been up and running for about
> 16 hours now and so far it's still handling the load quite nicely.
>
> last pid: 15853; load averages: 5.36, 4.64, 4.48 up 0+16:08:16 08:48:07
> 72 processes: 7 running, 65 sleeping
> CPU: % user, % nice, % system, % interrupt, % idle
> Mem: 22M Active, 3345M Inact, 79G Wired, 9837M Buf, 11G Free
> Swap:
>
> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
> 2049 root 26 52 0 10052K 1712K CPU3 3 97:21 942.24% nfsd
>
> I'll follow up again in 24 hours with another status.
>
> Any performance related numbers/knobs we can provide that might
> be of interest?
>
> Thanks Rick.
>
> -John
Just fyi, the patch has been committed to head and unless there
are problems, will be in stable/8 in a couple of weeks.
Thanks for helping with this.
Please let me know if you have more problems, rick
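
For readers following the thread, here is a minimal user-space sketch of the
locking pattern discussed above: a flag word that is only ever read or
modified with its mutex held. It is an analogy built on POSIX threads, not
the code from cache.patch. Only the names rc_flag and RC_LOCKED come from
this thread; the struct, the RC_WANTED bit, the flag values, and the helper
functions are assumptions made for illustration.

```c
#include <assert.h>
#include <pthread.h>

#define RC_LOCKED   0x1    /* entry is owned by one thread (value is illustrative) */
#define RC_WANTED   0x2    /* hypothetical: a thread is sleeping on the entry */
#define ITERATIONS  10000
#define MAX_THREADS 16

struct cache_entry {            /* hypothetical stand-in for a cache entry */
    pthread_mutex_t mtx;        /* protects rc_flag */
    pthread_cond_t  cv;         /* waiters sleep here */
    int             rc_flag;
    long            updates;    /* protected by owning RC_LOCKED */
};

/* Take ownership of the entry; sleep while another thread owns it. */
static void entry_lock(struct cache_entry *e)
{
    pthread_mutex_lock(&e->mtx);
    while (e->rc_flag & RC_LOCKED) {
        e->rc_flag |= RC_WANTED;
        pthread_cond_wait(&e->cv, &e->mtx);
    }
    e->rc_flag |= RC_LOCKED;
    pthread_mutex_unlock(&e->mtx);
}

/* Release ownership. The flag update happens with the mutex held. */
static void entry_unlock(struct cache_entry *e)
{
    pthread_mutex_lock(&e->mtx);
    e->rc_flag &= ~RC_LOCKED;
    if (e->rc_flag & RC_WANTED) {
        e->rc_flag &= ~RC_WANTED;
        pthread_cond_broadcast(&e->cv);
    }
    pthread_mutex_unlock(&e->mtx);
}

static void *worker(void *arg)
{
    struct cache_entry *e = arg;
    int i;

    for (i = 0; i < ITERATIONS; i++) {
        entry_lock(e);
        e->updates++;           /* safe: this thread owns RC_LOCKED */
        entry_unlock(e);
    }
    return NULL;
}

/* Run nthreads workers against one entry and return the update count. */
long run_workers(int nthreads)
{
    struct cache_entry e;
    pthread_t tid[MAX_THREADS];
    int i, rc;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    pthread_mutex_init(&e.mtx, NULL);
    pthread_cond_init(&e.cv, NULL);
    e.rc_flag = 0;
    e.updates = 0;

    for (i = 0; i < nthreads; i++) {
        rc = pthread_create(&tid[i], NULL, worker, &e);
        assert(rc == 0);
    }
    for (i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);

    /* Every owner released the entry, so RC_LOCKED must be clear. */
    assert((e.rc_flag & RC_LOCKED) == 0);

    pthread_cond_destroy(&e.cv);
    pthread_mutex_destroy(&e.mtx);
    return e.updates;
}
```

Calling run_workers(8) should return 8 * ITERATIONS. If the read-modify-write
in entry_unlock() were done without mtx, a waiter setting RC_WANTED at the
same moment could write back a stale value with RC_LOCKED still set, and
every later caller of entry_lock() would then sleep forever, which is the
kind of hang suspected above.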