Re: 13-stable NFS server hang

From: Ronald Klop <ronald-lists_at_klop.ws>
Date: Fri, 01 Mar 2024 08:00:20 UTC
Interesting read. 

Would it be possible to separate the locking for admin actions, like a client mounting an fs, from the locking for ongoing file-operation traffic?

For example, ongoing file operations could work against a read-only view/copy of the mount table; only new operations would have to wait.
The mount itself would then never need to wait for ongoing operations before locking the structure.

Just a thought in the morning.
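
Roughly what I mean, as a userland C sketch (all names invented here;
the real nfsd structures are different, and safely freeing the old
table is of course the hard part):

    #include <stdatomic.h>

    struct mount_table { int nexports; /* exports, clients, ... */ };

    static _Atomic(struct mount_table *) cur_table;

    /* File operations: take a stable snapshot of the table and work
     * from that; they are never blocked by an admin action. */
    struct mount_table *
    fileop_snapshot(void)
    {
            return atomic_load_explicit(&cur_table, memory_order_acquire);
    }

    /* Admin actions (e.g. a client mounting an fs): build a modified
     * copy and publish it atomically; only other admin actions need
     * to serialize against this. */
    void
    admin_replace(struct mount_table *newtab)
    {
            struct mount_table *old = atomic_exchange_explicit(
                &cur_table, newtab, memory_order_acq_rel);
            /* "old" may still be in use by in-flight operations, so it
             * can only be freed after they drain (an RCU-style grace
             * period), which is the hard part. */
            (void)old;
    }

The pointer swap itself never waits for readers; the price is the
deferred reclamation of the old copy.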

Regards,
Ronald.

From: Rick Macklem <rick.macklem@gmail.com>
Date: 1 March 2024 00:31
To: Garrett Wollman <wollman@bimajority.org>
CC: stable@freebsd.org, rmacklem@freebsd.org
Subject: Re: 13-stable NFS server hang

> 
> 
> > On Wed, Feb 28, 2024 at 4:04PM Rick Macklem wrote:
> >
> > > On Tue, Feb 27, 2024 at 9:30PM Garrett Wollman wrote:
> > >
> > > Hi, all,
> > >
> > > We've had some complaints of NFS hanging at unpredictable intervals.
> > > Our NFS servers are running a 13-stable from last December, and
> > > tonight I sat in front of the monitor watching `nfsstat -dW`.  I was
> > > able to clearly see that there were periods when NFS activity would
> > > drop *instantly* from 30,000 ops/s to flat zero, which would last
> > > for about 25 seconds before resuming exactly as it was before.
> > >
> > > I wrote a little awk script to watch for this happening and run
> > > `procstat -k` on the nfsd process, and I saw that all but two of the
> > > service threads were idle.  The three nfsd threads that had non-idle
> > > kstacks were:
> > >
> > >   PID    TID COMM                TDNAME              KSTACK
> > >   997 108481 nfsd                nfsd: master        mi_switch sleepq_timedwait _sleep nfsv4_lock nfsrvd_dorpc nfssvc_program svc_run_internal svc_run nfsrvd_nfsd nfssvc_nfsd sys_nfssvc amd64_syscall fast_syscall_common
> > >   997 960918 nfsd                nfsd: service       mi_switch sleepq_timedwait _sleep nfsv4_lock nfsrv_setclient nfsrvd_exchangeid nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit fork_trampoline
> > >   997 962232 nfsd                nfsd: service       mi_switch _cv_wait txg_wait_synced_impl txg_wait_synced dmu_offset_next zfs_holey zfs_freebsd_ioctl vn_generic_copy_file_range vop_stdcopy_file_range VOP_COPY_FILE_RANGE vn_copy_file_range nfsrvd_copy_file_range nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit fork_trampoline
> > >
> > > I'm suspicious of two things: first, the copy_file_range RPC; second,
> > > the "master" nfsd thread is actually servicing an RPC which requires
> > > obtaining a lock.  The "master" getting stuck while performing client
> > > RPCs is, I believe, the reason NFS service grinds to a halt when a
> > > client tries to write into a near-full filesystem, so this problem
> > > would be more evidence that the dispatching function should not be
> > > mixed with actual operations.  I don't know what the clients are
> > > doing, but is it possible that nfsrvd_copy_file_range is holding a
> > > lock that is needed by one or both of the other two threads?
> > >
> > > Near-term I could change nfsrvd_copy_file_range to just
> > > unconditionally return NFSERR_NOTSUP and force the clients to fall
> > > back, but I figured I would ask if anyone else has seen this.
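> > >
> > > Something like the following at the top of nfsrvd_copy_file_range()
> > > would do it (an untested sketch; the handlers report per-RPC status
> > > via nd->nd_repstat, but the exact constant spelling should be
> > > checked against sys/fs/nfs/nfsproto.h):
> > >
> > >     /* Reply "not supported" so clients fall back to plain
> > >      * READ/WRITE loops for the copy. */
> > >     nd->nd_repstat = NFSERR_NOTSUPP;
> > >     return (0);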
> > I have attached a little patch that should limit the server's Copy size
> > to vfs.nfsd.maxcopyrange (default of 10Mbytes).
> > Hopefully this makes sure that the Copy does not take too long.
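> >
> > The guts of the clip are tiny; roughly (a sketch, not the attached
> > patch itself, and the variable names here are invented):
> >
> >     /* Clamp the request before doing the local copy, so a single
> >      * Copy RPC cannot tie a thread up for many seconds. */
> >     if (len > nfsrv_maxcopyrange)   /* vfs.nfsd.maxcopyrange, 10Mbytes default */
> >             len = nfsrv_maxcopyrange;
> >
> > Clipping is protocol-legal: the Copy reply tells the client how many
> > bytes were actually copied and the client just issues further Copy
> > RPCs for the remainder.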
> >
> > You could try this instead of disabling Copy. It would be nice to know
> > whether this is sufficient. (If not, I'll probably add a sysctl to disable Copy.)
> I did a quick test without/with this patch, where I copied a 1Gbyte file.
> 
> Without this patch, the Copy RPCs mostly replied in just under 1sec
> (which is what the 1sec-timeout flag passed down to the copy code
> requests), but one of the Copy operations took over 4sec. This implies
> that a single 1Mbyte Read/Write on the server took over 3 seconds.
> I noticed the first Copy did over 600Mbytes, but the rest did about
> 100Mbytes each, and it was one of these 100Mbyte Copy operations that
> took over 4sec.
> 
> With the patch, there were a lot more Copy RPCs (as expected) of 10Mbytes
> each and they took a consistent 0.25-0.3sec to reply. (This is a test of a local
> mount on an old laptop, so nowhere near a server hardware config.)
> 
> So, the patch might be sufficient?
> 
> It would be nice to avoid disabling Copy, since it avoids reading the data
> into the client and then writing it back to the server.
> 
> I will probably commit both patches (the 10Mbyte clip of the Copy size
> and a sysctl for disabling Copy) to main soon, since I cannot say if
> clipping the size of the Copy will always be sufficient.
> 
> Please let us know how trying these patches goes, rick
> 
> >
> > rick
> >
> > >
> > > -GAWollman