Re: FreeBSD 12.3/13.1 NFS client hang

Reply: Kurt Jaeger : "Re: FreeBSD 12.3/13.1 NFS client hang"
In reply to: Rick Macklem : "Re: FreeBSD 12.3/13.1 NFS client hang"
From: Rick Macklem <rmacklem_at_uoguelph.ca>
Date: Sat, 28 May 2022 21:27:45 UTC
Rick Macklem <rmacklem@uoguelph.ca> wrote:
> Kurt Jaeger <pi@freebsd.org> wrote:
> > > > I'm having issues with the NFS clients on FreeBSD 12.3 and 13.1
> >
> > I have it with a 13.0p7 client against a 13.1 server with
> > a hanging soft-mount (I tried to unmount it to change it to a hard mount).
> >
> > 61585  93- D+        0:00.00 umount /office/serv
> > 61635 133  D         0:00.00 umount -f /office/serv
> > 7784 138  D         0:00.00 umount -N /office/serv
> The first umount must be "-N". Once you've hung a non "-N" umount,
> rebooting is the only option.
> (I have thought of doing a "umount -N -A" (for all NFS mounts), which
> would allow it to kill off all NFS activity without even finding the pathname
> for the mountpoint, but I have not done so.)
I take this back. I just did a fairly trivial test of this and it worked.
Looking at the "ps" output, I don't think your case is an "NFS protocol hang":
there are no threads waiting on NFS RPCs to complete.
(umount -N kills off outstanding RPCs, so the VFS/VOP ops can complete with error, which should
 dismount a hang caused by an unresponsive NFS server or similar.)

The only threads sleeping in the nfs code are waiting for an NFS vnode lock.
I suspect that some process/thread is hung on something non-NFS while holding a lock
on an NFS vnode. "umount -N" won't know how to unhang this process/thread.
Just a hunch, but I'd suspect one of the threads sleeping on "vmopar", although I'm
not a vm guy.
What I don't know how to do is figure out which thread(s) are holding vnode locks.

This also implies that switching from soft->hard won't fix the problem.

It would be nice if "umount -N" could handle this case. I'll look at the VFS code and
maybe talk to kib@ to see if there is a way to mark all NFS vnodes "dead" so that
vn_lock() will either return an error or a locked but VI_DOOMED vnode (if LK_RETRY is
specified).

In summary, I don't think your hang is anything like Andreas's, rick

> and procstat:
>
> # procstat -kk 7784
>  PID    TID COMM                TDNAME              KSTACK
> 7784 107226 umount              -                   mi_switch+0xc1 sleeplk+0xec lockmgr_xlock_hard+0x345 _vn_lock+0x48 vget_finish+0x21 cache_lookup+0x299 vfs_cache_lookup+0x7b lookup+0x68c namei+0x487 kern_unmount+0x164 amd64_syscall+0x10c fast_syscall_common+0xf8
> # procstat -kk 61635
>  PID    TID COMM                TDNAME              KSTACK
> 61635 775458 umount              -                   mi_switch+0xc1 sleeplk+0xec lockmgr_slock_hard+0x382 _vn_lock+0x48 vget_finish+0x21 cache_lookup+0x299 vfs_cache_lookup+0x7b lookup+0x68c namei+0x487 sys_statfs+0xc3 amd64_syscall+0x10c fast_syscall_common+0xf8
> # procstat -kk 61585
>  PID    TID COMM                TDNAME              KSTACK
> 61585 516164 umount              -                   mi_switch+0xc1 sleeplk+0xec lockmgr_xlock_hard+0x345 nfs_lock+0x2c vop_sigdefer+0x2b _vn_lock+0x48 vflush+0x151 nfs_unmount+0xc3 vfs_unmount_sigdefer+0x2e dounmount+0x437 kern_unmount+0x332 amd64_syscall+0x10c fast_syscall_common+0xf8
These just show that they are waiting for NFS vnodes. In the "ps" output there are
threads waiting on zfs vnodes as well.
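
To pick the vnode-lock waiters out of a larger "procstat -kk" dump, something
like the following could be used. It is only a sketch: vnlock_waiters is a
made-up helper name, and it assumes the PID/TID/KSTACK layout shown in the
traces above. It prints the PID and TID of every thread whose kernel stack
passes through _vn_lock, and notes whether nfs_lock is also visible in that
stack.

```shell
#!/bin/sh
# Sketch only: vnlock_waiters is a hypothetical helper, not a FreeBSD tool.
# Reads `procstat -kk` output on stdin and prints "PID TID kind" for each
# thread whose kernel stack contains _vn_lock.
vnlock_waiters() {
    awk '
        $1 == "PID" { next }            # skip the column header line
        / _vn_lock[+]/ {
            # "nfs_lock" means that frame is visible in the stack; "other"
            # means the stack reaches _vn_lock without showing nfs_lock.
            kind = ($0 ~ / nfs_lock[+]/) ? "nfs_lock" : "other"
            print $1, $2, kind
        }
    '
}
```

Usage, for example: procstat -kk -a | vnlock_waiters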

> ps-axHl can be found at
>
> https://people.freebsd.org/~pi/logs/ps-axHl.txt
I suspect your problem might be related to wired pages. Note that
several threads are sleeping on "vmopar". I'm no vm guy, but I
think that might mean too many pages have become wired.
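
To see how many threads are sleeping on each wait channel (for example
"vmopar"), the "ps -axHl" output can be summarized with a small filter. This
is a sketch under assumptions: wchan_counts is a made-up helper name, and it
assumes the first line of input is the ps header and that it contains a
column titled MWCHAN, which it locates by name rather than by position.

```shell
#!/bin/sh
# Sketch only: wchan_counts is a hypothetical helper, not a FreeBSD tool.
# Reads `ps -axHl` output on stdin and prints "count wchan", highest first.
wchan_counts() {
    awk '
        NR == 1 {
            # find the MWCHAN column from the header instead of hard-coding it
            for (i = 1; i <= NF; i++) if ($i == "MWCHAN") col = i
            next
        }
        # "-" means the thread is not sleeping on a wait channel
        col && $col != "-" { n[$col]++ }
        END { for (w in n) print n[w], w }
    ' | sort -rn
}
```

Usage, for example: ps -axHl | wchan_counts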

rick

> > systems hanging when using a CentOS 7 server.
> First, make sure you are using hard mounts. "soft" or "intr" mounts won't
> work and will mess up the session sooner or later. (A messed up session could
> result in no free slots on the session and that will wedge threads in
> nfsv4_sequencelookup() as you describe.)
> (This is briefly described in the BUGS section of "man mount_nfs".)
>
> Do a:
> # nfsstat -m
> on the clients and look for "hard".

No output at all for that 8-(
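
The "nfsstat -m" check could be scripted roughly as below, which also makes
the empty-output case explicit. This is a sketch: mount_hardness is a made-up
helper name, and it assumes that "nfsstat -m" prints each mount's options as
a comma-separated list containing either "hard" or "soft".

```shell
#!/bin/sh
# Sketch only: mount_hardness is a hypothetical helper, not a FreeBSD tool.
# Reads `nfsstat -m` output on stdin. Assumes mount options appear as a
# comma-separated list that includes the token "hard" or "soft".
mount_hardness() {
    awk '
        NF { seen = 1 }                     # any non-blank line counts as output
        /(^|,)hard(,|$)/ { hard++ }         # match "hard" only as a whole option
        /(^|,)soft(,|$)/ { soft++ }
        END {
            if (!seen) print "no output from nfsstat -m"
            else printf "hard=%d soft=%d\n", hard, soft
        }
    '
}
```

Usage, for example: nfsstat -m | mount_hardness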

--
pi@FreeBSD.org         +49 171 3101372                  Now what ?