Re: FreeBSD 12.3/13.1 NFS client hang
- Reply: Andreas Kempe : "Re: FreeBSD 12.3/13.1 NFS client hang"
- Reply: Kurt Jaeger : "Re: FreeBSD 12.3/13.1 NFS client hang"
- In reply to: Andreas Kempe : "FreeBSD 12.3/13.1 NFS client hang"
Date: Fri, 27 May 2022 20:59:57 UTC
Andreas Kempe <kempe@lysator.liu.se> wrote:
> Hello everyone!
>
> I'm having issues with the NFS clients on FreeBSD 12.3 and 13.1
> systems hanging when using a CentOS 7 server.
First, make sure you are using hard mounts. "soft" or "intr" mounts
won't work and will mess up the session sooner or later. (A messed up
session could result in no free slots on the session, and that will
wedge threads in nfsv4_sequencelookup() as you describe. This is
briefly described in the BUGS section of "man mount_nfs".)

Do a:
# nfsstat -m
on the clients and look for "hard". (A small example of this check
follows the first batch of stack traces below.)

Next, is there anything logged on the console for the 13.1 client(s)?
(13.1 has some diagnostics for things like a server replying with the
wrong session slot#.)

Also, maybe I'm old fashioned, but I find "ps axHl" useful, since it
shows where all the processes are sleeping. And "procstat -kk" covers
all of the locks.

> Below are procstat kstack $PID invocations showing where the processes
> have hung. In the nfsv4_sequencelookup case they seem hung waiting for
> nfsess_slots to have an available slot. In the second nfs_lock case,
> the processes seem to be stuck waiting on vnode locks.
>
> These issues appear seemingly at random, but also if operations that
> open a lot of files or create a lot of file locks are used. An
> example that can often provoke a hang is performing a recursive grep
> through a large file hierarchy like the FreeBSD codebase.
>
> The NFS code is large and complicated so any advice is appreciated!
Yea. I'm the author and I don't know exactly what it all does ;-)

> Cordially,
> Andreas Kempe
>
> Hang provoked when calling "grep -R SOME_STRING ." in the FreeBSD code base.
> ============================================================================
>
>   PID    TID COMM        TDNAME  KSTACK
> 35585 101045 python3.8   -       mi_switch sleepq_timedwait _sleep nfsv4_sequencelookup nfsv4_setsequence nfscl_reqstart nfsrpc_lock nfsrpc_advlock nfs_advlock VOP_ADVLOCK_APV kern_fcntl kern_fcntl_freebsd amd64_syscall fast_syscall_common
As you noted, this is waiting for a session slot to become available.
That is normal, so long as other RPCs are "in progress" and will
release their slots when replies are received. (If your mount was not
hard, sooner or later the client will give up waiting for the reply
and the session slot will not be released. Once all slots are "not
released", you are hung.)

> 35585 101045 python3.8   -       mi_switch sleepq_timedwait _sleep nfsv4_sequencelookup nfsv4_setsequence nfscl_reqstart nfsrpc_lock nfsrpc_advlock nfs_advlock VOP_ADVLOCK_APV kern_fcntl kern_fcntl_freebsd amd64_syscall fast_syscall_common
Same as above.

> 44046 101189 vim         -       mi_switch sleepq_timedwait _sleep nfsv4_sequencelookup nfsv4_setsequence nfscl_reqstart nfsrpc_accessrpc nfs34_access_otw nfs_access VOP_ACCESS_APV vn_dir_check_exec nfs_lookup VOP_LOOKUP_APV lookup namei kern_statat sys_fstatat amd64_syscall
Again.
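To make the hard-mount check concrete, here is a minimal sketch; the
server name, export path and mount point are placeholders, not taken
from this thread:

  # Options actually in effect on the client; look for "hard" and make
  # sure "soft"/"intr" are absent:
  nfsstat -m

  # A hard NFSv4 mount done by hand might look something like this:
  mount -t nfs -o nfsv4,hard nfs-server:/export/home /mnt/nfs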
> 44046 101764 vim         -       mi_switch sleepq_catch_signals sleepq_timedwait_sig _cv_timedwait_sig_sbt seltdwait kern_select sys_select amd64_syscall fast_syscall_common
> 44046 101853 vim         -       mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep __umtx_op_sem2_wait sys__umtx_op amd64_syscall fast_syscall_common
> 44046 102164 vim         -       mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep __umtx_op_sem2_wait sys__umtx_op amd64_syscall fast_syscall_common
> 44046 102165 vim         -       mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep __umtx_op_sem2_wait sys__umtx_op amd64_syscall fast_syscall_common
> 44046 102457 vim         -       mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep __umtx_op_sem2_wait sys__umtx_op amd64_syscall fast_syscall_common
> 44046 102472 vim         -       mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep __umtx_op_sem2_wait sys__umtx_op amd64_syscall fast_syscall_common
I know nothing about umtx, so can't help here.

> 44172 101824 tmux        -       mi_switch sleepq_timedwait _sleep nfsv4_sequencelookup nfsv4_setsequence nfscl_reqstart nfsrpc_accessrpc nfs34_access_otw nfs_access VOP_ACCESS_APV vn_dir_check_exec nfs_lookup VOP_LOOKUP_APV lookup namei kern_chdir amd64_syscall fast_syscall_common
Another one waiting for a session slot.

> Hang provoked randomly when trying to save an image in kolourpaint.
> ===================================================================
>
>   PID    TID COMM        TDNAME  KSTACK
> 58062 159450 kolourpaint -       mi_switch sleeplk lockmgr_slock_hard nfs_lock vop_sigdefer _vn_lock vfs_cache_root vfs_root_sigdefer lookup namei kern_statat sys_fstatat amd64_syscall fast_syscall_common
Yep, waiting for a vnode lock, I think.

> 58062 176390 kolourpaint -       mi_switch sleepq_catch_signals sleepq_wait_sig _cv_wait_sig seltdwait kern_poll sys_poll amd64_syscall fast_syscall_common
> 58062 176678 kolourpaint -       mi_switch sleepq_catch_signals sleepq_wait_sig _sleep kqueue_kevent kern_kevent_fp kern_kevent_generic sys_kevent amd64_syscall fast_syscall_common
>
>   PID    TID COMM        TDNAME  KSTACK
> 34291 101005 fish        -       mi_switch sleepq_wait sleeplk lockmgr_slock_hard VOP_LOCK1_APV nfs_lock VOP_LOCK1_APV _vn_lock vget vfs_hash_get ncl_nget nfs_root lookup namei vn_open_cred kern_openat amd64_syscall fast_syscall_common
Also waiting for a vnode.

> 34291 102492 fish        -       mi_switch sleepq_wait sleeplk lockmgr_slock_hard VOP_LOCK1_APV nfs_lock VOP_LOCK1_APV _vn_lock vget vfs_hash_get ncl_nget nfs_root lookup namei kern_accessat amd64_syscall fast_syscall_common
Again.

> 34291 102493 fish        -       mi_switch sleepq_wait sleeplk lockmgr_slock_hard VOP_LOCK1_APV nfs_lock VOP_LOCK1_APV _vn_lock vget vfs_hash_get ncl_nget nfs_root lookup namei kern_accessat amd64_syscall fast_syscall_common
And again.

>   PID    TID COMM          TDNAME  KSTACK
>   204 100923 autounmountd  -       mi_switch sleepq_wait sleeplk lockmgr_xlock_hard VOP_LOCK1_APV nfs_lock VOP_LOCK1_APV _vn_lock vget vfs_hash_get ncl_nget nfs_statfs __vfs_statfs kern_getfsstat sys_getfsstat amd64_syscall fast_syscall_common
And again. Not very useful unless you have all the processes and their
locks to try and figure out what is holding the vnode locks.

rick
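On that last point, a quick sketch of one way to capture all the
processes and their lock state while a hang is in progress (the
output file names here are arbitrary):

  # Every thread and where it is sleeping:
  ps axHl > /var/tmp/ps-axhl.txt

  # Kernel stacks for all processes; this covers the lock waits:
  procstat -kk -a > /var/tmp/procstat-kk.txt

Reading through those stacks for a thread that holds the vnode lock
while it is itself stuck on something else (an RPC, for example) is
usually how the culprit is found.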