Date: Fri, 27 May 2022 22:12:53 +0200
From: Andreas Kempe
To: freebsd-fs@freebsd.org
Subject: FreeBSD 12.3/13.1 NFS client hang
List-Archive: https://lists.freebsd.org/archives/freebsd-fs

Hello everyone!

I'm having issues with the NFS clients on FreeBSD 12.3 and 13.1 systems
hanging when using a CentOS 7 server. Below are procstat kstack $PID
invocations showing where the processes have hung.

In the nfsv4_sequencelookup case, the process seems hung waiting for
nfsess_slots to have an available slot. In the second case, nfs_lock,
the processes seem stuck waiting on vnode locks.

These hangs appear seemingly at random, but they can also be provoked by
operations that open a lot of files or create a lot of file locks. An
example that often provokes a hang is a recursive grep through a large
file hierarchy such as the FreeBSD code base.

The NFS code is large and complicated, so any advice is appreciated!

Cordially,
Andreas Kempe

Hang provoked when calling "grep -R SOME_STRING ." in the FreeBSD code base.
============================================================================

PID    TID     COMM       TDNAME  KSTACK
35585  101045  python3.8  -       mi_switch sleepq_timedwait _sleep nfsv4_sequencelookup nfsv4_setsequence nfscl_reqstart nfsrpc_lock nfsrpc_advlock nfs_advlock VOP_ADVLOCK_APV kern_fcntl kern_fcntl_freebsd amd64_syscall fast_syscall_common
35585  101045  python3.8  -       mi_switch sleepq_timedwait _sleep nfsv4_sequencelookup nfsv4_setsequence nfscl_reqstart nfsrpc_lock nfsrpc_advlock nfs_advlock VOP_ADVLOCK_APV kern_fcntl kern_fcntl_freebsd amd64_syscall fast_syscall_common
44046  101189  vim        -       mi_switch sleepq_timedwait _sleep nfsv4_sequencelookup nfsv4_setsequence nfscl_reqstart nfsrpc_accessrpc nfs34_access_otw nfs_access VOP_ACCESS_APV vn_dir_check_exec nfs_lookup VOP_LOOKUP_APV lookup namei kern_statat sys_fstatat amd64_syscall
44046  101764  vim        -       mi_switch sleepq_catch_signals sleepq_timedwait_sig _cv_timedwait_sig_sbt seltdwait kern_select sys_select amd64_syscall fast_syscall_common
44046  101853  vim        -       mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep __umtx_op_sem2_wait sys__umtx_op amd64_syscall fast_syscall_common
44046  102164  vim        -       mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep __umtx_op_sem2_wait sys__umtx_op amd64_syscall fast_syscall_common
44046  102165  vim        -       mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep __umtx_op_sem2_wait sys__umtx_op amd64_syscall fast_syscall_common
44046  102457  vim        -       mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep __umtx_op_sem2_wait sys__umtx_op amd64_syscall fast_syscall_common
44046  102472  vim        -       mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep __umtx_op_sem2_wait sys__umtx_op amd64_syscall fast_syscall_common
44172  101824  tmux       -       mi_switch sleepq_timedwait _sleep nfsv4_sequencelookup nfsv4_setsequence nfscl_reqstart nfsrpc_accessrpc nfs34_access_otw nfs_access VOP_ACCESS_APV vn_dir_check_exec nfs_lookup VOP_LOOKUP_APV lookup namei kern_chdir amd64_syscall fast_syscall_common

Hang provoked randomly when trying to save an image in kolourpaint.
===================================================================

PID    TID     COMM          TDNAME  KSTACK
58062  159450  kolourpaint   -       mi_switch sleeplk lockmgr_slock_hard nfs_lock vop_sigdefer _vn_lock vfs_cache_root vfs_root_sigdefer lookup namei kern_statat sys_fstatat amd64_syscall fast_syscall_common
58062  176390  kolourpaint   -       mi_switch sleepq_catch_signals sleepq_wait_sig _cv_wait_sig seltdwait kern_poll sys_poll amd64_syscall fast_syscall_common
58062  176678  kolourpaint   -       mi_switch sleepq_catch_signals sleepq_wait_sig _sleep kqueue_kevent kern_kevent_fp kern_kevent_generic sys_kevent amd64_syscall fast_syscall_common

PID    TID     COMM          TDNAME  KSTACK
34291  101005  fish          -       mi_switch sleepq_wait sleeplk lockmgr_slock_hard VOP_LOCK1_APV nfs_lock VOP_LOCK1_APV _vn_lock vget vfs_hash_get ncl_nget nfs_root lookup namei vn_open_cred kern_openat amd64_syscall fast_syscall_common
34291  102492  fish          -       mi_switch sleepq_wait sleeplk lockmgr_slock_hard VOP_LOCK1_APV nfs_lock VOP_LOCK1_APV _vn_lock vget vfs_hash_get ncl_nget nfs_root lookup namei kern_accessat amd64_syscall fast_syscall_common
34291  102493  fish          -       mi_switch sleepq_wait sleeplk lockmgr_slock_hard VOP_LOCK1_APV nfs_lock VOP_LOCK1_APV _vn_lock vget vfs_hash_get ncl_nget nfs_root lookup namei kern_accessat amd64_syscall fast_syscall_common

PID    TID     COMM          TDNAME  KSTACK
204    100923  autounmountd  -       mi_switch sleepq_wait sleeplk lockmgr_xlock_hard VOP_LOCK1_APV nfs_lock VOP_LOCK1_APV _vn_lock vget vfs_hash_get ncl_nget nfs_statfs __vfs_statfs kern_getfsstat sys_getfsstat amd64_syscall fast_syscall_common
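
When many processes hang at once, it can help to count the blocked
threads per wait point rather than reading each stack by eye. Below is
a minimal Python sketch of that idea. The names classify, summarize and
WAIT_POINTS and the shortened sample text are hypothetical, made up for
illustration, and the sketch assumes each row has the PID TID COMM
TDNAME KSTACK layout with a single-token TDNAME, as in the stacks in
this mail.

```python
from collections import Counter

# Kernel functions named in this report as the places where threads block.
WAIT_POINTS = ("nfsv4_sequencelookup", "nfs_lock")


def classify(line):
    """Return the wait point found in one `procstat kstack` row, or None."""
    fields = line.split()
    # Assumed row layout: PID TID COMM TDNAME frame frame ...
    # Header rows and short rows are skipped.
    if len(fields) < 5 or not fields[0].isdigit():
        return None
    frames = fields[4:]
    for name in WAIT_POINTS:
        if name in frames:
            return name
    return None


def summarize(text):
    """Count threads per wait point in pasted `procstat kstack` output."""
    return Counter(p for p in map(classify, text.splitlines()) if p)


# Shortened, illustrative rows modelled on the stacks in this mail.
sample = """\
PID TID COMM TDNAME KSTACK
44046 101189 vim - mi_switch sleepq_timedwait _sleep nfsv4_sequencelookup nfsv4_setsequence nfscl_reqstart
44046 101764 vim - mi_switch sleepq_catch_signals sleepq_timedwait_sig seltdwait kern_select
34291 101005 fish - mi_switch sleepq_wait sleeplk lockmgr_slock_hard VOP_LOCK1_APV nfs_lock VOP_LOCK1_APV _vn_lock
"""

if __name__ == "__main__":
    print(summarize(sample))
```

Threads that sit in select, poll or umtx waits are left uncounted, so
only the two NFS wait points described above show up in the summary.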