From nobody Sun Aug 27 22:45:16 2023
Date: Sun, 27 Aug 2023 22:45:16 GMT
Message-Id: <202308272245.37RMjG6Y003102@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org,
    dev-commits-src-branches@FreeBSD.org
From: Mateusz Guzik
Subject: git: a521cee3322f - stable/13 - vfs: try harder to find free vnodes when recycling
List-Id: Commit messages for all branches of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all
Sender: owner-dev-commits-src-all@freebsd.org
X-BeenThere:
 dev-commits-src-all@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Git-Committer: mjg
X-Git-Repository: src
X-Git-Refname: refs/heads/stable/13
X-Git-Reftype: branch
X-Git-Commit: a521cee3322f979b1caade7ec000bf4f7509246b
Auto-Submitted: auto-generated

The branch stable/13 has been updated by mjg:

URL: https://cgit.FreeBSD.org/src/commit/?id=a521cee3322f979b1caade7ec000bf4f7509246b

commit a521cee3322f979b1caade7ec000bf4f7509246b
Author:     Mateusz Guzik
AuthorDate: 2023-08-24 05:34:08 +0000
Commit:     Mateusz Guzik
CommitDate: 2023-08-27 22:44:12 +0000

    vfs: try harder to find free vnodes when recycling

    The free vnode marker can slide past eligible entries.

    Artificially reducing the vnode limit to 300k and spawning 104 workers,
    each creating a million files, results in all of them trying to recycle,
    which often fails when it should not have to.

    Because of the excessive traffic in this scenario, the trylock to
    requeue is virtually guaranteed to fail, meaning nothing gets pushed
    forward.

    Since no vnodes were found, the most unfortunate sleep for 1 second is
    induced (see vn_alloc_hard, the "vlruwk" msleep). Without the fix the
    machine is mostly idle with almost everyone stuck off CPU waiting for
    the sleep to finish. With the fix it is busy creating files.

    Unrelated to the above problem, the marker could have landed in a
    similarly problematic spot because of any failure in vtryrecycle.

    Originally reported as poudriere builders stalling in a vnode-count
    restricted setup.

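For readers who want to see the marker problem in isolation, here is a minimal userland sketch of the "restart the scan once from the head" idea. It is not the kernel code: `struct node`, `scan_with_restart`, `free_hint` and `demo_marker_past_candidate` are hypothetical names made up for illustration, the `eligible` flag stands in for "free and recyclable", and locking is omitted. It assumes a `<sys/queue.h>` that provides the TAILQ macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a vnode; not the kernel structure. */
struct node {
	TAILQ_ENTRY(node) link;
	bool is_marker;		/* scan cursor, never a candidate */
	bool eligible;		/* stand-in for "free and recyclable" */
};

TAILQ_HEAD(node_list, node);

/*
 * Claim up to `count` eligible nodes, scanning forward from `marker`.
 * If the end of the list is reached while `free_hint` says candidates
 * still exist, move the marker to the head and rescan exactly once.
 * Returns the number of nodes claimed.
 */
static int
scan_with_restart(struct node_list *list, struct node *marker, int count,
    int free_hint)
{
	struct node *n = marker;
	bool retried = false;
	int found = 0;

	assert(marker->is_marker);
	while (found < count) {
		n = TAILQ_NEXT(n, link);
		if (n == NULL) {
			if (!retried && free_hint > 0) {
				/* The marker may have slid past candidates. */
				TAILQ_REMOVE(list, marker, link);
				TAILQ_INSERT_HEAD(list, marker, link);
				n = marker;
				retried = true;
				continue;
			}
			/* Give up and park the marker at the tail. */
			TAILQ_REMOVE(list, marker, link);
			TAILQ_INSERT_TAIL(list, marker, link);
			break;
		}
		if (n->is_marker || !n->eligible)
			continue;
		/* Claim the node and advance the marker just past it. */
		n->eligible = false;
		found++;
		TAILQ_REMOVE(list, marker, link);
		TAILQ_INSERT_AFTER(list, n, marker, link);
		n = marker;
	}
	return (found);
}

/* List is [a(eligible), b, marker, c]: the marker is already past `a`. */
static int
demo_marker_past_candidate(void)
{
	struct node_list list = TAILQ_HEAD_INITIALIZER(list);
	struct node a = { .eligible = true };
	struct node b = { .eligible = false };
	struct node c = { .eligible = false };
	struct node m = { .is_marker = true };

	TAILQ_INSERT_TAIL(&list, &a, link);
	TAILQ_INSERT_TAIL(&list, &b, link);
	TAILQ_INSERT_TAIL(&list, &m, link);
	TAILQ_INSERT_TAIL(&list, &c, link);
	return (scan_with_restart(&list, &m, 1, 1));
}
```

In this sketch a single forward pass from the marker finds nothing, and only the restart reaches `a`. Passing `free_hint` as 0 disables the restart, which models the behaviour before the fix.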
    Fixes:          138a5dafba31 ("vfs: trylock vnode requeue")
    Reported by:    Mark Millard
    (cherry picked from commit c1d85ac3df82df721e3d33b292579c4de491488e)
---
 sys/kern/vfs_subr.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
index c1c474b6724d..484ad75b243e 100644
--- a/sys/kern/vfs_subr.c
+++ b/sys/kern/vfs_subr.c
@@ -195,6 +195,10 @@ static counter_u64_t recycles_free_count;
 SYSCTL_COUNTER_U64(_vfs, OID_AUTO, recycles_free, CTLFLAG_RD, &recycles_free_count,
     "Number of free vnodes recycled to meet vnode cache targets");
 
+static counter_u64_t vnode_skipped_requeues;
+SYSCTL_COUNTER_U64(_vfs, OID_AUTO, vnode_skipped_requeues, CTLFLAG_RD, &vnode_skipped_requeues,
+    "Number of times LRU requeue was skipped due to lock contention");
+
 static u_long deferred_inact;
 SYSCTL_ULONG(_vfs, OID_AUTO, deferred_inact, CTLFLAG_RD,
     &deferred_inact, 0, "Number of times inactive processing was deferred");
@@ -724,6 +728,7 @@ vntblinit(void *dummy __unused)
 	vnodes_created = counter_u64_alloc(M_WAITOK);
 	recycles_count = counter_u64_alloc(M_WAITOK);
 	recycles_free_count = counter_u64_alloc(M_WAITOK);
+	vnode_skipped_requeues = counter_u64_alloc(M_WAITOK);
 
 	/*
 	 * Initialize the filesystem syncer.
@@ -1268,11 +1273,13 @@ vnlru_free_impl(int count, struct vfsops *mnt_op, struct vnode *mvp)
 	struct vnode *vp;
 	struct mount *mp;
 	int ocount;
+	bool retried;
 
 	mtx_assert(&vnode_list_mtx, MA_OWNED);
 	if (count > max_vnlru_free)
 		count = max_vnlru_free;
 	ocount = count;
+	retried = false;
 	vp = mvp;
 	for (;;) {
 		if (count == 0) {
@@ -1280,6 +1287,24 @@ vnlru_free_impl(int count, struct vfsops *mnt_op, struct vnode *mvp)
 		}
 		vp = TAILQ_NEXT(vp, v_vnodelist);
 		if (__predict_false(vp == NULL)) {
+			/*
+			 * The free vnode marker can be past eligible vnodes:
+			 * 1. if vdbatch_process trylock failed
+			 * 2. if vtryrecycle failed
+			 *
+			 * If so, start the scan from scratch.
+			 */
+			if (!retried && vnlru_read_freevnodes() > 0) {
+				TAILQ_REMOVE(&vnode_list, mvp, v_vnodelist);
+				TAILQ_INSERT_HEAD(&vnode_list, mvp, v_vnodelist);
+				vp = mvp;
+				retried = true;
+				continue;
+			}
+
+			/*
+			 * Give up
+			 */
 			TAILQ_REMOVE(&vnode_list, mvp, v_vnodelist);
 			TAILQ_INSERT_TAIL(&vnode_list, mvp, v_vnodelist);
 			break;
@@ -3533,6 +3558,17 @@ vdbatch_process(struct vdbatch *vd)
 	MPASS(curthread->td_pinned > 0);
 	MPASS(vd->index == VDBATCH_SIZE);
 
+	/*
+	 * Attempt to requeue the passed batch, but give up easily.
+	 *
+	 * Despite batching the mechanism is prone to transient *significant*
+	 * lock contention, where vnode_list_mtx becomes the primary bottleneck
+	 * if multiple CPUs get here (one real-world example is highly parallel
+	 * do-nothing make, which will stat *tons* of vnodes). Since it is
+	 * quasi-LRU (read: not that great even if fully honoured) just dodge
+	 * the problem. Parties which don't like it are welcome to implement
+	 * something better.
+	 */
 	critical_enter();
 	if (mtx_trylock(&vnode_list_mtx)) {
 		for (i = 0; i < VDBATCH_SIZE; i++) {
@@ -3545,6 +3581,8 @@ vdbatch_process(struct vdbatch *vd)
 		}
 		mtx_unlock(&vnode_list_mtx);
 	} else {
+		counter_u64_add(vnode_skipped_requeues, 1);
+
 		for (i = 0; i < VDBATCH_SIZE; i++) {
 			vp = vd->tab[i];
 			vd->tab[i] = NULL;