From: Garrett Wollman
To: freebsd-stable@freebsd.org
Date: Thu, 24 Aug 2023 11:21:59 -0400
Subject: Re: Did something change with ZFS and vnode caching?
Message-ID: <25831.30103.446606.733311@hergotha.csail.mit.edu>
In-Reply-To: <25827.33600.611577.665054@hergotha.csail.mit.edu>
List-Archive: https://lists.freebsd.org/archives/freebsd-stable

Following up on what I asked about earlier this week:

> As I've mentioned before, we have been upgrading our servers from
> 12.4 to 13.2.
> Over the past week I've noticed on a number of our NFS servers that
> our backups are running very slowly, taking much longer than normal,
> with the `vnlru` process taking a whole CPU while the load average
> balloons to 40 or more.  At the same time, NFS service becomes
> extremely slow.

Looking more closely at the configuration of our backup system, each
of our 17 servers was running as many as 8 backups simultaneously, and
each backup was using up to 150 threads.  This tuning was done by our
former backup vendor, who are unfortunately no longer in business;
they believed it to be necessary to complete scans of our filesystems
within our scheduled overnight backup window.  (Some of these
filesystems contain billions of files, and directories with millions
of files each.)

My current thinking is that 12.4 may have had a top-side bottleneck
that prevented all those threads from doing very much work, and that
in 13.2 the bottleneck has moved deeper into the kernel.

> A look at the vnode cache shows that it's at the limit, and
> increasing `kern.maxvnodes` helps only for a few seconds, until the
> vnode population reaches the new limit.

I spent some time reading the code and added vnode population and
recycling metrics to our monitoring, and what immediately stood out
is that we are *not* running out of vnodes: instead, we have far too
many free vnodes.  Looking at one server right now:

    kern.maxvnodes:     2214323
    vfs.numvnodes:      2214322
    vfs.freevnodes:     2027790
    vfs.wantfreevnodes:  553580

...so the free list is almost four times its target size, which may
explain why vnlru_kick() is getting called, but not why it isn't
actually managing to destroy the excess free vnodes when they are no
longer needed.  When backups are running, we can allocate 40,000
vnodes per second, almost all of them from the free list.

Any suggestions on what we should monitor or try to adjust?

-GAWollman
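For anyone wanting to watch the same thing on their own machines, here
is a rough sketch of the sampling we added to our monitoring.  It only
uses the sysctl OIDs quoted above (kern.maxvnodes, vfs.numvnodes,
vfs.freevnodes, vfs.wantfreevnodes, all standard on FreeBSD) and
prints the per-second deltas, which is how we arrived at the 40,000
vnodes/sec allocation figure; the script itself is my own illustration,
not anything shipped with the base system:

```shell
#!/bin/sh
# Sample the vnode sysctls once per second and print the per-second
# change in vfs.numvnodes and vfs.freevnodes alongside the limits.
# A large negative free-list delta during backups means vnodes are
# being pulled off the free list faster than vnlru replenishes it.
prev_num=$(sysctl -n vfs.numvnodes)
prev_free=$(sysctl -n vfs.freevnodes)
while sleep 1; do
    num=$(sysctl -n vfs.numvnodes)
    free=$(sysctl -n vfs.freevnodes)
    printf '%s num=%s (%+d/s) free=%s (%+d/s) max=%s want=%s\n' \
        "$(date +%T)" \
        "$num"  "$((num  - prev_num))" \
        "$free" "$((free - prev_free))" \
        "$(sysctl -n kern.maxvnodes)" \
        "$(sysctl -n vfs.wantfreevnodes)"
    prev_num=$num
    prev_free=$free
done
```

Feeding those deltas into a time-series collector instead of printf is
a one-line change, which is essentially what our monitoring does.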