From: bugzilla-noreply@freebsd.org
To: fs@FreeBSD.org
Subject: [Bug 275594] High CPU usage by arc_prune; analysis and fix
Date: Fri, 08 Dec 2023 04:50:55 +0000
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275594

--- Comment #5 from Seigo Tanimura ---

(In reply to Mark Johnston from comment #3)

The build has completed.

Build time: 07:40:56 (278 pkgs / hr)

arc_prune stopped shortly after poudriere finished.

The pileup of arc_prune has indeed been fixed by FreeBSD-EN-23:18.openzfs, but
the essential problem most likely lies somewhere else.

Right now, I am testing with the following setup after the reboot:

- vfs.vnode.vnlru.max_free_per_call: 10000 (out-of-box)
- vfs.zfs.arc.prune_interval: 1000 (my fix enabled)

About 2 hours after the start, the CPU usage of arc_prune was at 20 - 25% with
occasional drops.  Poudriere was working on lang/rust and lang/gcc12 at that
time.

A correction to the description:

> * Test Environment: VM & OS

- RAM: 20 GB (not 16 GB)

A note on the ZFS configuration:

> vfs.zfs.arc_max=4294967296 (4GiB)

This limit has been added because this host is a build server, not a file
server.  AFAIK, ZFS tends to take up to about 1/4 of the available RAM for the
ARC.  While that may be fair for a file server, an application server
generally wants more RAM for its own work.

Under the limit above, demand for ARC pruning is expected, and the OS must be
ready to deal with it.

> arc_prune_async() is rather dumb on FreeBSD, as you point out: it tries to
> reclaim vnodes from the global free list, but doing so might not alleviate
> pressure.  Really we want some way to shrink a per-mountpoint or
> per-filesystem cache.

I thought you would say that; I almost thought of the same thing more than 20
years ago while implementing the initial version of vnlru along with Matt
Dillon :)

The per-mountpoint / per-filesystem vnode design has at least two challenges:

A) Balancing the vnodes across the mountpoints / filesystems, and
B) Splitting the name cache.

I suspect B) is the more difficult one.  As of now, the global name cache
allows vnode lookups to happen in a single place with just one pass.  The
behaviour and performance under a per-mountpoint / per-filesystem name cache
would depend on the interactions across multiple filesystems, and hence be
very complicated to analyse and tune.

An interval between ARC pruning runs is much simpler and still effective,
given my key findings from the first test in the description (a small sketch
of the idea follows at the end of this comment):

- ARC pruning indeed works, as long as it is a one-shot run.
- Modern hardware is fast enough to walk through all vnodes, again as long as
  that is a one-shot run.
- ARC pruning and vnlru are the vnode maintainers, not the users.  They must
  guarantee fairness of vnode use to the true vnode users, namely the user
  processes and threads (and maybe the NFS server threads on a network file
  server).

After the current build, I will try vfs.vnode.vnlru.max_free_per_call=4000000.
This value is the same as vfs.vnode.param.limit, so there will be no limit
upon the ARC pruning workload except for the giveup condition.
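For reference, below is a minimal userland sketch of the rate-limiting idea
behind vfs.zfs.arc.prune_interval.  It is an illustration only, not the actual
OpenZFS change; the names (prune_allowed(), prune_interval_ms, last_prune_ns)
are made up for this example, and the real code runs in the kernel with
different primitives.

/*
 * Minimal userland sketch of an interval gate for ARC pruning requests.
 * Illustration only; the real change lives in the OpenZFS kernel code.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Illustrative stand-in for the vfs.zfs.arc.prune_interval sysctl (ms). */
static uint64_t prune_interval_ms = 1000;
static uint64_t last_prune_ns;

static uint64_t
now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec);
}

/*
 * Return true when at least one interval has passed since the previous
 * prune, so that back-to-back prune requests collapse into one run.
 */
static bool
prune_allowed(void)
{
	uint64_t now = now_ns();

	if (now - last_prune_ns < prune_interval_ms * 1000000ULL)
		return (false);
	last_prune_ns = now;
	return (true);
}

int
main(void)
{
	/* Simulate a burst of prune requests: only the first one runs. */
	for (int i = 0; i < 5; i++)
		printf("request %d: %s\n", i, prune_allowed() ? "prune" : "skip");
	return (0);
}

The point is simply that prune requests arriving within one interval collapse
into a single run, which bounds how often the full vnode walk can start while
still letting it run whenever the ARC actually asks for it.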