Date: Tue, 18 May 2021 18:10:42 -0400
From: Mark Johnston
To: Alan Somers
Cc: FreeBSD Hackers
Subject: Re: The pagedaemon evicts ARC before scanning the inactive page list
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers

On Tue, May 18, 2021 at 04:00:14PM -0600, Alan Somers wrote:
> On Tue, May 18, 2021 at 3:45 PM Mark Johnston wrote:
>
> > On Tue, May 18, 2021 at 03:07:44PM -0600, Alan Somers wrote:
> > > I'm using ZFS on servers with tons of RAM and running FreeBSD
> > > 12.2-RELEASE. Sometimes they get into a pathological situation where
> > > most of that RAM sits unused.
> > > For example, right now one of them has:
> > >
> > > 2 GB Active
> > > 529 GB Inactive
> > > 16 GB Free
> > > 99 GB ARC total
> > > 469 GB ARC max
> > > 86 GB ARC target
> > >
> > > When a server gets into this situation, it stays there for days, with the
> > > ARC target barely budging. All that inactive memory never gets reclaimed
> > > and put to good use. Frequently the server never recovers until a
> > > reboot.
> > >
> > > I have a theory for what's going on. Ever since r334508^ the pagedaemon
> > > sends the vm_lowmem event _before_ it scans the inactive page list. If
> > > the ARC frees enough memory, then vm_pageout_scan_inactive won't need to
> > > free any. Is that order really correct? For reference, here's the
> > > relevant code, from vm_pageout_worker:
> >
> > That was the case even before r334508. Note that prior to that revision
> > vm_pageout_scan_inactive() would trigger vm_lowmem if pass > 0, before
> > scanning the inactive queue. During a memory shortage we have pass > 0.
> > pass == 0 only when the page daemon is scanning the active queue.
> >
> > > shortage = pidctrl_daemon(&vmd->vmd_pid, vmd->vmd_free_count);
> > > if (shortage > 0) {
> > >         ofree = vmd->vmd_free_count;
> > >         if (vm_pageout_lowmem() && vmd->vmd_free_count > ofree)
> > >                 shortage -= min(vmd->vmd_free_count - ofree,
> > >                     (u_int)shortage);
> > >         target_met = vm_pageout_scan_inactive(vmd, shortage,
> > >             &addl_shortage);
> > > } else
> > >         addl_shortage = 0;
> > >
> > > Raising vfs.zfs.arc_min seems to work around the problem. But ideally
> > > that wouldn't be necessary.
> >
> > vm_lowmem is too primitive: it doesn't tell subscribing subsystems
> > anything about the magnitude of the shortage. At the same time, the VM
> > doesn't know much about how much memory they are consuming. A better
> > strategy, at least for the ARC, would be to reclaim memory based on the
> > relative memory consumption of each subsystem.
> > In your case, when the page daemon goes to reclaim memory, it should
> > use the inactive queue to make up ~85% of the shortfall and reclaim the
> > rest from the ARC. Even better would be if the ARC could use the page
> > cache as a second-level cache, like the buffer cache does.
> >
> > Today I believe the ARC treats vm_lowmem as a signal to shed some
> > arbitrary fraction of evictable data. If the ARC is able to quickly
> > answer the question, "how much memory can I release if asked?", then
> > the page daemon could use that to determine how much of its reclamation
> > target should come from the ARC vs. the page cache.
> >
>
> I guess I don't understand why you would ever free from the ARC rather than
> from the inactive list. When is inactive memory ever useful?

Pages in the inactive queue are either unmapped or haven't had their
mappings referenced recently. But they may still be frequently accessed
by file I/O operations like sendfile(2). That's not to say that
reclaiming from other subsystems first is always the right strategy, but
note also that the page daemon may scan the inactive queue many times in
between vm_lowmem calls.
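
To make the proportional idea from my earlier mail concrete, here is a
rough userland sketch of how a shortage might be divided. Nothing like
this exists in the page daemon or the ARC today: split_shortage() and
struct reclaim_split are names I made up for illustration, and the
sketch assumes the ARC can cheaply report how much it could evict.
All counts are assumed to be in pages, so the multiplication stays
within 64 bits for realistic memory sizes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: how much each consumer is asked to give up. */
struct reclaim_split {
	uint64_t from_inactive;	/* pages to reclaim from the inactive queue */
	uint64_t from_arc;	/* pages to ask the ARC to evict */
};

/*
 * Hypothetical helper: divide a shortage between the inactive queue and
 * the ARC in proportion to their sizes.  Not an existing kernel function.
 */
static struct reclaim_split
split_shortage(uint64_t shortage, uint64_t inactive, uint64_t arc_evictable)
{
	struct reclaim_split s = { 0, 0 };
	uint64_t total = inactive + arc_evictable;

	if (total == 0)
		return (s);		/* nothing to reclaim from either */
	if (shortage > total)
		shortage = total;	/* cannot reclaim more than exists */

	/* The inactive share rounds down; the ARC covers the remainder. */
	s.from_inactive = shortage * inactive / total;
	s.from_arc = shortage - s.from_inactive;

	assert(s.from_inactive <= inactive);
	assert(s.from_arc <= arc_evictable);
	return (s);
}
```

Plugging in the numbers from the report (529 inactive, 99 ARC, treating
the whole ARC as evictable, which is optimistic), a shortage of 100
units comes out as 84 from the inactive queue and 16 from the ARC. That
is where the ~85% figure comes from.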