Re: The pagedaemon evicts ARC before scanning the inactive page list
- Reply: Rozhuk Ivan : "Re: The pagedaemon evicts ARC before scanning the inactive page list"
- Reply: Konstantin Belousov : "Re: The pagedaemon evicts ARC before scanning the inactive page list"
- In reply to: Mark Johnston : "Re: The pagedaemon evicts ARC before scanning the inactive page list"
Date: Tue, 18 May 2021 23:55:36 UTC
On Tue, May 18, 2021 at 4:10 PM Mark Johnston <markj@freebsd.org> wrote:

> On Tue, May 18, 2021 at 04:00:14PM -0600, Alan Somers wrote:
> > On Tue, May 18, 2021 at 3:45 PM Mark Johnston <markj@freebsd.org> wrote:
> >
> > > On Tue, May 18, 2021 at 03:07:44PM -0600, Alan Somers wrote:
> > > > I'm using ZFS on servers with tons of RAM and running FreeBSD
> > > > 12.2-RELEASE.  Sometimes they get into a pathological situation
> > > > where most of that RAM sits unused.  For example, right now one
> > > > of them has:
> > > >
> > > > 2 GB   Active
> > > > 529 GB Inactive
> > > > 16 GB  Free
> > > > 99 GB  ARC total
> > > > 469 GB ARC max
> > > > 86 GB  ARC target
> > > >
> > > > When a server gets into this situation, it stays there for days,
> > > > with the ARC target barely budging.  All that inactive memory
> > > > never gets reclaimed and put to a good use.  Frequently the
> > > > server never recovers until a reboot.
> > > >
> > > > I have a theory for what's going on.  Ever since r334508^ the
> > > > pagedaemon sends the vm_lowmem event _before_ it scans the
> > > > inactive page list.  If the ARC frees enough memory, then
> > > > vm_pageout_scan_inactive won't need to free any.  Is that order
> > > > really correct?  For reference, here's the relevant code, from
> > > > vm_pageout_worker:
> > >
> > > That was the case even before r334508.  Note that prior to that
> > > revision vm_pageout_scan_inactive() would trigger vm_lowmem if
> > > pass > 0, before scanning the inactive queue.  During a memory
> > > shortage we have pass > 0.  pass == 0 only when the page daemon is
> > > scanning the active queue.
> > >
> > > > shortage = pidctrl_daemon(&vmd->vmd_pid, vmd->vmd_free_count);
> > > > if (shortage > 0) {
> > > >         ofree = vmd->vmd_free_count;
> > > >         if (vm_pageout_lowmem() && vmd->vmd_free_count > ofree)
> > > >                 shortage -= min(vmd->vmd_free_count - ofree,
> > > >                     (u_int)shortage);
> > > >         target_met = vm_pageout_scan_inactive(vmd, shortage,
> > > >             &addl_shortage);
> > > > } else
> > > >         addl_shortage = 0;
> > > >
> > > > Raising vfs.zfs.arc_min seems to work around the problem.  But
> > > > ideally that wouldn't be necessary.
> > >
> > > vm_lowmem is too primitive: it doesn't tell subscribing subsystems
> > > anything about the magnitude of the shortage.  At the same time, the
> > > VM doesn't know much about how much memory they are consuming.  A
> > > better strategy, at least for the ARC, would be to reclaim memory
> > > based on the relative memory consumption of each subsystem.  In your
> > > case, when the page daemon goes to reclaim memory, it should use the
> > > inactive queue to make up ~85% of the shortfall and reclaim the rest
> > > from the ARC.  Even better would be if the ARC could use the page
> > > cache as a second-level cache, like the buffer cache does.
> > >
> > > Today I believe the ARC treats vm_lowmem as a signal to shed some
> > > arbitrary fraction of evictable data.  If the ARC is able to quickly
> > > answer the question, "how much memory can I release if asked?", then
> > > the page daemon could use that to determine how much of its
> > > reclamation target should come from the ARC vs. the page cache.
> > >
> >
> > I guess I don't understand why you would ever free from the ARC rather
> > than from the inactive list.  When is inactive memory ever useful?
>
> Pages in the inactive queue are either unmapped or haven't had their
> mappings referenced recently.  But they may still be frequently accessed
> by file I/O operations like sendfile(2).
> That's not to say that reclaiming from other subsystems first is always
> the right strategy, but note also that the page daemon may scan the
> inactive queue many times in between vm_lowmem calls.
>

So by default ZFS tries to free (arc_target / 128) bytes of memory in
arc_lowmem.  That's huge!  On this server, pidctrl_daemon typically
requests 0-10 MB, and arc_lowmem tries to free 600 MB.

It looks like it would be easy to modify vm_lowmem to include the total
amount of memory that it wants freed.  I could make such a patch.

My next question is: what's the fastest way to generate a lot of inactive
memory?  My first attempt was "find . | xargs md5", but that isn't
terribly effective.  The production machines are doing a lot of
"zfs recv" and running some busy Go programs, among other things, but I
can't easily replicate that workload on a development system.

-Alan
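
For reference, the arithmetic behind the numbers in this thread can be
sketched in a few lines of userland C.  This is only an illustration, not
kernel code: the function names arc_lowmem_bytes, arc_share and
arc_lowmem_clamped are hypothetical, the divisor of 128 is the default
mentioned above, and the proportional split is one reading of the "~85%
from the inactive queue" suggestion.

```c
#include <stdint.h>

/*
 * Hypothetical helper: bytes that arc_lowmem would try to shed, assuming
 * the default of (arc_target / 128) described in this thread.
 */
uint64_t
arc_lowmem_bytes(uint64_t arc_target)
{
	return (arc_target / 128);
}

/*
 * Hypothetical helper: portion of a shortage to take from the ARC when
 * the shortage is split between the ARC and the inactive queue in
 * proportion to their sizes.  Floating point is used for the ratio so
 * the multiplication cannot overflow 64 bits.
 */
uint64_t
arc_share(uint64_t shortage, uint64_t arc_bytes, uint64_t inactive_bytes)
{
	uint64_t total = arc_bytes + inactive_bytes;

	if (total == 0)
		return (0);
	return ((uint64_t)((double)shortage *
	    ((double)arc_bytes / (double)total)));
}

/*
 * Hypothetical helper: what arc_lowmem might free if vm_lowmem passed
 * along the amount the page daemon actually wants (an assumption; the
 * event carries no such argument today).
 */
uint64_t
arc_lowmem_clamped(uint64_t arc_target, uint64_t wanted)
{
	uint64_t dflt = arc_lowmem_bytes(arc_target);

	return (wanted < dflt ? wanted : dflt);
}
```

With an 86 GiB ARC target, arc_lowmem_bytes() comes to 688 MiB, the same
order as the 600 MB figure above.  With 99 GiB of ARC and 529 GiB
inactive, arc_share() assigns roughly 16% of a shortage to the ARC, which
lines up with the "~85% from the inactive queue" estimate.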