Re: The pagedaemon evicts ARC before scanning the inactive page list

From: Alan Somers <asomers@freebsd.org>
Date: Wed, 19 May 2021 20:28:51 UTC
On Tue, May 18, 2021 at 10:17 PM Konstantin Belousov <kostikbel@gmail.com>
wrote:

> On Tue, May 18, 2021 at 09:55:25PM -0600, Alan Somers wrote:
> > On Tue, May 18, 2021 at 9:25 PM Konstantin Belousov <kostikbel@gmail.com>
> > wrote:
> > > Is your machine ZFS-only?  If yes, then the typical source of inactive
> > > memory can be of two kinds:
> > >
> >
> > No, there is also FUSE.  But there is typically < 1GB of Buf memory, so I
> > didn't mention it.
> As Mark mentioned, buffers use the page cache as a second-level cache.  More
> precisely, there is a relatively limited number of buffers in the system,
> which are just headers describing a set of pages.  When a buffer is
> recycled, its pages are put on the inactive queue.
>
> This is why I asked whether your machine is ZFS-only, because I/O on
> bufcache-using filesystems typically adds to the inactive queue.
>
> >
> >
> > > - anonymous memory that apps allocate with facilities like malloc(3).
> > >   If inactive is shrinkable then it is probably not this, because dirty
> > >   pages from anon objects must go through the laundry->swap route to
> > >   get evicted, and you did not mention swapping
> > >
> >
> > No, there's no appreciable amount of swapping going on.  Nor is the
> > laundry list typically more than a few hundred MB.
> >
> >
> > > - double-copy pages cached in v_objects of ZFS vnodes, clean or dirty.
> > >   If unmapped, these are mostly a waste.  Even if mapped, the source
> > >   of truth for the data is the ARC, AFAIU, so they can be dropped as
> > >   well, since the inactive state means that their content is not hot.
> > >
> >
> > So if a process mmap()'s a file on ZFS and reads from it but never writes
> > to it, will those pages show up as inactive?
> It depends on the workload, and it does not matter much whether the pages
> are clean or dirty.  Right after mapping, or under an intense access
> pattern, they sit on the active list.  If not touched for long enough, or
> after being cycled through the buffer cache for I/O (but ZFS pages do not
> go through the buffer cache), they are moved to inactive.
>
> >
> >
> > >
> > > You can try to inspect the objects contributing the most to the
> > > inactive queue with 'vmstat -o' to see where most of the inactive
> > > pages come from.
> > >
> >
> > Wow, that did it!  About 99% of the inactive pages come from just a few
> > vnodes which are used by the FUSE servers.  But I also see a few large
> > entries like:
> > 1105308 333933 771375   1   0 WB  df
> > What does that signify?
> These are anonymous memory.
>
> >
> >
> > >
> > > If indeed they are double-copy, then perhaps ZFS could react even to the
> > > current primitive vm_lowmem signal somewhat differently.  First, it could
> > > do a pass over its vnodes and
> > > - free clean unmapped pages
> > > - if some targets are not met after that, launder dirty pages,
> > >   then return to freeing clean unmapped pages
> > > all that before ever touching its cache (ARC).
> > >
>

Follow-up:
All of the big inactive-memory consumers were files on FUSE file systems
that were being exported as CTL LUNs.  ZFS files exported by CTL do not use
any resident or inactive memory.  I didn't test UFS.  Curiously, removing the
LUN does not free the memory, but shutting down the FUSE daemon does.  A
valid workaround is to set the vfs.fusefs.data_cache_mode sysctl to 0.
That prevents the kernel from caching any data from the FUSE file system.
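Concretely, that looks something like this (the sysctl.conf line is just the
usual way to make the setting persist across reboots):

    # apply immediately
    sysctl vfs.fusefs.data_cache_mode=0
    # and/or make it persistent
    echo 'vfs.fusefs.data_cache_mode=0' >> /etc/sysctl.conf
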
I've tested this on both FreeBSD 12.2 and 13.0.  Should the kernel do a
better job of reclaiming inactive memory before ARC?  Yes, but in my case
it's better not to create so much inactive memory in the first place.
Thanks for everybody's help, especially kib's tip about "vmstat -o".
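For anyone else chasing something similar, a quick way to spot the biggest
contributors is along these lines (assuming the third numeric column of the
output is the inactive-page count, as it appears to be in the entry quoted
above):

    # show the 20 VM objects with the most inactive pages
    vmstat -o | sort -rn -k3 | head -n 20
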
-Alan