Re: Did something change with ZFS and vnode caching?

From: Garrett Wollman <wollman_at_bimajority.org>
Date: Thu, 24 Aug 2023 15:21:59 UTC
Following up on what I asked about earlier this week:

> As I've mentioned before, we have been upgrading our servers from 12.4
> to 13.2.  Over the past week I've noticed on a number of our NFS
> servers that our backups are running very slowly, taking much longer
> than normal, with the `vnlru` process taking a whole CPU and the load
> average ballooning to 40 or more.  At the same time, NFS service becomes
> extremely slow.

Looking more closely at the configuration of our backup system, I found
that each of our 17 servers was running as many as 8 backups
simultaneously, and each backup was using up to 150 threads, for as many
as 1,200 backup threads per server.  This tuning was done by our former
backup vendor (unfortunately no longer in business), who believed it was
necessary to complete scans of our filesystems within our scheduled
overnight backup window.  (Some of these
filesystems contain billions of files and directories with millions of
files each.)  My current thinking is that 12.4 may have had a top-side
bottleneck that prevented all those threads from doing very much work,
and in 13.2 the bottleneck has moved deeper into the kernel.

> A look at the vnode cache shows that it's at the limit, and
> increasing `kern.maxvnodes` helps only for a few seconds, until the
> vnode population reaches the new limit.

I spent some time reading the code and added vnode population and
recycling metrics to our monitoring.  What immediately stood out was
that the system is *not* running out of vnodes; instead, it has far too
many free vnodes.  Looking at one server right now:

kern.maxvnodes: 2214323
vfs.numvnodes: 2214322
vfs.freevnodes: 2027790
vfs.wantfreevnodes: 553580

...so the free list is almost four times its target size
(vfs.wantfreevnodes), which may explain why vnlru_kick() is being
called, but not why vnlru never manages to destroy the excess vnodes
once they are no longer needed.  When backups are running, we allocate
as many as 40,000 vnodes per second, almost all from the free list.
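As a minimal sketch (Python, with hypothetical helper names), the arithmetic above can be reproduced from the sysctl snapshot: the free list is about 3.66 times vfs.wantfreevnodes, and at 40,000 allocations per second the whole cache of about 2.2 million vnodes turns over roughly once a minute. This only restates the reported numbers; it assumes sysctl(8)'s default "name: value" output and a constant allocation rate, and says nothing about how vnlru chooses what to reclaim.

```python
# Back-of-the-envelope checks on the sysctl snapshot quoted above.
# Assumes sysctl(8)'s default "name: value" output format; the helper
# names here are hypothetical, for illustration only.

SNAPSHOT = """\
kern.maxvnodes: 2214323
vfs.numvnodes: 2214322
vfs.freevnodes: 2027790
vfs.wantfreevnodes: 553580
"""

ALLOC_RATE = 40_000  # vnodes/second observed during backups (from the post)


def parse_sysctl(text: str) -> dict[str, int]:
    """Parse "name: value" lines into a dict of integers, skipping blanks."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, value = line.partition(":")
        values[name.strip()] = int(value.strip())
    return values


def free_list_ratio(v: dict[str, int]) -> float:
    """How many times larger the free list is than its target size."""
    return v["vfs.freevnodes"] / v["vfs.wantfreevnodes"]


def free_list_excess(v: dict[str, int]) -> int:
    """Number of free vnodes above the target."""
    return v["vfs.freevnodes"] - v["vfs.wantfreevnodes"]


def turnover_seconds(v: dict[str, int], rate: float) -> float:
    """Seconds to allocate kern.maxvnodes vnodes at a constant rate."""
    return v["kern.maxvnodes"] / rate


if __name__ == "__main__":
    v = parse_sysctl(SNAPSHOT)
    print(f"free/wantfree ratio: {free_list_ratio(v):.2f}")    # 3.66
    print(f"free vnodes above target: {free_list_excess(v)}")  # 1474210
    print(f"cache turnover at {ALLOC_RATE}/s: "
          f"{turnover_seconds(v, ALLOC_RATE):.0f} s")          # 55 s
```

Pasting another server's snapshot into SNAPSHOT gives the same figures for comparison.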

Any suggestions on what we should monitor or try to adjust?

-GAWollman