Re: Unable to limit memory consumption with vfs.zfs.arc_max

From: Jim Long <freebsd-questions_at_umpquanet.com>
Date: Mon, 15 Jul 2024 20:41:15 UTC
As Bugs Bunny often said, "What a maroon!"

Here's the attached MRTG graph.


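While I'm at it, a sanity check on the raw byte values quoted below: both config files set the cap as plain byte counts, and it's easy to fat-finger a digit. A quick plain-sh conversion (the numbers are the ones from my loader.conf/sysctl.conf; nothing FreeBSD-specific here):

```shell
# Convert the raw byte values set for vfs.zfs.arc.max and
# vfs.zfs.arc.min into GiB (1 GiB = 1073741824 bytes).
for bytes in 4294967296 2147483648; do
    echo "${bytes} bytes = $(( bytes / 1073741824 )) GiB"
done
# 4294967296 bytes = 4 GiB
# 2147483648 bytes = 2 GiB
```

So the configured cap really is 4 GiB and the floor 2 GiB, exactly as intended.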

On Mon, Jul 15, 2024 at 01:24:37PM -0700, Jim Long wrote:
> Picking up this old thread since it's still vexing me....
> 
> On Sat, May 04, 2024 at 07:56:39AM -0400, Dan Langille wrote:
> > 
> > This is from FreeBSD 14 on a Dell R730 in the basement (primary purposes: poudriere, PostgreSQL, and four FreshPorts nodes):
> > 
> > From top:
> > 
> > ARC: 34G Total, 14G MFU, 9963M MRU, 22M Anon, 1043M Header, 9268M Other
> >      18G Compressed, 41G Uncompressed, 2.28:1 Ratio
> > 
> > % grep arc /boot/loader.conf
> > vfs.zfs.arc_max="36000M"
> > 
> > Looks like the value to set is:
> > 
> > % sysctl -a vfs.zfs.arc | grep max
> > vfs.zfs.arc.max: 37748736000
> > 
> > Perhaps not a good example, but this might be more appropriate:
> > 
> > % grep vfs.zfs.arc.max /boot/loader.conf
> > vfs.zfs.arc_max="1200M"
> > 
> > with top showing:
> > 
> > ARC: 1198M Total, 664M MFU, 117M MRU, 3141K Anon, 36M Header, 371M Other
> >      550M Compressed, 1855M Uncompressed, 3.37:1 Ratio
> 
> Thank you, Dan, I appreciate you chiming in.
> 
> Unfortunately, I think I have those bases covered, although I'm open to
> anything I may have missed:
> 
> # grep -i arc /boot/loader.conf /etc/sysctl.conf 
> /boot/loader.conf:vfs.zfs.arc.max=4294967296
> /boot/loader.conf:vfs.zfs.arc_max=4294967296
> /boot/loader.conf:vfs.zfs.arc.min=2147483648
> /etc/sysctl.conf:vfs.zfs.arc_max=4294967296
> /etc/sysctl.conf:vfs.zfs.arc.max=4294967296
> /etc/sysctl.conf:vfs.zfs.arc.min=2147483648
> 
> # top -b
> last pid: 16257;  load averages:  0.80,  1.15,  1.18  up 0+02:03:34    12:05:06
> 55 processes:  2 running, 53 sleeping
> CPU: 11.7% user,  0.0% nice, 18.4% system,  0.1% interrupt, 69.9% idle
> Mem: 32M Active, 141M Inact, 11G Wired, 3958M Free
> ARC: 10G Total, 5143M MFU, 4679M MRU, 2304K Anon, 44M Header, 219M Other
>      421M Compressed, 4744M Uncompressed, 11.28:1 Ratio
> 
>   PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
> 11057 root          1 127    0    59M    33M CPU0     0  60:16  82.28% ssh
> 11056 root          5  24    0    22M    12M pipewr   3   6:00   6.25% zfs
>  1619 snmpd         1  20    0    34M    14M select   0   0:06   0.00% snmpd
>  1344 root          1  20    0    14M  3884K select   3   0:03   0.00% devd
>  1544 root          1  20    0    13M  2776K select   3   0:01   0.00% syslogd
>  1661 root          1  68    0    22M  9996K select   0   0:01   0.00% sshd
>  1587 ntpd          1  20    0    23M  5876K select   1   0:00   0.00% ntpd
> 14391 root          1  20    0    22M    11M select   3   0:00   0.00% sshd
>  2098 root          1  20    0    24M    11M select   1   0:00   0.00% httpd
>  1904 root          1  20    0    24M    11M select   2   0:00   0.00% httpd
>  1870 root          1  20    0    19M  8688K select   2   0:00   0.00% sendmail
>  2067 root          1  20    0    19M  8688K select   1   0:00   0.00% sendmail
>  2066  65529        1  20    0    13M  4564K select   2   0:00   0.00% mathlm
>  1883  65529        1  20    0    11M  2772K select   3   0:00   0.00% mathlm
> 14397 root          1  20    0    14M  4568K wait     1   0:00   0.00% bash
>  1636 root          1  20    0    13M  2608K nanslp   0   0:00   0.00% cron
>  2082 root          1  20    0    13M  2560K nanslp   3   0:00   0.00% cron
>  1887 root          1  20    0    13M  2568K nanslp   2   0:00   0.00% cron
> 
> # sysctl -a | grep m.u_evictable
> kstat.zfs.misc.arcstats.mfu_evictable_metadata: 0
> kstat.zfs.misc.arcstats.mfu_evictable_data: 0
> kstat.zfs.misc.arcstats.mru_evictable_metadata: 0
> kstat.zfs.misc.arcstats.mru_evictable_data: 0
> 
> The attached MRTG graph shows ARC bytes used
> (kstat.zfs.misc.arcstats.size) in green versus the ARC byte limit
> (vfs.zfs.arc.max) in blue.  We can see that daily, ARC usage blows
> right past the 4G limit.  Most days it is brought back under control
> by two scheduled reboots in /etc/crontab ("shutdown -r now" at 02:55
> and 05:35), but some days the system is too far gone by the time the
> cron job rolls around, and it stays hung until I can get to the data
> center and power-cycle it.
> 
> I'm not very skilled at kernel debugging, but is a kernel PR in order?
> This has happened with a GENERIC kernel across at least two builds of
> 14-STABLE:
> 
> FreeBSD 14.0-STABLE #0 stable/14-n267062-77205dbc1397: Thu Mar 28 12:12:02 PDT 2024
> FreeBSD 14.1-STABLE #0 stable/14-n267886-4987c12cb878: Thu Jun  6 12:24:06 PDT 2024
> 
> Would it help to reproduce this with a -RELEASE version?
> 
> 
> Thank you again, everyone.
> 
> Jim
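
P.S. In case it helps anyone else chasing this: here is a rough plain-sh
sketch of the check I do by hand. The helper name and MiB reporting are
my own invention; on the box itself you would feed it the output of
`sysctl -n kstat.zfs.misc.arcstats.size` and `sysctl -n vfs.zfs.arc.max`.

```shell
# check_arc SIZE_BYTES MAX_BYTES
# Reports how far the live ARC size is over (or whether it is under)
# the configured cap.  Pure POSIX sh arithmetic, so the comparison
# logic can be exercised anywhere.
check_arc() {
    size=$1; max=$2
    if [ "$size" -gt "$max" ]; then
        echo "ARC over limit by $(( (size - max) / 1048576 )) MiB"
    else
        echo "ARC within limit"
    fi
}

# Values from the top(1) output above: ~10 GiB used against a 4 GiB cap.
check_arc 10737418240 4294967296
# ARC over limit by 6144 MiB
```

A cron job wrapping that around the two sysctls would at least timestamp
when the overshoot starts, which might narrow down what workload triggers it.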