stable/13: ARC no longer self-tuning?
Date: Wed, 30 Mar 2022 13:07:48 UTC
Hi,

while up to release 12 the ZFS ARC adjusted its size to the demand, in release 13 it appears to be locked to a fixed minimum of 100M compressed. Consequently I just got a machine stall/freeze under moderate load: no command-line reaction (except in the guests), no login possible, all processes in "D" state. The reset button was needed, all guests and jails destroyed:

38378  -  DJ    0:03.36 find -sx / /ext /var /usr/local /usr/ports /usr/obj
39414  -  DJ    0:00.00 sendmail: running queue: /var/spool/mqueue (sendmail
39415  -  DJ    0:00.00 sendmail: running queue: /var/spool/clientmqueue (se
39416  -  DJ    0:00.00 /usr/local/www/cgit/cgit.cgi
39417  -  D<    0:00.00 /usr/local/bin/ruby /ext/libexec/heatctl.rb (ruby27)
39418  -  DJ    0:00.00 sendmail: running queue: /var/spool/clientmqueue (se
39419  -  DJ    0:00.00 sendmail: running queue: /var/spool/mqueue (sendmail
39420  -  DJ    0:00.00 sendmail: running queue: /var/spool/clientmqueue (se
39421  -  DJ    0:00.00 sendmail: accepting connections (sendmail)
39426  -  D     0:00.00 sendmail: running queue: /var/spool/mqueue (sendmail
39427  -  D     0:00.00 sendmail: running queue: /var/spool/clientmqueue (se
39428  -  DJ    0:00.00 sendmail: Queue runner@00:03:00 for /var/spool/clien
39429  -  DJ    0:00.00 sendmail: accepting connections (sendmail)
39430  -  DJ    0:00.00 sendmail: running queue: /var/spool/clientmqueue (se
39465  -  Ds    0:00.01 newsyslog
39466  -  Ds    0:00.01 /bin/sh /usr/libexec/save-entropy
59365  -  DsJ   0:00.09 /usr/sbin/cron -s

"top", apparently the only process still running, shows this:

last pid: 39657;  load averages:  0.27,  1.24,  4.55   up 0+04:05:42  04:11:54
805 processes: 1 running, 804 sleeping
CPU:  0.1% user,  0.0% nice,  0.9% system,  0.0% interrupt, 99.0% idle
Mem: 16G Active, 5118M Inact, 1985M Laundry, 7144M Wired, 462M Buf, 905M Free
ARC: 1417M Total, 326M MFU, 347M MRU, 8216K Anon, 30M Header, 706M Other
     119M Compressed, 546M Uncompressed, 4.57:1 Ratio
Swap: 36G Total, 995M Used, 35G Free, 2% Inuse, 76K In

This is different from 12.3: there I would expect the ARC near 6G, wired near 11G, and swap near 5G.

The last message in the log was 20 minutes earlier:

Mar 30 03:45:17 <ntp.warn> edge ntpd[7768]: no peer for too long, server running free now

So, strangely, networking had also stalled. I thought networking uses other device drivers, separate from the disk drivers?

The effect appeared slowly: the machine became increasingly unresponsive and laggy (in all regards of I/O) during the "periodic daily". During the first night it runs find over a million files in all the jails, as these are not yet in the L2ARC. Apparently this is what killed it - it might be related to the periodic daily running find in every jail:

35944  -  DJ    0:04.71 find -sx / /var /ext /usr/local /usr/obj /usr/ports
36186  -  DJ    0:04.75 find -sx / /var /usr/local /usr/obj /usr/ports /dev/
37599  -  DJ    0:04.14 find -sx / /var /ext /usr/local /ext/rapp /usr/ports
38378  -  DJ    0:03.36 find -sx / /ext /var /usr/local /usr/ports /usr/obj
...

This would need a *lot* of inodes, and the ARC seems quite small for that. I have not seen such behaviour before - I had ZFS running in ~2007 with 384 MB RAM installed; now there are 32G here (which I would not have bought, I got them by accident), and that does not work well.
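In case somebody wants to watch the same thing on their box, this is roughly how I observe the ARC while the nightly find runs - a minimal sketch, assuming the kstat.zfs.misc.arcstats.* names below (in particular arc_meta_used) exist on this 13.x build; adjust to whatever "sysctl kstat.zfs.misc.arcstats" actually lists:

    #!/bin/sh
    # Print ARC size, target and metadata usage every 10 seconds.
    while :; do
        for k in size c c_min c_max arc_meta_used; do
            printf '%s=%s ' "$k" "$(sysctl -n kstat.zfs.misc.arcstats.$k)"
        done
        printf '\n'
        sleep 10
    done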
The ARC is configured in loader.conf:

# kenv
vfs.zfs.arc_max="10240M"
vfs.zfs.arc_min="1024M"

However, sysctl shows:

vfs.zfs.arc.max: 10737418240
vfs.zfs.arc.min: 0

So the maximum apparently got through, but the minimum did not (a sketch of what I will try next is at the end of this message).

Observing the behaviour, the ARC wants to stay at or even below 1G:

last pid: 38718;  load averages:  2.12,  2.93,  2.88   up 0+01:09:08  05:30:25
625 processes: 1 running, 624 sleeping
CPU:  0.0% user,  0.1% nice,  6.3% system,  0.0% interrupt, 93.6% idle
Mem: 12G Active, 1433M Inact, 9987M Wired, 50M Buf, 8237M Free
ARC: 749M Total, 116M MFU, 254M MRU, 2457K Anon, 42M Header, 334M Other
     84M Compressed, 396M Uncompressed, 4.70:1 Ratio
Swap: 36G Total, 36G Free

There are three bhyve guests with 16G + 7G + 2G; these naturally create a lot of dirty memory. The point is that this should go to swap - that is what SSDs are for. The ARC only grows when there is not much activity on the system. That may be nice for desktops, but it is no good for a solid workload. I need it to grow against workload (which it did before, but now does not) and against paging (which does not even appear).

Do we have some new knobs to tune? This one appears to be zero by default already:

vfs.zfs.arc.grow_retry: 0

And what is this one doing?

vfs.zfs.arc.p_dampener_disable=1

Do I need to read all the code? (See the sysctl -d listing at the end of this message.)

There are lots of other things that did work on 12.3 and now fail or crash, like net/dhcpcd (now crashes in libc), or mountd not understanding the ZFS exports (the syntax changed; it does not match the manpage - it did not in 12.3 either, but differently), and I only have two eyes (and they do not get better with age).

What would be needed for the ARC is an affinity balance: should it prefer to try and grow towards arc_max even under load (server use, with a well-configured arc_max), or should it shrink away as soon as there is some serious activity on the system (gamers and bloated-browser use)?
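As announced above, here is the loader.conf sketch I intend to try next. It rests on an assumption on my part: that on 13.x the new-style vfs.zfs.arc.min/.max OIDs are the authoritative ones and the old vfs.zfs.arc_min/arc_max names are only compatibility aliases, so I set both:

    # /boot/loader.conf (sketch, same values as above)
    vfs.zfs.arc_max="10240M"        # legacy 12.x-style name
    vfs.zfs.arc_min="1024M"         # legacy 12.x-style name
    vfs.zfs.arc.max="10737418240"   # new OpenZFS name, in bytes
    vfs.zfs.arc.min="1073741824"    # new OpenZFS name, in bytes

If vfs.zfs.arc.min is writable at runtime (I have not verified that yet), it should also be possible to force the minimum by hand and see whether the ARC then holds it:

    # sysctl vfs.zfs.arc.min=1073741824
    # sysctl vfs.zfs.arc.min vfs.zfs.arc.max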
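And the sysctl -d listing mentioned above: rather than reading all the code, the descriptions of the ARC knobs can be dumped directly (the -d flag only prints the description strings, so the only assumption is that the knobs exist at all on a given build):

    # sysctl -d vfs.zfs.arc.grow_retry vfs.zfs.arc.p_dampener_disable
    # sysctl vfs.zfs.arc | sort

Maybe somebody who knows the OpenZFS side can say which of these, if any, influence how aggressively the ARC grows back under load.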