Re: ... was killed: a thread waited too long to allocate a page [actually: was killed: failed to reclaim memory problem]

From: Mark Millard <marklmi_at_yahoo.com>
Date: Thu, 01 Feb 2024 16:30:19 UTC
Karl Pielorz <kpielorz_lst_at_tdx.co.uk> wrote on
Date: Thu, 01 Feb 2024 14:47:44 UTC :

> --On 28 December 2023 11:38 +0200 Daniel Braniss <danny@cs.huji.ac.il> 
> wrote:
> 
> > hi,
> > I'm running 13.2-STABLE on this particular host, which has about 200 TB
> > of ZFS storage; the host also has some 132 GB of memory.
> > Lately, mountd is getting killed:
> > kernel: pid 3212 (mountd), jid 0, uid 0, was killed: a thread waited
> > too long to allocate a page
> >
> > rpcinfo shows it's still there, but
> > service mountd restart
> > fails.
> >
> > only solution is to reboot.
> > BTW, the only 'heavy' stuff that I can see are several rsync
> > processes.
> 
> Hi,
> 
> I seem to have run into something similar. I recently upgraded a 12.4 box 
> to 13.2p9. The box has 32G of RAM, and runs ZFS. We do a lot of rsync work 
> to it monthly. The first month we did this on 13.2p9, we got a lot of 
> processes killed, all with similar (but not identical) messages, e.g.
> 
> pid 11103 (ssh), jid 0, uid 0, was killed: failed to reclaim memory
> pid 10972 (local-unbound), jid 0, uid 59, was killed: failed to reclaim memory
> pid 3223 (snmpd), jid 0, uid 0, was killed: failed to reclaim memory
> pid 3243 (mountd), jid 0, uid 0, was killed: failed to reclaim memory
> pid 3251 (nfsd), jid 0, uid 0, was killed: failed to reclaim memory
> pid 10996 (sshd), jid 0, uid 0, was killed: failed to reclaim memory
> pid 3257 (sendmail), jid 0, uid 0, was killed: failed to reclaim memory
> pid 8562 (csh), jid 0, uid 0, was killed: failed to reclaim memory
> pid 3363 (smartd), jid 0, uid 0, was killed: failed to reclaim memory
> pid 8558 (csh), jid 0, uid 0, was killed: failed to reclaim memory
> pid 3179 (ntpd), jid 0, uid 0, was killed: failed to reclaim memory
> pid 8555 (tcsh), jid 0, uid 1001, was killed: failed to reclaim memory
> pid 3260 (sendmail), jid 0, uid 25, was killed: failed to reclaim memory
> pid 2806 (devd), jid 0, uid 0, was killed: failed to reclaim memory
> pid 3156 (rpcbind), jid 0, uid 0, was killed: failed to reclaim memory
> pid 3252 (nfsd), jid 0, uid 0, was killed: failed to reclaim memory
> pid 3377 (getty), jid 0, uid 0, was killed: failed to reclaim memory
> 
> This 'looks' like an 'out of RAM' situation - but at the time, top showed:
> 
> last pid: 12622; load averages: 0.10, 0.24, 0.13 
> 
> 7 processes: 1 running, 6 sleeping
> CPU: 0.1% user, 0.0% nice, 0.2% system, 0.0% interrupt, 99.7% idle
> Mem: 4324K Active, 8856K Inact, 244K Laundry, 24G Wired, 648M Buf, 7430M Free
> ARC: 20G Total, 8771M MFU, 10G MRU, 2432K Anon, 161M Header, 920M Other
> 15G Compressed, 23G Uncompressed, 1.59:1 Ratio
> Swap: 8192M Total, 5296K Used, 8187M Free
> 
> Rebooting recovers it, and the rsync completed after the reboot - which 
> left us with:
> 
> last pid: 12570; load averages: 0.07, 0.14, 0.17 
> up 0+00:15:06 14:43:56
> 26 processes: 1 running, 25 sleeping
> CPU: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
> Mem: 39M Active, 5640K Inact, 17G Wired, 42M Buf, 14G Free
> ARC: 15G Total, 33M MFU, 15G MRU, 130K Anon, 32M Header, 138M Other
> 14G Compressed, 15G Uncompressed, 1.03:1 Ratio
> Swap: 8192M Total, 8192M Free
> 
> 
> I've not seen any bug reports along these lines; in fact, there's very 
> little coverage at all of this specific error.
> 
> My only thought is to set a sysctl to limit ZFS ARC usage, i.e. to leave 
> more free RAM floating around the system. During the rsync it was 
> 'swapping' occasionally (a few K in, a few K out) - but it never ran out 
> of swap that I saw - and it certainly didn't look like a complete 
> out-of-memory scenario (which is what it felt like, with everything 
> getting killed).


One direction of control is the following.

What do you have for the following (copied from my /boot/loader.conf)?

#
# Delay when persistent low free RAM leads to
# Out Of Memory killing of processes:
vm.pageout_oom_seq=120

The default is 12 (last I knew, anyway).
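
If you want to test a value without a reboot first,
vm.pageout_oom_seq is also a writable sysctl on the FreeBSD
versions I've used (a sketch; verify on your system):

# Query the current value:
sysctl vm.pageout_oom_seq
# Try a larger value on the running system; /boot/loader.conf
# only takes effect at boot:
sysctl vm.pageout_oom_seq=120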

The 120 figure has allowed me and others to do buildworld,
buildkernel, and poudriere bulk runs on small ARM boards
using all cores where the runs otherwise got "failed to
reclaim memory" (to use the modern, improved [not
misleading] message text). The same has held for others
whose workloads hit the message in other contexts.

(The units for the 120 are not time units: it is more like
a number of (re)tries to gain at least a target amount of
free RAM before failure handling starts. The comment
wording above describes a consequence of the setting.)
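
If you want the kernel's own one-line description of the
sysctl, sysctl can print it:

sysctl -d vm.pageout_oom_seq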

The 120 is not a maximum, just a figure that has proved
useful in various contexts.

But see the notes below as well.

Notes:

"failed to reclaim memory" can happen even with swap
space enabled but no swap in use: sufficiently active
pages are just not paged out to swap space so if most
non-wired pages are classified as active, the kills
can start.
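
If you want to watch for that state while a workload runs,
the per-queue page counts are visible via sysctl (counts
are in pages; vm.stats.vm.v_page_size gives the page size):

sysctl vm.stats.vm.v_active_count vm.stats.vm.v_inactive_count \
    vm.stats.vm.v_laundry_count vm.stats.vm.v_wire_count \
    vm.stats.vm.v_free_count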

(There are some other parameters of possible use for some
other modern "was killed" reason texts.)

Wired pages are pages that cannot be swapped out, even
if classified as inactive.

Your report indicates 24G Wired, with 20G of that being
from ARC use. This was likely after some processes had
already been killed, so likely more was wired and less
was free at the start of the kills.

That 24G+ of wired memory meant that less than 8 GiB was
available for everything else. Avoiding that by limiting
the ARC (tuning ZFS), by adjusting how the workload is
spread over time, or by some combination of the two also
looks appropriate.
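
For the ARC-limiting direction, a sketch of one way to do
it on 13.x (the 16G figure is only an illustration, not a
recommendation; pick a cap suited to your workload):

# /boot/loader.conf: cap the ARC at boot:
vfs.zfs.arc_max="16G"

# Or adjust the running system (OpenZFS in 13.x takes a
# byte count here; 17179869184 == 16 GiB):
sysctl vfs.zfs.arc_max=17179869184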

I've no clue why ARC use would be significantly
different for 12.4 vs. 13.2p9.

===
Mark Millard
marklmi at yahoo.com