Re: Chasing OOM Issues - good sysctl metrics to use?

From: Jan Mikkelsen <janm_at_transactionware.com>
Date: Tue, 10 May 2022 15:47:06 UTC
On 10 May 2022, at 10:01, Mark Millard <marklmi@yahoo.com> wrote:
> 
> On 2022-Apr-29, at 13:57, Mark Millard <marklmi@yahoo.com> wrote:
> 
>> On 2022-Apr-29, at 13:41, Pete Wright <pete@nomadlogic.org> wrote:
>>> 
>>>> . . .
>>> 
>>> d'oh - went out for lunch and workstation locked up.  i *knew* i shouldn't have said anything lol.
>> 
>> Any interesting console messages ( or dmesg -a or /var/log/messages )?
>> 
> 
> I've been doing some testing of a patch by tijl at FreeBSD.org
> and have reproduced both hang-ups (ZFS/ARC context) and kills
> (UFS/noARC and ZFS/ARC) for "was killed: failed to reclaim
> memory", both with and without the patch. This is with only a
> tiny fraction of the swap partition(s) enabled being put to
> use. So far, the testing was deliberately with
> vm.pageout_oom_seq=12 (the default value). My testing has been
> with main [so: 14].
> 
> But I also learned how to avoid the hang-ups that I got --but
> it costs making kills more likely/quicker, other things being
> equal.
> 
> I discovered that the hang-ups that I got were from all the
> processes through which I interact with the system ending up
> with their kernel threads swapped out and not being swapped
> back in (including sshd, so no new ssh connections). In
> some contexts I only had escaping into the kernel debugger
> available, not even ^T would work. Other times ^T did work.
> 
> So, when I'm willing to risk kills in order to maintain
> the ability to interact normally, I now use in
> /etc/sysctl.conf :
> 
> vm.swap_enabled=0
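A minimal /etc/sysctl.conf sketch that pairs the setting above with vm.pageout_oom_seq, the other sysctl mentioned in this thread. Raising vm.pageout_oom_seq to offset the quicker kills is an assumption here, not something tested in the thread. The value 120 is illustrative only.

```conf
# /etc/sysctl.conf is read at boot. Both sysctls below are writable
# at run time, so they can also be applied immediately with:
#   sysctl vm.swap_enabled=0
#   sysctl vm.pageout_oom_seq=120

# Per this thread: avoids hang-ups where interactive processes
# (e.g. sshd) are swapped out and never swapped back in, at the
# cost of OOM kills happening sooner.
vm.swap_enabled=0

# Default is 12. A larger value makes the page daemon go through more
# back-to-back low-memory passes before it starts killing processes.
# ASSUMPTION: 120 is an illustrative value, not a tested recommendation.
vm.pageout_oom_seq=120
```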

I have been looking at an OOM related issue. Ignoring the actual leak, the problem leads to a process being killed because the system was out of memory. This is fine. After that, however, the system console was black with a single block cursor and the console keyboard was unresponsive. Caps lock and num lock didn’t toggle their lights when pressed.

Using an ssh session, the system looked fine. USB events for the keyboard being disconnected and reconnected appeared but the keyboard stayed unresponsive.

Setting vm.swap_enabled=0, as you did above, resolved this problem. After the process was killed a perfectly normal console returned.

The interesting thing is that this test system is configured with no swap space.

This is on 13.1-RC5.

> This disables swapping out of process kernel stacks. It is
> just that, with that option removed for gaining free RAM, there
> are fewer options tried before a kill is initiated. It is not a
> loader-time tunable but is writable, thus the
> /etc/sysctl.conf placement.

Is that really what it does? From a quick look at the code in vm/vm_swapout.c, it seems a little more complex.

Regards,

Jan M.