Re: Chasing OOM Issues - good sysctl metrics to use?

From: Mark Millard <marklmi_at_yahoo.com>
Date: Wed, 11 May 2022 03:31:35 UTC
On 2022-May-10, at 17:49, Mark Millard <marklmi@yahoo.com> wrote:

> On 2022-May-10, at 11:49, Mark Millard <marklmi@yahoo.com> wrote:
> 
>> On 2022-May-10, at 08:47, Jan Mikkelsen <janm@transactionware.com> wrote:
>> 
>>> On 10 May 2022, at 10:01, Mark Millard <marklmi@yahoo.com> wrote:
>>>> 
>>>> On 2022-Apr-29, at 13:57, Mark Millard <marklmi@yahoo.com> wrote:
>>>> 
>>>>> On 2022-Apr-29, at 13:41, Pete Wright <pete@nomadlogic.org> wrote:
>>>>>> 
>>>>>>> . . .
>>>>>> 
>>>>>> d'oh - went out for lunch and workstation locked up.  i *knew* i shouldn't have said anything lol.
>>>>> 
>>>>> Any interesting console messages ( or dmesg -a or /var/log/messages )?
>>>>> 
>>>> 
>>>> I've been doing some testing of a patch by tijl at FreeBSD.org
>>>> and have reproduced both hang-ups (ZFS/ARC context) and kills
>>>> (UFS/noARC and ZFS/ARC) for "was killed: failed to reclaim
>>>> memory", both with and without the patch. This is with only a
>>>> tiny fraction of the enabled swap partition(s) being put to
>>>> use. So far, the testing was deliberately with
>>>> vm.pageout_oom_seq=12 (the default value). My testing has been
>>>> with main [so: 14].
>>>> 
>>>> But I also learned how to avoid the hang-ups that I got --but
>>>> it costs making kills more likely/quicker, other things being
>>>> equal.
>>>> 
>>>> I discovered that the hang-ups that I got came from all the
>>>> processes through which I interact with the system ending up
>>>> with their kernel threads swapped out and not being swapped
>>>> back in (including sshd, so no new ssh connections). In
>>>> some contexts I only had escaping into the kernel debugger
>>>> available, not even ^T would work. Other times ^T did work.
>>>> 
>>>> So, when I'm willing to risk kills in order to maintain
>>>> the ability to interact normally, I now use in
>>>> /etc/sysctl.conf :
>>>> 
>>>> vm.swap_enabled=0
>>> 
>>> I have been looking at an OOM related issue. Ignoring the actual leak, the problem leads to a process being killed because the system was out of memory. This is fine. After that, however, the system console was black with a single block cursor and the console keyboard was unresponsive. Caps lock and num lock didn’t toggle their lights when pressed.
>>> 
>>> Using an ssh session, the system looked fine. USB events for the keyboard being disconnected and reconnected appeared but the keyboard stayed unresponsive.
>>> 
>>> Setting vm.swap_enabled=0, as you did above, resolved this problem. After the process was killed a perfectly normal console returned.
>>> 
>>> The interesting thing is that this test system is configured with no swap space.
>>> 
>>> This is on 13.1-RC5.
>>> 
>>>> This disables swapping out of process kernel stacks. It
>>>> is just that, with that option for gaining free RAM removed,
>>>> there are fewer options tried before a kill is initiated. It
>>>> is not a loader-time tunable but is writable, thus the
>>>> /etc/sysctl.conf placement.
>>> 
>>> Is that really what it does? From a quick look at the code in vm/vm_swapout.c, it seems a little more complex.
>> 
>> I was going by its description:
>> 
>> # sysctl -d vm.swap_enabled
>> vm.swap_enabled: Enable entire process swapout
>> 
>> Based on the below, it appears that the description
>> presumes vm.swap_idle_enabled==0 (the default). In
>> my context vm.swap_idle_enabled==0 . Looks like I
>> should also list:
>> 
>> vm.swap_idle_enabled=0
>> 
>> in my /etc/sysctl.conf with a reminder comment that the
>> pair of =0's are required for avoiding the observed
>> hang-ups.
>> 
>> 
>> The analysis goes like . . .
>> 
>> I see in the code that vm.swap_enabled !=0 causes
>> VM_SWAP_NORMAL :
>> 
>> void
>> vm_swapout_run(void)
>> {
>> 
>>       if (vm_swap_enabled)
>>               vm_req_vmdaemon(VM_SWAP_NORMAL);
>> }
>> 
>> and that in turn leads vm_daemon to do:
>> 
>>               if (swapout_flags != 0) {
>>                       /*
>>                        * Drain the per-CPU page queue batches as a deadlock
>>                        * avoidance measure.
>>                        */
>>                       if ((swapout_flags & VM_SWAP_NORMAL) != 0)
>>                               vm_page_pqbatch_drain();
>>                       swapout_procs(swapout_flags);
>>               }
>> 
>> Note: vm.swap_idle_enabled==0 && vm.swap_enabled==0 ends
>> up with swapout_flags==0. vm.swap_idle. . . defaults seem
>> to be (in my context):
>> 
>> # sysctl -a | grep swap_idle
>> vm.swap_idle_threshold2: 10
>> vm.swap_idle_threshold1: 2
>> vm.swap_idle_enabled: 0
>> 
>> For reference:
>> 
>> /*
>> * Idle process swapout -- run once per second when pagedaemons are
>> * reclaiming pages.
>> */
>> void
>> vm_swapout_run_idle(void)
>> {
>>       static long lsec;
>> 
>>       if (!vm_swap_idle_enabled || time_second == lsec)
>>               return;
>>       vm_req_vmdaemon(VM_SWAP_IDLE);
>>       lsec = time_second;
>> }
>> 
>> [So vm.swap_idle_enabled==0 avoids VM_SWAP_IDLE status.]
>> 
>> static void
>> vm_req_vmdaemon(int req)
>> {
>>       static int lastrun = 0;
>> 
>>       mtx_lock(&vm_daemon_mtx);
>>       vm_pageout_req_swapout |= req;
>>       if ((ticks > (lastrun + hz)) || (ticks < lastrun)) {
>>               wakeup(&vm_daemon_needed);
>>               lastrun = ticks;
>>       }
>>       mtx_unlock(&vm_daemon_mtx);
>> }
>> 
>> [So VM_SWAP_IDLE and VM_SWAP_NORMAL are independent bits
>> in vm_pageout_req_swapout.]
>> 
>> vm_daemon does:
>> 
>>               mtx_lock(&vm_daemon_mtx);
>>               msleep(&vm_daemon_needed, &vm_daemon_mtx, PPAUSE, "psleep",
>>                   vm_daemon_timeout);
>>               swapout_flags = vm_pageout_req_swapout;
>>               vm_pageout_req_swapout = 0;
>>               mtx_unlock(&vm_daemon_mtx);
>> 
>> So vm_pageout_req_swapout is regenerated after that
>> each time.
>> 
>> I'll not show the code for vm.swap_idle_enabled!=0 .
>> 
> 
> Well, with continued experiments I got an example of
> a hangup for which looking via the db> prompt did not
> show any swapping out of process kernel stacks
> ( vm.swap_enabled=0 was the context, so expected ).
> The environment was ZFS (so with ARC).
> 
> But this was testing with vm.pageout_oom_seq=120 instead
> of the default vm.pageout_oom_seq=12 . It may be that,
> if left to sit long enough, things would have unhung
> (external perspective).
> 
> It is part of what I'm experimenting with so we will see.
> 

Looks like I might have overreacted, in that for my
current tests there can be brief periods of delayed
response, but things respond in a little bit.
Definitely not like the hang-ups I was getting with
vm.swap_enabled=1 .
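
For reference, the flag handling walked through in the quoted analysis
above can be modeled in user space. This is only a minimal sketch under
stated assumptions, not the kernel code: the model_* names are
hypothetical stand-ins, and the numeric values used for VM_SWAP_NORMAL
and VM_SWAP_IDLE are assumed (all that matters here is that they are
distinct bits).

```c
#include <assert.h>

/* Assumed bit values for illustration; only their distinctness matters. */
#define VM_SWAP_NORMAL 1
#define VM_SWAP_IDLE   2

/* Hypothetical stand-ins for the two sysctls and the pending-request word. */
static int model_swap_enabled;       /* models vm.swap_enabled */
static int model_swap_idle_enabled;  /* models vm.swap_idle_enabled */
static int model_req_swapout;        /* models vm_pageout_req_swapout */

/* Request side: each enabled knob ORs its own bit into the pending word. */
static void
model_request(void)
{
	if (model_swap_enabled)
		model_req_swapout |= VM_SWAP_NORMAL;
	if (model_swap_idle_enabled)
		model_req_swapout |= VM_SWAP_IDLE;
}

/* Daemon side: take the pending bits and clear them for the next round. */
static int
model_consume(void)
{
	int flags = model_req_swapout;

	model_req_swapout = 0;
	return (flags);
}

/*
 * One request/consume cycle for the given settings. The return value
 * models swapout_flags; 0 means the swapout pass would be skipped.
 */
int
model_cycle(int swap_enabled, int swap_idle_enabled)
{
	int flags;

	model_swap_enabled = swap_enabled;
	model_swap_idle_enabled = swap_idle_enabled;
	model_request();
	flags = model_consume();
	/* Only the two known request bits can ever be set. */
	assert((flags & ~(VM_SWAP_NORMAL | VM_SWAP_IDLE)) == 0);
	return (flags);
}
```

In this model, model_cycle(0, 0) is the only combination that returns 0,
which corresponds to the swapout_flags==0 case where swapout_procs() is
not called. Setting either knob to nonzero yields a nonzero result.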

===
Mark Millard
marklmi at yahoo.com