Re: Chasing OOM Issues - good sysctl metrics to use?

From: Mark Millard <marklmi_at_yahoo.com>
Date: Sat, 14 May 2022 08:09:30 UTC
Pete Wright <pete_at_nomadlogic.org> wrote on
Fri, 13 May 2022 13:43:11 -0700:

> On 5/11/22 12:52, Mark Millard wrote:
> >
> >
> > Relative to avoiding hang-ups, so far it seems that
> > use of vm.swap_enabled=0 with vm.swap_idle_enabled=0
> > makes hang-ups less likely/less frequent/harder to
> > produce examples of. But that is no guarantee against
> > a hang-up. It does change the cause of the hang-up
> > (in that it avoids involving processes whose kernel
> > stacks have been swapped out).
> 
> Thanks for the above analysis, Mark. I am going to test these
> settings out now, as I'm still seeing the lockup.
> 
> This most recent hang-up occurred while using a patch tijl_at_
> asked me to test (attached to this email), with the default
> setting of vm.pageout_oom_seq=12.

I had also been running various tests for tijl_at_ with the
same sort of 'removal of the " + 1"' patch. I had found a
basic way to tell whether a fundamental problem was
completely avoided, without having to wait through long
periods of activity to do so. But that does not mean the
test is a good simulation of your context's sequence that
leads to issues. Nor does it indicate how wide a range of
activity is fairly likely to reach the failing conditions.

You could see how vm.pageout_oom_seq=120 does for you with
the patch. I was never patient enough to wait long enough
for this to OOM kill or hang up in my test context.
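
(For anyone wanting to try the same: vm.pageout_oom_seq is a
normal read/write sysctl, so something like the following
should do it, with the sysctl.conf line being the usual way
to make the setting persist:)

# sysctl vm.pageout_oom_seq=120
vm.pageout_oom_seq: 12 -> 120
# echo 'vm.pageout_oom_seq=120' >> /etc/sysctl.conf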

I've been reporting the likes of:

# sysctl vm.domain.0.stats # done after the fact
vm.domain.0.stats.inactive_pps: 1037
vm.domain.0.stats.free_severe: 15566
vm.domain.0.stats.free_min: 25759
vm.domain.0.stats.free_reserved: 5374
vm.domain.0.stats.free_target: 86914
vm.domain.0.stats.inactive_target: 130371
vm.domain.0.stats.unswppdpgs: 0
vm.domain.0.stats.unswappable: 0
vm.domain.0.stats.laundpdpgs: 858845
vm.domain.0.stats.laundry: 9
vm.domain.0.stats.inactpdpgs: 1040939
vm.domain.0.stats.inactive: 1063
vm.domain.0.stats.actpdpgs: 407937767
vm.domain.0.stats.active: 1032
vm.domain.0.stats.free_count: 3252526
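
(If a time series is wanted instead of after-the-fact
snapshots, a minimal sh loop along these lines could do the
logging. It is only a sketch: the 10 second interval and the
log path are arbitrary choices, not something I actually ran:)

#!/bin/sh
# Log the per-domain VM stats every 10 seconds, with a
# UTC timestamp before each sample.
while true; do
    date -u '+%Y-%m-%dT%H:%M:%SZ'
    sysctl vm.domain.0.stats
    sleep 10
done >> /var/log/vm-domain0-stats.log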

But I also have a kernel that reports the figures just
before the call that is to cause an OOM kill, ending up
with output like:

vm_pageout_mightbe_oom: kill context: v_free_count: 15306, v_inactive_count: 1, v_laundry_count: 64, v_active_count: 3891599
May 11 00:44:11 CA72_Mbin_ZFS kernel: pid 844 (stress), jid 0, uid 0, was killed: failed to reclaim memory

(I was testing main [so: 14].) So I report that as well.

Since I was using stress as part of my test context, there
were also lines like:

stress: FAIL: [843] (415) <-- worker 844 got signal 9
stress: WARN: [843] (417) now reaping child worker processes
stress: FAIL: [843] (451) failed run completed in 119s
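
(For reference, stress is sysutils/stress. My exact
invocation is not shown above; a hypothetical one that
applies this general sort of memory pressure would be:

# stress --vm 4 --vm-bytes 4g --timeout 120s

where --vm gives the number of memory-spinning workers and
--vm-bytes the allocation per worker.)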

(tijl_at_ had me add v_laundry_count and v_active_count
to what I've carried forward since back in 2018, when
Mark J. provided the original extra message.)

Turns out the kernel debugger (db> prompt) can report the
same general sort of figures:

db> show page
vm_cnt.v_free_count: 15577
vm_cnt.v_inactive_count: 1
vm_cnt.v_active_count: 3788852
vm_cnt.v_laundry_count: 0
vm_cnt.v_wire_count: 272395
vm_cnt.v_free_reserved: 5374
vm_cnt.v_free_min: 25759
vm_cnt.v_free_target: 86914
vm_cnt.v_inactive_target: 130371

db> show pageq
pq_free 15577
dom 0 page_cnt 4077116 free 15577 pq_act 3788852 pq_inact 1 pq_laund 0 pq_unsw 0

(Note: pq_unsw is a non-swappable count that excludes
the wired count, apparently matching
vm.domain.0.stats.unswappable.)
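
(In case it is useful: getting to the db> prompt requires a
kernel with the DDB option. Assuming console access, one way
in from a root shell is:

# sysctl debug.kdb.enter=1

after which "show page" and "show pageq" work as above and
"c" continues execution.)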

The above is the smallest pq_inact+pq_laund that I saw at
OOM kill time or during a "hang-up". (What I saw across
example "hang-ups" suggests to me a livelock context, not
a deadlock context.)

> Interestingly enough, with the patch applied I observed a smaller
> amount of memory used for laundry, as well as less swap space in
> use, until right before the crash.

If your logging of values has been made public, I've not
(yet?) looked at it at all.

None of my testing reached a stage of having much swap
space in use. But the test is biased toward producing the
problems quickly, rather than toward exploring the range
of ways to reach the failing conditions.

I've stopped testing for now and am doing a round of OS
building and upgrading, port (re-)building and installing,
and the like, mostly for aarch64 but also for armv7 and
amd64. (This is without the 'remove " + 1"' patch.)

One of the points is to see if I get any evidence of
vm.swap_enabled=0 with vm.swap_idle_enabled=0 ending up
contributing to any problems in my normal usage. So far: no.
vm.pageout_oom_seq=120 is in use for this; it has been part
of my normal context since sometime in 2018.
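
(For completeness, both swap settings are read/write sysctls
as well, so the whole combination could be made persistent
with an /etc/sysctl.conf fragment like:

vm.swap_enabled=0
vm.swap_idle_enabled=0
vm.pageout_oom_seq=120

which is just the persistent form of the settings named
above.)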

===
Mark Millard
marklmi at yahoo.com