Belated out of swap kill on rpi3 at r359216
bob prohaska
fbsd at www.zefox.net
Tue Mar 24 18:55:11 UTC 2020
On Tue, Mar 24, 2020 at 10:09:28AM -0700, Mark Millard wrote:
>
>
> On 2020-Mar-24, at 08:57, bob prohaska <fbsd at www.zefox.net> wrote:
>
> > An attempt to buildworld on an rpi3 running r359216 stopped with an
> > OOMA kill. Sources were at 359264, loader.conf contained
> > vm.pageout_oom_seq="4096" .
>
> What of the value of vm.pfault_oom_attempts ?
>
> If vm.pfault_oom_attempts was not -1,
> what of the value of vm.pfault_oom_wait ?
>
bob at www:/usr/src % sysctl vm.pfault_oom_wait
vm.pfault_oom_wait: 10
I didn't change it, so this must be the default.
Would it be useful to add something like
vm.pfault_oom_wait="20"
to /boot/loader.conf?
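If so, loader.conf here would end up with something like (values purely illustrative, just restating the numbers above):

vm.pageout_oom_seq="4096"
vm.pfault_oom_wait="20"

or, if I understand the -1 setting you mention, the page-fault-path OOM could be switched off entirely with vm.pfault_oom_attempts="-1" instead.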
> > A snippet of the gstat log suggests the worst congestion in the storage I/O
> > happened at Tue Mar 24 04:52:26 PDT 2020 with an L(q) of 37, but the
> > OOMA kill happened at Tue Mar 24 04:53:04 PDT 2020, by which time the
> > L(q) had dropped to one, half a minute later.
> >
> > Is the delay in OOMA action to be expected?
> >
> > Here's the relevant part of the log, I hope the columns display readably:
> >
> > 0/2/2/19177 mbuf clusters in use (current/cache/total/max)
> > procs memory page disks faults cpu
> > r b w avm fre flt re pi po fr sr mm0 da0 in sy cs us sy id
> > 0 0 29 1897320 47312 12851 9 4 5 13330 1669 0 0 14172 8020 3034 65 29 6
> > dT: 1.056s w: 1.000s
> > L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name
> > 37 367 323 5463 6.4 45 1511 76.8 0 0 0.0 91.7 mmcsd0
> > 37 367 323 5463 6.5 45 1511 76.9 0 0 0.0 93.3 mmcsd0s2
> > 34 133 111 3209 7.6 22 697 134.7 0 0 0.0 74.0 mmcsd0s2a
> > 3 235 212 2254 5.9 23 814 21.5 0 0 0.0 70.0 mmcsd0s2b
> > 34 133 111 3209 7.6 22 697 134.7 0 0 0.0 74.2 ufs/rootfs
> > Tue Mar 24 04:52:26 PDT 2020
> > Device 1K-blocks Used Avail Capacity
> > /dev/mmcsd0s2b 4404252 224484 4179768 5%
> > Mar 24 04:20:50 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 21842, size: 12288
> > Mar 24 04:20:50 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 25477, size: 4096
>
> How bad were things back when the "indefinite wait buffer" notices
> were generated (Mar 24 04:20:50 time frame)?
>
It looks like _new_ indefinite wait messages started appearing around Mon Mar 23 23:00:28 PDT 2020:
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr mm0 da0 in sy cs us sy id
2 0 18 1588824 197676 14108 2 1 0 14759 238 0 0 14055 9263 2668 62 32 5
dT: 1.027s w: 1.000s
L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name
9 1 0 0 0.0 1 31 13451 0 0 0.0 326.9 mmcsd0
9 1 0 0 0.0 1 31 13451 0 0 0.0 326.9 mmcsd0s2
6 1 0 0 0.0 1 31 13451 0 0 0.0 326.9 mmcsd0s2a
6 1 0 0 0.0 1 31 13451 0 0 0.0 326.9 ufs/rootfs
Mon Mar 23 23:00:28 PDT 2020
Device 1K-blocks Used Avail Capacity
/dev/mmcsd0s2b 4404252 245288 4158964 6%
Mar 23 23:00:28 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 280324, size: 8192
Mar 23 23:00:28 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 285903, size: 16384
Admittedly, 300% busy looks pretty bad. Still, the machine didn't quit...
On the next sample, %busy went down to 18% for swap and less for everything else.
>
> Below I show code changes to be more explicit in the
> output messaging about what contributes to initiating
> OOM kills, without needing boot verbose or the like.
> There are also some messages from Mark J.'s old code
> for reporting on things related to initiating OOM
> kills, or some minor variations of them.
>
> You may want to try such changes to provide more
> context for your OOM kills when they happen. Below
> the 4 reasons reported are (showing the most
> detailed of the related messages from different
> stages):
>
> swp_pager_meta_build: swap blk uma zone exhausted
>
> swp_pager_meta_build: swap pctrie uma zone exhausted
>
> vm_fault_allocate: proc %d (%s) failed to alloc page on fault, starting OOM
>
> vm_pageout_mightbe_oom: kill context: v_free_count: %u, v_inactive_count: %u
>
>
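For what it's worth, I'm guessing the counts in that last message correspond to what sysctl reports here as vm.stats.vm.v_free_count and vm.stats.vm.v_inactive_count, so something like

sysctl vm.stats.vm.v_free_count vm.stats.vm.v_inactive_count

ought to show roughly the same numbers from userland (my assumption, I haven't read the patch closely).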
Could inquiries instead be added to the logging script?
Right now it's
#!/bin/sh
while true
do vmstat ; gstat -abd -I 1s ; date ; swapinfo ; tail -n 2 /var/log/messages ; netstat -m | grep "mbuf clusters"
done
I'd much rather tamper with the logging script than the kernel 8-)
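Something along these lines is what I had in mind, an untested sketch that just tacks the sysctl readings onto the existing loop (it only records the counters and tunables each pass; it can't show the kernel's internal reasons for a kill the way your patch does):

#!/bin/sh
while true
do vmstat ; gstat -abd -I 1s ; date ; swapinfo ; tail -n 2 /var/log/messages ; netstat -m | grep "mbuf clusters" ; sysctl vm.pageout_oom_seq vm.pfault_oom_attempts vm.stats.vm.v_free_count vm.stats.vm.v_inactive_count
done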
Many thanks for all your help!
bob prohaska