Re: memory issue - bug?

From: Edward Sanford Sutton, III <mirror176_at_hotmail.com>
Date: Thu, 26 Oct 2023 18:33:38 UTC
On 10/21/23 01:56, infoomatic wrote:
> Hi,
> 
> I have encountered a strange memory issue with my pcengines APU2
> (running 13.2-RELEASE-p4). I use it as a firewall with pf and it acts as
> a backup server with an external 4xdisk USB enclosure for a jail running
> borg backup. Sometimes I do cache some videos from youtube using yt-dl -
> this is where it becomes interesting. The box has 4GB of ram (and in
> loader.conf is vfs.zfs.arc_max=268435456 configured), and sometimes
> yt-dl needs too much memory, the system runs out of memory. The
> following lines pop up in syslog:

   Your issue sounds like what I had when trying to run without swap 
(main issue) and had done some other recommended tweaking to have more 
vnodes for filesystem performance (other crashes/resource exhaustion). 
Have you tried it without running sysctl customization for zfs or 
otherwise? any other tweaks being applined in loader.conf or elsewhere?

> swap_pager: cannot allocate bio

   How much swap does the system have? If none, does it still occur with 
a swap partition (I'd try at least the size of RAM). If you did have 
swap, any particular strange aspects to it (multiple partitions on 
different disks or of different size parttitions?)

   I have my desktop PC with its power button giving an orderly shutdown 
command to FreeBSD. Pressing that gives a single flicker of hard drive 
light activity and would 'almost' always shut down the system when in 
such a state (caused by RAM being used up by poudriere jobs, firefox, or 
even writing something to allocate too much memory which I used to be 
able to use to 'flush' zfs cache from it). With 32GB RAM I gave it 32GB 
swap with better results than none (system being nearly unresponsive 
once enough RAM was being used up) or too little (4GB swap would 
normally stay responsive but lead to less sane choices of crashing 
programs like xorg or nvidia graphics module when other programs were 
using up too much RAM).

> and many lines like
> pid 23342 (getty), jid 0, uid 0, was killed: failed to reclaim memory
> pid 5935 (tmux), jid 0, uid 1001, was killed: failed to reclaim memory
> pid 14258 (top), jid 0, uid 0, was killed: failed to reclaim memory
> 
> When I encounter the out of memory situation, I logged in and inspected
> the system. The system was sluggish and it reported that 0-16 MB RAM was
> used.,  I saw that only basic system services are running [1].
> Starting top tells me that "Free" memory is ~ 800MB, and constantly
> declining:
> 
> Mem: 8816K Active, 11M Inact, 3344M Wired, 548M Free
> 
> top -o res shows me top the process (18 processes: 1 running, 17
> sleeping) occupying the most memory (~3MB):
> 
> 78154 root          1  20    0    14M  3172K CPU3     3   0:00   0.17% top
> 
> I can just watch the systems wired memory going up until everything is
> dying. Disabling and re-enabling swap has no effect. Top is killed after
> about 2 minutes, and I am kicked out. I can login again shortly after,
> but the same situation happens again: Free memory going down and Wired
> going up. PF continues to work but I cannot start a backup or any file
> operation because they just die after a short while.
> 
> Has someone an idea to debug this ... or a workaround?
> 
> Thanks for sharing your ideas!
> 
> Robert
> 
> 
> 
> [1]ps auxwww:
> USER     PID  %CPU %MEM   VSZ  RSS TT  STAT STARTED        TIME COMMAND
> root      11 198.5  0.0     0   64  -  RNL  Sun18   19955:10.62 [idle]
> root       0 101.2  0.1     0 2832  -  DLs  Sun18    7354:27.71 [kernel]
> root       5 100.0  0.1     0 2080  -  RL   Sun18    1307:50.45 [zfskern]
> root   44547   2.1  0.1 14908 2800  0  S    10:41       0:00.11 bash
> root       7   1.0  0.0     0   48  -  DL   Sun18       3:47.38 
> [pagedaemon]
> robert 42128   0.9  0.1 21508 2112  -  S    10:41       0:00.06 sshd:
> robert@pts/0 (sshd)
> root   41079   0.7  0.0 21508 1584  -  Ss   10:41       0:00.26 sshd:
> robert [priv] (sshd)
> root       1   0.0  0.0 11768   36  -  SLs  Sun18       0:02.07 /sbin/init
> root       2   0.0  0.0     0   64  -  DL   Sun18       0:00.00 [KTLS]
> root       3   0.0  0.0     0   80  -  DL   Sun18       0:00.00 [crypto]
> root       4   0.0  0.0     0   32  -  DL   Sun18      86:34.52 [cam]
> root       6   0.0  0.0     0   16  -  DL   Sun18       1:28.84
> [rand_harvestq]
> root       8   0.0  0.0     0   16  -  DL   Sun18       1:47.46 [vmdaemon]
> root       9   0.0  0.0     0   48  -  DL   Sun18       0:19.60 [bufdaemon]
> root      10   0.0  0.0     0   16  -  DL   Sun18       0:00.00 [audit]
> root      12   0.0  0.0     0  240  -  WL   Sun18      98:16.01 [intr]
> root      13   0.0  0.0     0   48  -  DL   Sun18       0:00.92 [geom]
> root      14   0.0  0.0     0   16  -  DL   Sun18       0:00.00
> [sequencer 00]
> root      15   0.0  0.0     0  160  -  DL   Sun18     126:23.98 [usb]
> root      16   0.0  0.0     0   16  -  DL   Sun18       0:12.04
> [acpi_thermal]
> root      17   0.0  0.0     0   16  -  DL   Sun18       0:07.64 [vnlru]
> root      18   0.0  0.0     0   16  -  DL   Sun18       0:21.84 [syncer]
> root      19   0.0  0.0     0   16  -  DL   Sun18       7:37.35 [racctd]
> root    1265   0.0  0.0     0   16  -  DL   Sun18     156:54.73
> [g_eli[0] gpt/hd001]
> root    1537   0.0  0.0     0   16  -  DL   Sun18     158:36.95
> [g_eli[1] gpt/hd001]
> root    1934   0.0  0.0     0   16  -  DL   Sun18     161:37.98
> [g_eli[2] gpt/hd001]
> root    2196   0.0  0.0     0   16  -  DL   Sun18     159:34.59
> [g_eli[3] gpt/hd001]
> root    3057   0.0  0.0     0   16  -  DL   Sun18     157:32.36
> [g_eli[0] gpt/hd002]
> root    3334   0.0  0.0     0   16  -  DL   Sun18     159:15.82
> [g_eli[1] gpt/hd002]
> root    3574   0.0  0.0     0   16  -  DL   Sun18       0:00.25 [aiod1]
> root    3751   0.0  0.0     0   16  -  DL   Sun18     161:45.95
> [g_eli[2] gpt/hd002]
> root    3889   0.0  0.0     0   16  -  DL   Sun18       0:00.24 [aiod2]
> root    4138   0.0  0.0     0   16  -  DL   Sun18       0:00.25 [aiod3]
> root    4252   0.0  0.0     0   16  -  DL   Sun18     160:35.16
> [g_eli[3] gpt/hd002]
> root    4397   0.0  0.0     0   16  -  DL   Sun18       0:00.24 [aiod4]
> root    5008   0.0  0.0     0   16  -  DL   Sun18     156:54.76
> [g_eli[0] gpt/hd011]
> root    5009   0.0  0.0     0   16  -  DL   Sun18     158:16.99
> [g_eli[1] gpt/hd011]
> root    5457   0.0  0.0     0   16  -  DL   Sun18     161:03.34
> [g_eli[2] gpt/hd011]
> root    5929   0.0  0.0     0   16  -  DL   Sun18     159:37.17
> [g_eli[3] gpt/hd011]
> root    9001   0.0  0.0 21068  328  -  Ss   Sun18       0:00.25 sshd:
> /usr/sbin/sshd [listener] 0 of 10-100 startups (sshd)
> root   10008   0.0  0.0 12908    4  -  IWs  -           0:00.00
> /usr/sbin/cron -s
> root   43429   0.0  0.0     0   16  -  DL   Sun18      11:45.45 [pf purge]
> root   78988   0.0  0.0 12868  300  -  Ss   Sun18       0:17.57
> /usr/sbin/syslogd -ss
> root   84156   0.0  0.0     0   16  -  DL   Sun18      35:19.32
> [g_eli[0] gpt/hd037]
> root   84470   0.0  0.0     0   16  -  DL   Sun18      35:53.31
> [g_eli[1] gpt/hd037]
> root   85309   0.0  0.0     0   16  -  DL   Sun18      38:08.91
> [g_eli[2] gpt/hd037]
> root   86045   0.0  0.0     0   16  -  DL   Sun18      37:08.59
> [g_eli[3] gpt/hd037]
> root   86051   0.0  0.0     0   16  -  DL   Sun18      35:51.55
> [g_eli[0] gpt/hd038]
> root   86644   0.0  0.0     0   16  -  DL   Sun18      36:03.03
> [g_eli[1] gpt/hd038]
> root   87199   0.0  0.0     0   16  -  DL   Sun18      37:56.77
> [g_eli[2] gpt/hd038]
> root   87235   0.0  0.0     0   16  -  DL   Sun18      37:21.79
> [g_eli[3] gpt/hd038]
> root   87490   0.0  0.0     0   16  -  DL   Sun18      35:39.81
> [g_eli[0] gpt/hd039]
> root   87546   0.0  0.0     0   16  -  DL   Sun18      35:43.55
> [g_eli[1] gpt/hd039]
> root   87988   0.0  0.0     0   16  -  DL   Sun18      38:05.54
> [g_eli[2] gpt/hd039]
> root   88507   0.0  0.0     0   16  -  DL   Sun18      37:06.09
> [g_eli[3] gpt/hd039]
> root   88536   0.0  0.0     0   16  -  DL   Sun18      35:28.86
> [g_eli[0] gpt/hd040]
> root   89143   0.0  0.0     0   16  -  DL   Sun18      37:09.63
> [g_eli[1] gpt/hd040]
> root   89723   0.0  0.0     0   16  -  DL   Sun18      37:56.60
> [g_eli[2] gpt/hd040]
> root   90441   0.0  0.0     0   16  -  DL   Sun18      37:08.54
> [g_eli[3] gpt/hd040]
> root   97972   0.0  0.0 12836   36 u0  Is+  10:22       0:00.01
> /usr/libexec/getty 3wire ttyu0
> root   34488   0.0  0.0 12836   36 v0  Is+  10:39       0:00.01
> /usr/libexec/getty Pc ttyv0
> root   97710   0.0  0.0 12836   36 v1  Is+  10:22       0:00.01
> /usr/libexec/getty Pc ttyv1
> root   33736   0.0  0.0 12836   36 v2  Is+  10:38       0:00.02
> /usr/libexec/getty Pc ttyv2
> root      22   0.0  0.0 12836   36 v3  Is+  10:25       0:00.02
> /usr/libexec/getty Pc ttyv3
> root   34562   0.0  0.0 12836   36 v4  Is+  10:39       0:00.01
> /usr/libexec/getty Pc ttyv4
> root   33901   0.0  0.0 12836   36 v5  Is+  10:39       0:00.01
> /usr/libexec/getty Pc ttyv5
> root      20   0.0  0.0 12836   36 v6  Is+  06:06       0:00.01
> /usr/libexec/getty Pc ttyv6
> root      21   0.0  0.0 12836   36 v7  Is+  07:12       0:00.01
> /usr/libexec/getty Pc ttyv7
> robert 42985   0.0  0.0 13580    4  0  SWs  -           0:00.00 -sh (sh)
> robert 43832   0.0  0.0 14908    4  0  SW   -           0:00.00
> /usr/local/bin/bash
> root   45136   0.0  0.1 13444 2160  0  R+   10:41       0:00.02 ps auxwww
>