Swap filling up, suspect kernel memory issue?

From: Scott Gasch <scott.gasch_at_gmail.com>
Date: Sun, 11 Jun 2023 15:54:21 UTC
I am running a 13.2-RELEASE GENERIC kernel and seeing a pattern where,
after about 10 days of uptime, my swap begins to fill up.

# swapinfo -h
Device              Size     Used    Avail Capacity
/dev/ada0p3          48G     4.3G      44G     9%
/dev/ada1p3          48G     4.3G      44G     9%
/dev/ada2p3          48G     4.3G      44G     9%
Total               144G      13G     131G     9%
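
To watch how fast it grows, something like this can log usage over time (a sketch; the interval and log path are arbitrary):

# while :; do date; swapinfo; sleep 3600; done >> /var/log/swap-growth.log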

# vmstat -h
 procs    memory    page                      disks     faults       cpu
 r  b  w  avm  fre  flt  re  pi  po   fr   sr ad0 ad1   in   sy   cs us sy id
 1  0 45 598G  17G  19K   2   0   0  20K 2.3K   0   0 1206  48K  19K  3  1 96

I cannot find a usermode culprit; the sum of per-process swap usage is
nowhere near the total that swapinfo reports (a rough way to add up the
SWAP column is sketched after the listing below):

# /usr/bin/top -w -o swap
last pid: 88600;  load averages:  1.15,  1.00,  0.88   up 11+11:21:55  08:41:54
352 processes: 3 running, 347 sleeping, 2 zombie
CPU:  5.9% user,  0.0% nice,  1.1% system,  0.0% interrupt, 92.9% idle
Mem: 7812M Active, 13G Inact, 65G Laundry, 22G Wired, 744M Buf, 17G Free
ARC: 9571M Total, 1507M MFU, 5037M MRU, 27M Anon, 92M Header, 2907M Other
     4620M Compressed, 12G Uncompressed, 2.57:1 Ratio
Swap: 144G Total, 13G Used, 131G Free, 9% Inuse

  PID USERNAME    THR PRI NICE   SIZE    RES  SWAP STATE    C   TIME    WCPU COMMAND
 2097 jupyter       1  20    0   118M    10M   12M kqread  17   0:09   0.00% python3.9
85444    770        1  20    0   179M    52K   11M kqread  21   0:00   0.00% postgres
85441    770        1  20    0   179M    52K   11M kqread  12   0:00   0.00% postgres
85439    770        1  20    0   179M    52K   11M kqread  19   0:00   0.00% postgres
 9886 www           1  52    0   367M    52K   10M accept  10   0:00   0.00% php-fpm
 9887 www           1  52    0   367M    52K   10M accept   0   0:00   0.00% php-fpm
 9885 www           1  52    0   367M    52K   10M accept   8   0:00   0.00% php-fpm
 9883 www           1  52    0   367M    52K   10M accept  12   0:00   0.00% php-fpm
 9881 www           1  52    0   367M    52K   10M accept   0   0:00   0.00% php-fpm
 9880 www           1  52    0   367M    52K   10M accept   3   0:00   0.00% php-fpm
 9882 www           1  52    0   367M    52K   10M accept   3   0:00   0.00% php-fpm
 9876 www           1  52    0   367M    52K   10M accept  22   0:00   0.00% php-fpm
 9875 www           1  52    0   367M    52K   10M accept   5   0:00   0.00% php-fpm
 9878 www           1  52    0   367M    52K   10M accept   7   0:00   0.00% php-fpm
 9874 www           1  52    0   367M    52K   10M accept   1   0:00   0.00% php-fpm
 9872 www           1  52    0   367M    52K   10M accept   1   0:00   0.00% php-fpm
 9873 www           1  52    0   367M    52K   10M accept   3   0:00   0.00% php-fpm
 9871 www           1  52    0   367M    52K   10M accept  13   0:00   0.00% php-fpm
 9870 www           1  52    0   367M    52K   10M accept   2   0:00   0.00% php-fpm
57411    770        1  20    0   179M   696K 9108K kqread  10   0:01   0.00% postgres
54978 www           1  20    0    32M   852K 8336K accept  17   0:08   0.02% httpd
 9639    770        1  20    0   179M   572K 8060K kqread   6   0:02   0.00% postgres
 8350    770        1  20    0   176M   560K 7744K kqread  18   0:02   0.00% postgres
 8349    770        1  20    0   176M   668K 7640K kqread   5   0:00   0.00% postgres
 8354    770        1  20    0   177M  3248K 5280K kqread   5   0:00   0.00% postgres
 8353    770        1  20    0   177M  3452K 5172K kqread   2   0:03   0.00% postgres
 5949 smmsp         1  20    0    18M   964K 3324K pause   17   0:00   0.00% sendmail
 1956 root          1  52    0    18M  4096B 3024K lockf   14   0:00   0.00% <saslauthd>
 5984 scott         1  20    0    15M  4096B 3020K wait    20   0:00   0.00% <bash>
 1968 root          1  52    0    18M  4096B 2976K accept  12   0:00   0.00% <saslauthd>
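
For completeness, here is a rough way to total that SWAP column (a sketch
from memory; it assumes top's batch mode and the column layout above,
where SWAP is the 8th field):

# top -b -w -o swap 400 | awk '
    $1 ~ /^[0-9]+$/ {            # process rows start with a PID
        n = $8 + 0               # numeric part; unit suffix handled below
        if ($8 ~ /K$/) n *= 1024
        if ($8 ~ /M$/) n *= 1024 ^ 2
        if ($8 ~ /G$/) n *= 1024 ^ 3
        total += n
    }
    END { printf "%.1f MiB of per-process swap\n", total / 2 ^ 20 }'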

There are not "that many" processes, so this is not "death by a thousand
cuts":

# ps -aux | wc -l
     372

I have a suspicion that this is related to the wireguard kmod, simply
because I run wireguard in a vnet jail and didn't observe this problem
until I set that up.  But I don't have any hard evidence.
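
One check I know of is to look for the module's allocations among the
kernel malloc types (the names to grep for here are a guess on my part):

# kldstat | grep -i wg
# vmstat -m | grep -i -E 'wg|wireguard'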

I've tried to mitigate this via swapoff -a.  That works once, but by the
next day swap is back, even fuller.  I've been doing regular reboots to
work around it, but I would like to get to the bottom of it.  Left alone,
swap fills up and the machine ends up in a "not quite hung" but unusable
state.
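
For reference, the cycle I run amounts to this; it forces swapped pages
back into RAM and then re-enables the devices, so it only buys time:

# swapoff -a && swapon -a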

Am I off base with my suspicion that this is kernel-mode memory?  Can
someone teach me how to diagnose the state of the kernel's memory heap?
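
The closest I've gotten on my own is dumping the kernel's malloc(9) types
and UMA zones and eyeballing the biggest consumers, roughly like this
(flags from memory; the vmstat -z math assumes its comma-separated
"SIZE, LIMIT, USED, ..." layout):

# vmstat -m | sort -k3 -rh | head -20
# vmstat -z | awk -F: '/:/ { split($2, f, ","); print f[1] * f[3], $1 }' | sort -rn | head -20

The first sorts malloc types by their MemUse column; the second multiplies
each zone's item SIZE by its USED count to estimate bytes in use per zone.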

Thx,
Scott