More swap trouble with armv7, was Re: -current on armv7 stuck with flashing disk light

From: bob prohaska <fbsd_at_www.zefox.net>
Date: Tue, 04 Jul 2023 19:07:14 UTC
On Tue, Jun 27, 2023 at 10:16:57AM -0700, bob prohaska wrote:
> On Tue, Jun 27, 2023 at 09:59:40AM -0700, Mark Millard wrote:
> > > 
> > > If you want to identify system hangs, please
> > > put back:
> > > 
> > > vm.swap_enabled=0
> > > vm.swap_idle_enabled=0
> > > 
> 
> They're reinstated now, but I don't want to disturb the system
> while it seems to be building world acceptably. 
> 
Reinstating 
vm.swap_enabled=0
vm.swap_idle_enabled=0

and limiting buildworld to -j3 allows buildworld to complete successfully in 1 GB of swap.

Meanwhile, attempts to compile sysutils/usbtop using poudriere still cause swap exhaustion
while compiling devel/llvm15, even with 2 GB of swap allocated.

The messages are
Jul  4 11:18:48 www kernel: pid 1074 (getty), jid 0, uid 0, was killed: out of swap space
swap blk zone exhausted, increase kern.maxswzone
swblk zone ok

IIRC the "increase kern.maxswzone" advice is unhelpful, if not impossible to follow.
The "swblk zone ok" message seems new.

From the gstat output near peak swap use, the system wasn't I/O bound:
the disk was less than 25% busy at the time of the first OOM kill.
Eventually it was possible to log in on the serial console and run top:

33 processes:  1 running, 29 sleeping, 3 zombie
CPU:  0.0% user,  0.0% nice, 10.6% system,  0.2% interrupt, 89.2% idle
Mem: 139M Active, 8256K Inact, 252M Laundry, 221M Wired, 98M Buf, 292M Free
Swap: 2048M Total, 1291M Used, 756M Free, 63% Inuse

  PID   JID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
40719     0 root          1  20  -20     0B  8192B swzonx   0   0:12   9.15% cron
40717     0 root          1  20  -20     0B  8192B swzonx   0   0:34   9.08% sh
40709     0 root          1  20  -20     0B  8192B swzonx   0   0:38   9.01% sshd
40720     0 root          1  20  -20     0B  8192B swzonx   3   0:13   7.47% sh
40721     0 bob           1  20    0  6608K  2600K CPU1     1   0:00   0.32% top
25761     0 bob           1  20    0    14M  6136K select   0   0:02   0.03% sshd
25852     0 root          1  20    0  4668K  1648K ttyin    1   0:01   0.03% tip
 1237     0 root          1  20    0  5820K  1540K wait     1   0:12   0.00% sh
25381     0 root          1  23    0    14M  5868K select   1   0:01   0.00% sshd
 1030     0 root          1  24    0    13M  2416K vmbckw   1   0:00   0.00% sshd
12715     0 root          1  68    0  5820K  1660K wait     0   0:00   0.00% sh
12710     0 root          1  20    0  5820K  1556K piperd   1   0:00   0.00% sh
  929     0 root          1  20    0  5356K  1256K select   3   0:00   0.00% syslogd
 1014     0 root          1  20    0  5124K  1356K nanslp   2   0:00   0.00% cron
25770     0 bob           1  36    0  6844K  3116K pause    1   0:00   0.00% tcsh
25794     0 bob           1  24    0  5380K  2188K wait     2   0:00   0.00% su
39626     0 root          1  20    0  5424K  2404K wait     2   0:00   0.00% login
40635     0 bob           1  20    0  6824K  3272K pause    1   0:00   0.00% tcsh
25820     0 root          1  21    0  5608K  2204K wait     0   0:00   0.00% sh
25851     0 root          1  20    0  4668K  1656K ttyin    3   0:00   0.00% tip
40454     0 root          1  24    0  4636K  1780K ttyin    3   0:00   0.00% getty
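
For anyone wanting to track the swap figures over time, here is a minimal
sketch of the arithmetic behind the "Swap:" line above. The function name
parse_swap_line is made up for illustration, and the sketch assumes top(1)
always prints whole numbers followed by a K, M or G suffix.

```python
import re

# Multipliers that convert top(1) size suffixes to MiB (assumed suffix set).
UNITS = {"K": 1 / 1024, "M": 1, "G": 1024}


def parse_swap_line(line):
    """Return (total, used, free) in MiB from a top(1) 'Swap:' line."""
    fields = {}
    for value, unit, label in re.findall(r"(\d+)([KMG])\s+(Total|Used|Free)", line):
        fields[label] = int(value) * UNITS[unit]
    return fields["Total"], fields["Used"], fields["Free"]


total, used, free = parse_swap_line(
    "Swap: 2048M Total, 1291M Used, 756M Free, 63% Inuse"
)
print(f"{used / total:.0%} of swap in use, {free:.0f} MiB free")
```

With the line from the snapshot above it reports 63% in use, matching the
"Inuse" column, so roughly a third of the swap space was still unallocated
when top was run.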

I'll let it go for a while to see if poudriere notices it's failed and cleans up.

At the moment /boot/loader.conf contains

# Configure USB OTG; see usb_template(4).
hw.usb.template=3
umodem_load="YES"
# Disable the beastie menu and color
beastie_disable="YES"
loader_color="NO"
vm.pageout_oom_seq="4096"
vm.pfault_oom_attempts="3"
vm.pfault_oom_attempts="120"
vm.pfault_oom_wait="20"
kern.cam.boot_delay="20000"
vfs.ffs.dotrimcons="1"
vfs.root_mount_always_wait="1"
filemon_load="YES"
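
Note that vm.pfault_oom_attempts is assigned twice in the listing above.
A minimal sketch for spotting repeated assignments in a loader.conf-style
file follows. The helper name find_duplicate_settings is hypothetical, and
the parser only handles simple name="value" lines and # comments. It does
not decide which value the loader ends up using; running
"sysctl vm.pfault_oom_attempts" after boot is one way to check.

```python
def find_duplicate_settings(text):
    """Map each name assigned more than once to its values, in file order."""
    seen = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        seen.setdefault(name.strip(), []).append(value.strip().strip('"'))
    return {name: values for name, values in seen.items() if len(values) > 1}


# Excerpt copied from the loader.conf listing above.
LOADER_CONF = '''\
vm.pageout_oom_seq="4096"
vm.pfault_oom_attempts="3"
vm.pfault_oom_attempts="120"
vm.pfault_oom_wait="20"
'''

print(find_duplicate_settings(LOADER_CONF))
```

On the excerpt it reports vm.pfault_oom_attempts with the values 3 and 120.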

/usr/local/etc/poudriere.conf contains
USE_TMPFS=no
NOHANG_TIME=28800
MAX_EXECUTION_TIME_EXTRACT=14400
MAX_EXECUTION_TIME_INSTALL=14400
MAX_EXECUTION_TIME_PACKAGE=432000
ALLOW_MAKE_JOBS=yes
MAX_JOBS_NUMBER=2
MAKE_JOBS_NUMBER=2

Do these settings look reasonable?

Thanks for writing!

bob prohaska