Re: Can not build kernel on 1GB VM

From: Mark Millard <marklmi_at_yahoo.com>
Date: Sat, 14 May 2022 15:02:29 UTC
Michael Wayne <freebsd07_at_wayne47.com> wrote on
Date: Sat, 14 May 2022 13:20:39 UTC :

> The machine >thinks< it's running out of swap:
> 
> May 9 13:05:57 g1 kernel: pid 9507 (ctfmerge), jid 0, uid 0, was killed: out of swap space
> May 9 13:05:58 g1 kernel: pid 4969 (make), jid 0, uid 0, was killed: out of swap space
> May 9 13:06:00 g1 kernel: pid 828 (openvpn), jid 0, uid 301, was killed: out of swap space
. . .

I've reported before the following (in a different
wording):

The wording of those messages was changed in main [so: 14],
stable/13, and releng/13.1 recently because the historical
wording in 13.0 and before was normally a misleading
misnomer that led people to false conclusions much of
the time.

There are now 3 distinct messages, one still being a
misnomer but 2 being accurate to the condition that
causes the OOM kill:

pid . . .(. . .), jid . . ., uid . . ., was killed: failed to reclaim memory
pid . . .(. . .), jid . . ., uid . . ., was killed: a thread waited too long to allocate a page
pid . . .(. . .), jid . . ., uid . . ., was killed: out of swap space

Unfortunately, even with the updated messaging, that last
one, the "out of swap space" message, is not about the
content of the swap partition itself. It actually reports
exhaustion of one or both of a couple of related kernel
data structures used for managing the swap space: the
swblk zone or the swpctrie zone.

The way to tell whether running out of swap might actually
be involved is to look for some other messages that do not
of themselves announce kills:

swap_pager: out of swap space
swp_pager_getswapspace(. . .): failed

If you are getting either of those 2, then you are actually
running out of swap space. Otherwise you are not.

If you are not, the real reason is one of 4:

failed to reclaim memory
a thread waited too long to allocate a page
swblk zone exhausted
swpctrie zone exhausted
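
As a rough illustration of the rule above, here is a minimal Python
sketch that scans kernel log lines, collects the "was killed" notices,
and separately checks for the 2 messages that indicate swap space
really ran out. The function name summarize and the returned
dictionary layout are my own inventions for the example, and the
pattern only assumes the message layout shown in this thread; it is
not any official tool.

```python
import re

# Matches the kill notices quoted in this thread, e.g.
# "pid 9507 (ctfmerge), jid 0, uid 0, was killed: out of swap space"
KILL_RE = re.compile(
    r"pid (\d+) \(([^)]*)\), jid \d+, uid \d+, was killed: (.+)$"
)


def summarize(lines):
    """Hypothetical helper: list OOM kills and flag real swap exhaustion."""
    kills = []
    swap_really_exhausted = False
    for line in lines:
        line = line.rstrip()
        # Only these 2 messages indicate actually running out of swap.
        if "swap_pager: out of swap space" in line:
            swap_really_exhausted = True
        elif "swp_pager_getswapspace(" in line and line.endswith("failed"):
            swap_really_exhausted = True
        match = KILL_RE.search(line)
        if match:
            pid, name, reason = match.groups()
            kills.append((int(pid), name, reason))
    return {"kills": kills, "swap_really_exhausted": swap_really_exhausted}
```

Feeding it the 3 lines quoted at the top of this message would list
3 kills but leave swap_really_exhausted False, since none of those
lines is one of the 2 swap_pager messages.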

FYI:

kernel: swap_pager: indefinite wait buffer: bufobj: . . ., blkno: . . ., size: . . .

is for a swap read taking over 20 seconds (including
time spent queued, waiting for the transfer to
start).

A backlog of slow I/O for swap activity can lead to
OOM kill activity so those messages may be suggestive,
even though they do not directly report OOM kill
activity.

/boot/loader.conf can use the likes of:

# Delay when persistent low free RAM leads to
# Out Of Memory killing of processes:
vm.pageout_oom_seq=120
#
# For plenty of swap/paging space (will not
# run out), avoid pageout delays leading to
# Out Of Memory killing of processes:
vm.pfault_oom_attempts=-1
#
# For possibly insufficient swap/paging space
# (might run out), increase the pageout delay
# that leads to Out Of Memory killing of
# processes (showing defaults, need to explore
# alternative pairs of settings):
#vm.pfault_oom_attempts=3
#vm.pfault_oom_wait=10
# (The multiplication is the total but there
# are other potential tradeoffs in the factors
# multiplied, even for nearly the same total.)

I do not know if you have tried any of these.
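
To make the multiplication mentioned in the loader.conf comments above
concrete, here is a small Python sketch. The function name
pfault_oom_delay_seconds is made up for the example, and it assumes
(as I understand it) that vm.pfault_oom_wait is in seconds and that a
negative vm.pfault_oom_attempts means the page fault path never gives
up on its own.

```python
def pfault_oom_delay_seconds(attempts=3, wait=10):
    """Hypothetical helper: approximate total wait before a page-fault OOM.

    The defaults mirror the values shown in the loader.conf fragment.
    Returns None when attempts is negative (treated here as "no limit").
    """
    if attempts < 0:
        return None
    return attempts * wait


# Pairs to compare: the defaults, a different pair with the same
# total, and the -1 setting.
for attempts, wait in [(3, 10), (6, 5), (-1, 10)]:
    print(attempts, wait, pfault_oom_delay_seconds(attempts, wait))
```

The pairs (3, 10) and (6, 5) give the same total, which is the sort
of alternative pairing that would need to be explored by experiment.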

===
Mark Millard
marklmi at yahoo.com