Re: devel/llvm13 failed to reclaim memory on 8 GB Pi4 running -current

From: Mark Millard <marklmi_at_yahoo.com>
Date: Thu, 27 Jan 2022 19:31:17 UTC
On 2022-Jan-27, at 08:45, bob prohaska <fbsd@www.zefox.net> wrote:

> Attempts to compile devel/llvm13 on a Pi4 running -current (updated
> on 20220126) with 8 GB of RAM and 8 GB of swap have failed on two occasions using
> make -DBATCH > make.log & 
> in /usr/ports/devel/llvm13 using the system compiler. The system is
> self-hosted. 
> 
> The first failure reported clang error 139, but the second
> was different, reporting only:
> FAILED: tools/flang/lib/Evaluate/CMakeFiles/obj.FortranEvaluate.dir/check-expression.cpp.o
> along with a console report of
> +swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1258432, size: 4096
> +swap_pager: indefinite wait buffer: bufobj: 0, blkno: 627221, size: 8192
> +swap_pager: indefinite wait buffer: bufobj: 0, blkno: 240419, size: 4096
> +swap_pager: out of swap space

In recent builds, such as yours, the above "out of swap" message is a
misnomer, but what it actually reports is very interesting.

On 2022-Jan-15, Mark Johnston later wrote the following about his
commit "git: 4a864f624a70 - main - vm_pageout: Print a more accurate
message to the console before an OOM kill", which is what produced
the above report of "out of swap space":

QUOTE
Hmm, those cases should likely be changed from "out of swap space" to
"failed to allocate swap metadata" or something like that.
END QUOTE

Your report shows that the metadata problem really does happen, so
the message should be fixed so that it is not misleading.

In my builds I use code that is more explicit:

diff --git a/sys/vm/swap_pager.c b/sys/vm/swap_pager.c
index 01cf9233329f..280621ca51be 100644
--- a/sys/vm/swap_pager.c
+++ b/sys/vm/swap_pager.c
@@ -2091,6 +2091,7 @@ swp_pager_meta_build(vm_object_t object, vm_pindex_t pindex, daddr_t swapblk)
                                   0, 1))
                                       printf("swap blk zone exhausted, "
                                           "increase kern.maxswzone\n");
+                               printf("swp_pager_meta_build: swap blk uma zone exhausted\n");
                               vm_pageout_oom(VM_OOM_SWAPZ);
                               pause("swzonxb", 10);
                       } else
@@ -2121,6 +2122,7 @@ swp_pager_meta_build(vm_object_t object, vm_pindex_t pindex, daddr_t swapblk)
                                   0, 1))
                                       printf("swap pctrie zone exhausted, "
                                           "increase kern.maxswzone\n");
+                               printf("swp_pager_meta_build: swap pctrie uma zone exhausted\n");
                               vm_pageout_oom(VM_OOM_SWAPZ);
                               pause("swzonxp", 10);
                       } else

The "metadata" refers to the "swap blk uma zone" and the "swap
pctrie uma zone". Unfortunately, the standard builds still do not
indicate which of the two got the failure.

> +swp_pager_getswapspace(12): failed
> +pid 61012 (c++), jid 0, uid 0, was killed: failed to reclaim memory

Absent the ability to swap, the kernel tries to reclaim memory, and
that failed too. That is what finally leads to the kills.

> Swap use peaked a little over 50%.

So at around 50% swap use, the "swap blk uma zone" and/or the "swap
pctrie uma zone" ran into problems, probably fragmentation-related ones.

> After the first failure a restart
> of make using MAKE_JOBS_UNSAFE=yes ran to completion with one thread.
> 
> A copy of the build log, logging script and other notes is at
> http://www.zefox.net/~fbsd/rpi4/20220127/
> 
> Clang error 139 has been seen several times during make buildworld on a Pi3 running
> stable/13 with 2 GB of swap as well. Perhaps the two failures are related. The Pi3 
> failures didn't report out of swap, all were clang error 139 with "failed to reclaim 
> memory". Even with only 1 thread (j1) the failure reproduced.
> 

Note in your report above: obj.FortranEvaluate.dir

If you use the options that disable building flang (the Fortran
compiler), your builds on the RPi4B will likely work in the current
configuration.

But it looks like you have identified a test context
for the "swap blk uma zone" and "swap pctrie uma zone"
handling.

===
Mark Millard
marklmi at yahoo.com