Re: devel/llvm13 failed to reclaim memory on 8 GB Pi4 running -current

From: Mark Millard <marklmi_at_yahoo.com>
Date: Thu, 27 Jan 2022 20:12:20 UTC

On 2022-Jan-27, at 11:31, Mark Millard <marklmi@yahoo.com> wrote:

> On 2022-Jan-27, at 08:45, bob prohaska <fbsd@www.zefox.net> wrote:
> 
>> Attempts to compile devel/llvm13 on a Pi4 running -current (updated
>> on 20220126) with 8 GB of RAM and 8 GB of swap have failed on two occasions using
>> make -DBATCH > make.log & 
>> in /usr/ports/devel/llvm13 using the system compiler. The system is
>> self-hosted. 

Context question: ZFS? UFS?

(For anything involving memory usage issues, it is always
worth knowing which file system is in use, because the two
have different memory use patterns.)
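
If it helps, here is a minimal sketch for answering that. It assumes
the input is `mount -p` output (fstab-style columns with the file
system type in the third field); the function name fs_types is made
up for illustration and is not part of any tool.

```sh
# Hypothetical helper: read `mount -p` style lines on stdin and print
# which of zfs/ufs appear, one per line, without duplicates.
fs_types() {
    # Field 3 of each fstab-style line is the file system type.
    awk '$3 == "zfs" || $3 == "ufs" { print $3 }' | sort -u
}
```

Usage would be along the lines of: mount -p | fs_types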

>> The first failure reported clang error 139, but the second
>> was different, reporting only:
>> FAILED: tools/flang/lib/Evaluate/CMakeFiles/obj.FortranEvaluate.dir/check-expression.cpp.o
>> along with a console report of
>> +swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1258432, size: 4096
>> +swap_pager: indefinite wait buffer: bufobj: 0, blkno: 627221, size: 8192
>> +swap_pager: indefinite wait buffer: bufobj: 0, blkno: 240419, size: 4096
>> +swap_pager: out of swap space
> 
> In recent builds, such as yours, the above "out of swap" is a
> misnomer but is very interesting for what it is actually about.
> 
> Mark Johnston later wrote on 2022-Jan-15 about his "git:
> 4a864f624a70 - main - vm_pageout: Print a more accurate message
> to the console before an OOM kill" that produced the above report
> of "out of swap space":
> 
> QUOTE
> Hmm, those cases should likely be changed from "out of swap space" to
> "failed to allocate swap metadata" or something like that.
> END QUOTE
> 
> Your context proves the metadata problem really happens, so
> the messaging should be fixed to not be misleading.
> 
> In my builds I have code that is more explicit:
> 
> diff --git a/sys/vm/swap_pager.c b/sys/vm/swap_pager.c
> index 01cf9233329f..280621ca51be 100644
> --- a/sys/vm/swap_pager.c
> +++ b/sys/vm/swap_pager.c
> @@ -2091,6 +2091,7 @@ swp_pager_meta_build(vm_object_t object, vm_pindex_t pindex, daddr_t swapblk)
>                                   0, 1))
>                                       printf("swap blk zone exhausted, "
>                                           "increase kern.maxswzone\n");
> +                               printf("swp_pager_meta_build: swap blk uma zone exhausted\n");
>                               vm_pageout_oom(VM_OOM_SWAPZ);
>                               pause("swzonxb", 10);
>                       } else
> @@ -2121,6 +2122,7 @@ swp_pager_meta_build(vm_object_t object, vm_pindex_t pindex, daddr_t swapblk)
>                                   0, 1))
>                                       printf("swap pctrie zone exhausted, "
>                                           "increase kern.maxswzone\n");
> +                               printf("swp_pager_meta_build: swap pctrie uma zone exhausted\n");
>                               vm_pageout_oom(VM_OOM_SWAPZ);
>                               pause("swzonxp", 10);
>                       } else
> 
> The "metadata" is the "swap blk uma zone" and "swap pctrie
> uma zone". Unfortunately, the standard builds still do not
> indicate which of the two had the failure.
> 
>> +swp_pager_getswapspace(12): failed
>> +pid 61012 (c++), jid 0, uid 0, was killed: failed to reclaim memory
> 
> Absent being able to swap, it tries to reclaim memory, and
> that too failed. That finally leads to the kills.
> 
>> Swap use peaked a little over 50%.
> 
> So at around 50% swap use, the "swap blk uma zone" and/or the
> "swap pctrie uma zone" had problems, probably fragmentation-related.
> 
>> After the first failure a restart
>> of make using MAKE_JOBS_UNSAFE=yes ran to completion with one thread.
>> 
>> A copy of the build log, logging script and other notes is at
>> http://www.zefox.net/~fbsd/rpi4/20220127/
>> 
>> Clang error 139 has been seen several times during make buildworld on a Pi3 running
>> stable/13 with 2 GB of swap as well. Perhaps the two failures are related. The Pi3 
>> failures didn't report out of swap, all were clang error 139 with "failed to reclaim 
>> memory". Even with only 1 thread (j1) the failure reproduced.
>> 
> 
> Note in your report above: obj.FortranEvaluate.dir
> 
> If you use the options to disable building flang (a.k.a.,
> the Fortran compiler build), your builds on the RPi4B
> will likely work in the current configuration.
> 
> But it looks like you have identified a test context
> for the "swap blk uma zone" and "swap pctrie uma zone"
> handling.


===
Mark Millard
marklmi at yahoo.com