Re: Troubles building world on stable/13

From: Mark Millard <marklmi_at_yahoo.com>
Date: Tue, 25 Jan 2022 20:49:02 UTC

On 2022-Jan-25, at 10:08, bob prohaska <fbsd@www.zefox.net> wrote:

> On Tue, Jan 25, 2022 at 09:13:08AM -0800, Mark Millard wrote:
>> 
>> -DBATCH ? I'm not aware of there being any use of that symbol.
>> Do you have a documentation reference for it so that I could
>> read about it?
>> 
> It's a switch to turn off dialog4ports. I can't find the reference
> now. Perhaps it's been deprecated? A name like -DUSE_DEFAULTS would
> be easier to understand anyway. 

I've never had buildworld, buildkernel, or the like try to use
dialog4ports. I've only had port building use it. buildworld
and buildkernel can be done with no ports installed at all,
and dialog4ports is itself a port.

I think -DBATCH was ignored for the activity at hand.

> On a whim, I tried building devel/llvm13 on a Pi4 running -current with 
> 8 GB of RAM and 8 GB of swap. To my surprise, that stopped with:
> nemesis.zefox.com kernel log messages:
> +FreeBSD 14.0-CURRENT #26 main-5025e85013: Sun Jan 23 17:25:31 PST 2022
> +swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1873450, size: 4096
> +swap_pager: indefinite wait buffer: bufobj: 0, blkno: 521393, size: 4096
> +swap_pager: indefinite wait buffer: bufobj: 0, blkno: 209826, size: 12288
> +swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1717218, size: 24576
> +pid 56508 (c++), jid 0, uid 0, was killed: failed to reclaim memory
> 
> On an 8GB machine, that seems strange. 

What -j value was the build using? -j4?

Were you watching the swap usage in top (or some such)?
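
If it helps, a small sh sketch along the following lines could
record swap usage while the build runs. The function names, the
SWAP_CMD variable, the log file name, and the 10 second interval
are illustrative assumptions on my part; swapinfo -k is the stock
FreeBSD command for a swap summary in 1K blocks.

```sh
#!/bin/sh
# Sketch only: names, log path, and interval are illustrative assumptions.
# SWAP_CMD defaults to FreeBSD's swapinfo(8) with sizes in 1K blocks.
: "${SWAP_CMD:=swapinfo -k}"

# Append one timestamped swap snapshot to the log file named in $1.
snapshot_swap() {
    {
        date
        $SWAP_CMD
    } >> "$1"
}

# Take a snapshot into $1 every $2 seconds until killed.
log_swap() {
    while :; do
        snapshot_swap "$1"
        sleep "$2"
    done
}
```

Something like "log_swap swap.log 10 &" before starting the build,
followed by killing that background job afterwards, would leave a
record of how close swap came to being exhausted.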

Note: The "was killed" notices have been improved in main,
but one case still uses the misnomer "out of swap"
(last I checked).

An environment that gets "swap_pager: indefinite wait buffer"
notices is problematic, and the I/O delays for the virtual
memory subsystem can lead to kills, if I understand correctly.

But, if I remember right, the actual message for a directly
I/O related kill is now different.

I think that being able to reproduce this case could be
important. I probably cannot, because I would not get the
"swap_pager: indefinite wait buffer" notices in my hardware
context.

> Per the failure message I restarted the build of devel/llvm13 with 
> make -DBATCH MAKE_JOBS_UNSAFE=YES > make.log &

Just as -DBATCH is for ports, not buildworld or buildkernel,
MAKE_JOBS_UNSAFE= is for ports, not buildworld or buildkernel,
at least if I understand correctly.

In other words, it probably would have been the same result
without the two arguments.

> It seems to be running with only one thread so far, not sure if that's
> by design or happenstance.
> 
>>> However, restarting buildworld using -j1 appears to have worked past
>>> the former point of failure.
>> 
>> Hmm. That usually means one (or both) of two things was involved
>> in the failure:
>> 
>> A) a build race where something is not (fully) ready when
>>   it is used
>> 
>> B) running out of resources, such as RAM+SWAP
>> 
> 
> The stable/13 machine is short of swap; it has only 2 GB, which
> used to be enough.

So RAM+SWAP is 1 GiByte + 2 GiByte = 3 GiByte on that
RPi3*? (That would have been good to know earlier, such
as for my attempts at reproduction.)
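
For reference, the RAM+SWAP arithmetic can be scripted. This is
only a sketch under assumptions: the function name is made up for
illustration, the RAM figure is taken in bytes (the unit that
"sysctl -n hw.physmem" reports on FreeBSD), and the swap figure is
in 1K blocks (the unit that "swapinfo -k" reports). The nominal
sizes from this thread are used; actual reported values will
generally not be such round numbers.

```sh
#!/bin/sh
# Sketch only: ram_plus_swap_mib is a hypothetical helper name.
# $1 = RAM in bytes, $2 = swap in 1K blocks; prints the total in MiB.
ram_plus_swap_mib() {
    echo $(( $1 / 1048576 + $2 / 1024 ))
}

# Nominal figures from this thread: 1 GiByte RAM, 2 GiByte swap.
ram_plus_swap_mib 1073741824 2097152
```

With those nominal inputs the result is 3072 MiB, that is, the
3 GiByte total mentioned above.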

What -j value was the RPi3* using when it was failing?

Did you have failures with the .cpp and .sh (so no
make use involved) in that RAM+SWAP context?

> Maybe that's the problem, but having an error 
> report that says it's a segfault is a confusing diagnostic. 
> 
>> But, as I understand, you were able to use a .cpp and
>> .sh file pair that had been produced to repeat the
>> problem on the RPi3B --and that would not have been a
>> parallel-activity context.
>> 
> 
> To be clear, the reproduction was on the same stable/13 that
> reported the original failure. An attempt at reproduction
> on a different Pi3 running -current ran without any errors.
> Come to think of it, that machine had more swap, too.

How much swap?

>>> It's in the building libraries phase now.
>>> Based on log size I'd guess it's about halfway through buildworld.
>>> 
>> 
>> Well, hopefully you will not be stuck with -j1 builds in
>> the future as well.
>> 
> Indeed!

At this point, I expect that the failure was tied to the
RAM+SWAP totaling 3 GiBytes.

Knowing that context, we might be able to make a reproducible
report based on the .cpp and .sh files, where restricting the
RAM+SWAP use allowed is part of the report.
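
One possible way to impose such a restriction on a machine with
more memory, offered as an assumption on my part rather than
something tested in this thread: FreeBSD's loader accepts a
hw.physmem tunable that caps the RAM the kernel uses, and the swap
total is set by whatever swap device or file is enabled. The 1G
value below simply mirrors the nominal RPi3* RAM size.

```sh
# /boot/loader.conf fragment (sketch; the value is an assumption
# chosen to mimic the RPi3*'s nominal 1 GiByte of RAM)
hw.physmem="1G"
```

After a reboot, "sysctl hw.physmem" and "swapinfo" should show
the totals actually in effect, and those could be quoted in the
report.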

===
Mark Millard
marklmi at yahoo.com