Re: Troubles building world on stable/13

From: Mark Millard <marklmi_at_yahoo.com>
Date: Tue, 25 Jan 2022 21:23:51 UTC

On 2022-Jan-25, at 12:49, Mark Millard <marklmi@yahoo.com> wrote:

> On 2022-Jan-25, at 10:08, bob prohaska <fbsd@www.zefox.net> wrote:
> 
>> On Tue, Jan 25, 2022 at 09:13:08AM -0800, Mark Millard wrote:
>>> 
>>> -DBATCH ? I'm not aware of there being any use of that symbol.
>>> Do you have a documentation reference for it so that I could
>>> read about it?
>>> 
>> It's a switch to turn off dialog4ports. I can't find the reference
>> now. Perhaps it's been deprecated? A name like -DUSE_DEFAULTS would
>> be easier to understand anyway. 
> 
> I've never had buildworld buildkernel or the like try to use
> dialog4ports. I've only had port building use it. buildworld
> and buildkernel can be done with no ports installed at all.
> dialog4ports is a port.
> 
> I think -DBATCH was ignored for the activity at hand.

Actual evidence for my claim:

# grep -r "\<BATCH\>" /usr/main-src/Makefile* /usr/main-src/share/mk/ | more

# grep -r "\<BATCH\>" /usr/main-src/Makefile* /usr/ports/Mk/ | more
/usr/ports/Mk/bsd.licenses.mk:.if ${_LICENSE_STATUS} == "ask" && defined(BATCH)
/usr/ports/Mk/bsd.licenses.mk:IGNORE=		License ${_LICENSE} needs confirmation, but BATCH is defined
/usr/ports/Mk/Uses/perl5.mk:.    if defined(BATCH) && !defined(IS_INTERACTIVE)
/usr/ports/Mk/Uses/perl5.mk:.    endif # defined(BATCH) && !defined(IS_INTERACTIVE)
/usr/ports/Mk/Uses/cmake.mk:#			Default: not set, unless BATCH or PACKAGE_BUILDING is defined
/usr/ports/Mk/Uses/cmake.mk:.if defined(BATCH) || defined(PACKAGE_BUILDING)
/usr/ports/Mk/bsd.port.mk:#				  to skip this port by setting ${BATCH}, or compiling only
/usr/ports/Mk/bsd.port.mk:.if defined(BATCH)
/usr/ports/Mk/bsd.port.mk:.if defined(BATCH)
/usr/ports/Mk/bsd.port.mk:SCRIPTS_ENV+=	BATCH=yes
/usr/ports/Mk/bsd.port.mk:# If we're in BATCH mode and the port is interactive, or we're
/usr/ports/Mk/bsd.port.mk:# one might want to leave a build in BATCH mode running
/usr/ports/Mk/bsd.port.mk:.if (defined(IS_INTERACTIVE) && defined(BATCH))
/usr/ports/Mk/bsd.port.mk:	defined(PACKAGE_BUILDING) || defined(BATCH))
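
The same kind of check can be wrapped in a small sh function for reuse with other variable names. This is only a sketch: the function name var_used_in is hypothetical, and it assumes a grep that supports -r and -w (as the FreeBSD and GNU ones do). It reports, for each directory, whether the name appears anywhere as a whole word.

```shell
#!/bin/sh
# Hypothetical helper (not part of any FreeBSD tool): for each directory
# given, report whether a make variable name occurs as a whole word in
# any file under it.
var_used_in() {
    var=$1
    shift
    for dir in "$@"; do
        # -r: recurse, -w: whole-word match, -q: quiet, exit status only
        if grep -rqw -- "$var" "$dir" 2>/dev/null; then
            echo "$dir: references $var"
        else
            echo "$dir: no reference to $var"
        fi
    done
}

# Example use with the paths from this message (adjust to the local layout):
#   var_used_in BATCH /usr/main-src/share/mk /usr/ports/Mk
```

A directory reported as having no reference means a -D of that name on the make command line has nothing in those makefiles to act on.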

>> On a whim, I tried building devel/llvm13 on a Pi4 running -current with 
>> 8 GB of RAM and 8 GB of swap. To my surprise, that stopped with:
>> nemesis.zefox.com kernel log messages:
>> +FreeBSD 14.0-CURRENT #26 main-5025e85013: Sun Jan 23 17:25:31 PST 2022
>> +swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1873450, size: 4096
>> +swap_pager: indefinite wait buffer: bufobj: 0, blkno: 521393, size: 4096
>> +swap_pager: indefinite wait buffer: bufobj: 0, blkno: 209826, size: 12288
>> +swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1717218, size: 24576
>> +pid 56508 (c++), jid 0, uid 0, was killed: failed to reclaim memory
>> 
>> On an 8GB machine, that seems strange. 
> 
> -j<What?> build? -j4 ?
> 
> Were you watching the swap usage in top (or some such)?
> 
> Note: The "was killed" related notices have been improved
> in main, but there is a misnomer case about "out of swap"
> (last I checked).
> 
> An environment that gets "swap_pager: indefinite wait buffer"
> notices is problematical and the I/O delays for the virtual
> memory subsystem can lead to kills, if I understand right.
> 
> But, if I remember right, the actual message for a directly
> I/O related kill is now different.
> 
> I think that being able to reproduce this case could be
> important. I probably cannot, because I would not get the
> "swap_pager: indefinite wait buffer" in my hardware
> context.
> 
>> Per the failure message I restarted the build of devel/llvm13 with 
>> make -DBATCH MAKE_JOBS_UNSAFE=YES > make.log &
> 
> Just like -DBATCH is for ports, not buildworld buildkernel,
> MAKE_JOBS_UNSAFE= is for ports, not buildworld buildkernel,
> at least if I understand right.
> 
> In other words, it probably would have been the same result
> without the two arguments.

Actual evidence for my claim for MAKE_JOBS_UNSAFE :

# grep -r MAKE_JOBS_UNSAFE /usr/main-src/Makefile* /usr/main-src/share/mk/ | more

# grep -r MAKE_JOBS_UNSAFE /usr/ports/Mk/
/usr/ports/Mk/bsd.port.mk:# MAKE_JOBS_UNSAFE
/usr/ports/Mk/bsd.port.mk:.if defined(DISABLE_MAKE_JOBS) || defined(MAKE_JOBS_UNSAFE)
/usr/ports/Mk/bsd.port.mk:BUILD_FAIL_MESSAGE+=	Try to set MAKE_JOBS_UNSAFE=yes and rebuild before reporting the failure to the maintainer.
/usr/ports/Mk/bsd.gecko.mk:.if defined(DISABLE_MAKE_JOBS) || defined(MAKE_JOBS_UNSAFE)

>> It seems to be running with only one thread so far, not sure if that's
>> by design or happenstance.
>> 
>>>> However, restarting buildworld using -j1 appears to have worked past
>>>> the former point of failure.
>>> 
>>> Hmm. That usually means one (or both) of two things was involved
>>> in the failure:
>>> 
>>> A) a build race where something is not (fully) ready when
>>>  it is used
>>> 
>>> B) running out of resources, such as RAM+SWAP
>>> 
>> 
>> The stable/13 machine is short of swap; it has only 2 GB, which
>> used to be enough.
> 
> So RAM+SWAP is 1 GiByte + 2 GiByte, so 3 GiByte on that
> RPi3*? (That would have been good to know earlier, such
> as for my attempts at reproduction.)
> 
> -j<What?> for the RPi3* when it was failing?
> 
> Did you have failures with the .cpp and .sh (so no
> make use involved) in the RAM+SWAP context?
> 
>> Maybe that's the problem, but having an error 
>> report that says it's a segfault is a confusing diagnostic. 
>> 
>>> But, as I understand, you were able to use a .cpp and
>>> .sh file pair that had been produced to repeat the
>>> problem on the RPi3B --and that would not have been a
>>> parallel-activity context.
>>> 
>> 
>> To be clear, the reproduction was on the same stable/13 that
>> reported the original failure. An attempt at reproduction
>> on a different Pi3 running -current ran without any errors.
>> Come to think of it, that machine had more swap, too.
> 
> How much swap?
> 
>>>> It's in the building libraries phase now.
>>>> Based on log size I'd guess it's about halfway through buildworld.
>>>> 
>>> 
>>> Well, hopefully you will not be stuck with -j1 builds in
>>> the future as well.
>>> 
>> Indeed!
> 
> At this point, I expect that the failure was tied to the
> RAM+SWAP totaling 3 GiBytes.
> 
> Knowing that context, we might be able to make a
> reproducible report based on the .cpp and .sh files,
> where restricting the allowed RAM+SWAP use is part of
> the report.
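
For including the RAM+SWAP figure in such a report, a minimal sh sketch of the arithmetic follows. The function name ram_plus_swap_mib is hypothetical. The commented usage assumes FreeBSD's sysctl hw.physmem (bytes) and swapinfo -k (sizes in 1K blocks in the second column); check both against the local system before relying on the numbers.

```shell
#!/bin/sh
# Hypothetical helper: total RAM+SWAP in MiB, given RAM in bytes and
# swap in KiB. Integer arithmetic, so the result is rounded down.
ram_plus_swap_mib() {
    ram_bytes=$1
    swap_kib=$2
    echo $(( ($ram_bytes + $swap_kib * 1024) / 1048576 ))
}

# Possible use on FreeBSD (assumption: skip the header line and any
# "Total" line of swapinfo -k, then sum the 1K-blocks column):
#   ram=$(sysctl -n hw.physmem)
#   swap=$(swapinfo -k | awk 'NR > 1 && $1 != "Total" { s += $2 } END { print s + 0 }')
#   ram_plus_swap_mib "$ram" "$swap"
```

With a nominal 1 GiByte of RAM and 2 GiByte of swap this gives 3072 MiB. A real RPi3* reports somewhat less than 1 GiByte in hw.physmem, so the measured total would be a little lower.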


===
Mark Millard
marklmi at yahoo.com