Re: FYI: make's "max_jobs" needs to be separated from -j (now?)

From: <dsdqmzk_at_hotmail.com>
Date: Wed, 02 Oct 2024 14:39:32 UTC

Dan Mack wrote:
> On Wed, 2 Oct 2024, dsdqmzk@hotmail.com wrote:
> 
>> David Wolfskill wrote:
>>> I have been tracking stable/ and head (daily, with a few exceptions) for
>>> many years, now.  Over time, I set up a set of ([t]csh) aliases to
>>> simplify the exercise for me.
>>>
>>> Until yesterday, the "make -j${max_jobs} buildworld" construct had
>>> worked without issue, but (yesterday), the invocation failed quite
>>> quickly:
>>>
>>> | Tue Oct  1 11:54:18 UTC 2024
>>> | --- buildworld ---
>>> | make[1]: "/usr/src/Makefile.inc1" line 362: SYSTEM_COMPILER:
>>> Determined that CC=cc matches the source tree.  Not bootstrapping a
>>> cross-compiler.
>>> | make[1]: "/usr/src/Makefile.inc1" line 367: SYSTEM_LINKER:
>>> Determined that LD=ld matches the source tree.  Not bootstrapping a
>>> cross-linker.
>>> | --------------------------------------------------------------
>>> | >>> World build started on Tue Oct  1 11:54:18 UTC 2024
>>> | --------------------------------------------------------------
>>> | >>> Deleting stale files in build tree...
>>> |         0.14 real         0.23 user         0.10 sys
>>> | *** [_cleanworldtmp] Error code 6
>>> |
>>> | make[1]: stopped making "buildworld" in /usr/src
>>> | .ERROR_TARGET='_cleanworldtmp'
>>> | .ERROR_META_FILE=''
>>>
>>> On a bit of a whim, I tried adjusting the "max_jobs" values (downward),
>>> which didn't help, but removing the "-j14" entirely did not produce a
>>> failure.
>>>
>>> On the other hand, rebuilding clang/llvm with a single core on a laptop
>>> (when I actually want to be able to use the laptop later in the day
>>> while I'm at work) didn't seem productive.
>>>
>>> A bit more rather randomly "trying stuff" yielded the result that while
>>>
>>>     make -j14 buildworld
>>>
>>> failed (as described above),
>>>
>>>     make -j 14 buildworld
>>>
>>> carries on as before -- it's building lib/clang (and using multiple
>>> cores to do so)....  :-}
>>
>> Just got the same error, but neither invocation worked, and I noticed
>> that the bootstrapped version of mtree failed to run because of the
>> (now) missing libmd.so.6.  I think it's not really related to the
>> whitespace between -j and the job count; rather, you had to (re)build
>> the bootstrap tools.
> 
> I have been building current twice daily for a while and didn't notice
> this regression but I do have the space after "-j"
> 
>   #!/bin/sh
>    make -j 16 buildworld              > /logs/bw.$$ 2>&1 && \
>    make -j 8 kernel KERNCONF=GENERIC  > /logs/bk.$$ 2>&1 && \
>    sync && reboot

Do you also run `make delete-old-libs`?

> I grepped all my logs across 3 servers and did not see a single instance
> of [_cleanworldtmp] Error code ... in any of the logs.  What was the
> hash of the build you were on there, I can try to reproduce it quickly
> (but it might only trigger with your builddir state I guess)

If I understand the problem correctly, reproducing it should be as easy
as:

1. build on a pre-e7a629c851d system
2. install/reboot
3. make delete-old-libs
4. try to build world/kernel, which fails as above.  I think make
kernel-toolchain was the step that actually failed for me, because mtree
could not run (libmd.so.6 is gone now)

In any case, wiping out /usr/obj solved it for me.
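
For anyone who wants to check whether their object tree is affected
before wiping it, here is a minimal sketch, not a tested procedure.  The
function names `missing_libs` and `scan_objdir` are made up for
illustration, and the sketch assumes that ldd(1) marks an unresolved
library with "=> not found" and that the object tree lives in /usr/obj.

```shell
#!/bin/sh
# Hypothetical helper: read ldd(1) output on stdin and print the name of
# every library reported as "not found" (assumed ldd output format).
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Hypothetical scan: list executables under an object directory
# (default /usr/obj, an assumption) that depend on a missing library.
scan_objdir() {
    objdir=${1:-/usr/obj}
    find "$objdir" -type f -perm -u+x 2>/dev/null |
    while IFS= read -r f; do
        missing=$(ldd "$f" 2>/dev/null | missing_libs | sort -u | tr '\n' ' ')
        if [ -n "$missing" ]; then
            echo "$f: $missing"
        fi
    done
    return 0
}

# Example use (not run automatically):
#   scan_objdir /usr/obj | grep libmd.so.6
```

If the scan lists a stale bootstrap tool such as mtree against
libmd.so.6, that would match what I saw, and removing the object tree so
the tools get rebuilt is the workaround described above.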