Re: FYI: make's "max_jobs" needs to be separated from -j (now?)

From: Warner Losh <imp_at_bsdimp.com>
Date: Wed, 02 Oct 2024 14:54:52 UTC
On Wed, Oct 2, 2024 at 8:42 AM Dan Mack <mack@macktronics.com> wrote:

> On Wed, 2 Oct 2024, dsdqmzk@hotmail.com wrote:
>
> > Dan Mack wrote:
> >> On Wed, 2 Oct 2024, dsdqmzk@hotmail.com wrote:
> >>
> >>> David Wolfskill wrote:
> >>>> I have been tracking stable/ and head (daily, with a few exceptions)
> >>>> for many years, now.  Over time, I set up a set of ([t]csh) aliases to
> >>>> simplify the exercise for me.
> >>>>
> >>>> Until yesterday, the "make -j${max_jobs} buildworld" construct had
> >>>> worked without issue, but (yesterday), the invocation failed quite
> >>>> quickly:
> >>>>
> >>>> | Tue Oct  1 11:54:18 UTC 2024
> >>>> | --- buildworld ---
> >>>> | make[1]: "/usr/src/Makefile.inc1" line 362: SYSTEM_COMPILER: Determined that CC=cc matches the source tree.  Not bootstrapping a cross-compiler.
> >>>> | make[1]: "/usr/src/Makefile.inc1" line 367: SYSTEM_LINKER: Determined that LD=ld matches the source tree.  Not bootstrapping a cross-linker.
> >>>> | --------------------------------------------------------------
> >>>> | >>> World build started on Tue Oct  1 11:54:18 UTC 2024
> >>>> | --------------------------------------------------------------
> >>>> | >>> Deleting stale files in build tree...
> >>>> |         0.14 real         0.23 user         0.10 sys
> >>>> | *** [_cleanworldtmp] Error code 6
> >>>> |
> >>>> | make[1]: stopped making "buildworld" in /usr/src
> >>>> | .ERROR_TARGET='_cleanworldtmp'
> >>>> | .ERROR_META_FILE=''
> >>>>
> >>>> On a bit of a whim, I tried adjusting the "max_jobs" values (downward),
> >>>> which didn't help, but removing the "-j14" entirely did not produce a
> >>>> failure.
> >>>>
> >>>> On the other hand, rebuilding clang/llvm with a single core on a laptop
> >>>> (when I actually want to be able to use the laptop later in the day
> >>>> while I'm at work) didn't seem productive.
> >>>>
> >>>> A bit more rather randomly "trying stuff" yielded the result that while
> >>>>
> >>>>     make -j14 buildworld
> >>>>
> >>>> failed (as described above),
> >>>>
> >>>>     make -j 14 buildworld
> >>>>
> >>>> carries on as before -- it's building lib/clang (and using multiple
> >>>> cores to do so)....  :-}
> >>>
> >>> Just got the same error, but both invocations didn't work, and I noticed
> >>> that bootstrapped version of mtree failed to run because of (now)
> >>> missing libmd.so.6.  I think it's not really related to whitespace
> >>> between -j and jobs number, rather you had to (re)build the bootstrap
> >>> tools.
> >>
> >> I have been building current twice daily for a while and didn't notice
> >> this regression but I do have the space after "-j"
> >>
> >>   #!/bin/sh
> >>    make -j 16 buildworld              > /logs/bw.$$ 2>&1 && \
> >>    make -j 8 kernel KERNCONF=GENERIC  > /logs/bk.$$ 2>&1 && \
> >>    sync && reboot
> >
> > Do you also do `make delete-old-libs`?
> >
> >> I grepped all my logs across 3 servers and did not see a single instance
> >> of [_cleanworldtmp] Error code ... in any of the logs.  What was the
> >> hash of the build you were on there, I can try to reproduce it quickly
> >> (but it might only trigger with your builddir state I guess)
> >
> > If I understand the problem correctly, it should be as easy as:
> >
> > 1. build on pre-e7a629c851d system
> > 2. install/reboot
> > 3. make delete-old-libs
> > 4. try to build world/kernel that fail as above, and, I think, make
> > kernel-toolchain was the one failing because mtree failed to run
> > (because of libmd.so.6 gone now)
> >
> > In any case, wiping out /usr/obj solved it for me.
>
> Ack, okay.   I can't trigger it with a fresh or my /usr/obj but in any
> event the error number 6 is probably referring to a path or directory
> missing while doing a parallel build given some input state :-)
>
> #define ENXIO           6               /* Device not configured */
>

ENXIO is usually reserved for hardware errors, such as a device disappearing
in a block I/O context, so I'm not sure this theory holds up.

In any case, the exit statuses of shell commands are largely independent of
errno values.
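
To illustrate, here is a minimal sh sketch. It is not taken from the build;
the path /nonexistent/file is a hypothetical placeholder that is assumed not
to exist, and the variable names are made up for the example.

```shell
#!/bin/sh
# A command chooses its own exit status; nothing ties it to an errno value.
status_plain=0
sh -c 'exit 6' || status_plain=$?
echo "explicit exit 6 -> status $status_plain"

# Hypothetical missing path: open(2) fails with ENOENT (errno 2), but cat
# prints a diagnostic and exits with its own nonzero status (commonly 1).
status_cat=0
cat /nonexistent/file 2>/dev/null || status_cat=$?
echo "cat on a missing file -> status $status_cat"
```

As far as I know, the 6 in "Error code 6" is just the status the failing
command exited with, so what it means is up to that command.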

Warner