main aarch64: poudriere-devel [UFS context] cpdup stuck in pgnslp state

From: Mark Millard <marklmi_at_yahoo.com>
Date: Fri, 22 Mar 2024 00:34:30 UTC
Note: in the listing below, more recently created processes are toward the top and older ones toward the bottom:

  PID   JID USERNAME    PRI NICE     SIZE       RES STATE    C   TIME     CPU COMMAND
. . .
33693    19 root         68    0   6524Ki    3252Ki wait     3   0:00   0.00% /usr/bin/make -C /usr/ports/lang/gcc13 build
33692     0 root         68    0  15728Ki    3552Ki wait     0   0:00   0.00% sh: poudriere[main-CA7-default][02]: build_pkg (gcc13-13.2.0_4) (sh)
30174     0 root         68    0  15728Ki    3564Ki select   3   0:00   0.00% sh: poudriere[main-CA7-default][02]: build_pkg (gcc13-13.2.0_4) (sh)
26338     0 root         66    0  17740Ki    5044Ki pgnslp   0   0:01   0.00% cpdup -i0 -s0 -f -x ref 01
26308     0 root         68    0  15728Ki    3556Ki wait     0   0:00   0.00% sh: poudriere[main-CA7-default][01]: build_pkg (boost-libs-1.84.0) (sh)
33592     0 root         26    0  15728Ki    3388Ki piperd   2   0:01   0.00% sh: poudriere[main-CA7-default]: pkg_cacher_main (sh)
29205     0 root         68    0  15728Ki    3392Ki nanslp   2   1:52   0.14% sh: poudriere[main-CA7-default]: html_json_main (sh)
28834     0 root         20    0  15728Ki    3548Ki select   3   0:01   0.00% /usr/local/libexec/poudriere/sh -e /usr/local/share/poudriere/bulk.sh -jmain-CA7 -c -f /root/origins/CA7-origins.txt
28833     0 root         20    0  13560Ki    1924Ki wait     3   0:00   0.00% /bin/sh /root/build-ports-main-CA7.sh -c
. . .

The pgnslp wait message seems to come from vm_page_acquire_unlocked
in sys/vm/vm_page.c . That in turn looks to be using vm_page_grab_sleep :

                if (!vm_page_grab_sleep(object, m, pindex, "pgnslp",
                    allocflags, false))
                        return (false);

and:

/*
 *      vm_page_grab_sleep
 *
 *      Sleep for busy according to VM_ALLOC_ parameters.  Returns true
 *      if the caller should retry and false otherwise.
 *
 *      If the object is locked on entry the object will be unlocked with
 *      false returns and still locked but possibly having been dropped
 *      with true returns.
 */
static bool
vm_page_grab_sleep(vm_object_t object, vm_page_t m, vm_pindex_t pindex,
    const char *wmesg, int allocflags, bool locked)
{
                                        
        if ((allocflags & VM_ALLOC_NOWAIT) != 0)
                return (false);
                 
        /*
         * Reference the page before unlocking and sleeping so that
         * the page daemon is less likely to reclaim it.
         */
        if (locked && (allocflags & VM_ALLOC_NOCREAT) == 0)
                vm_page_reference(m);
                
        if (_vm_page_busy_sleep(object, m, pindex, wmesg, allocflags, locked) &&
            locked)
                VM_OBJECT_WLOCK(object);
        if ((allocflags & VM_ALLOC_WAITFAIL) != 0)
                return (false);

        return (true);
}
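
As a reading aid for the function quoted above, here is a minimal
user-space sketch of just its flag handling. It is not kernel code: the
SK_ALLOC_* values and the sk_* names are hypothetical, invented for the
illustration, and the locking and vm_page_reference steps are left out.
It only records whether the thread would reach the busy-sleep step (the
step that passes the "pgnslp" wait message) and what would be returned
to the caller.

```c
#include <stdbool.h>

/*
 * Hypothetical flag values for illustration only. These are NOT the
 * kernel's VM_ALLOC_NOWAIT / VM_ALLOC_WAITFAIL definitions.
 */
#define SK_ALLOC_NOWAIT   0x1
#define SK_ALLOC_WAITFAIL 0x2

/* Hypothetical result type: what the quoted function would do. */
struct sk_result {
	bool slept;	/* reached the busy-sleep step (wmesg "pgnslp") */
	bool retry;	/* value returned: true means the caller retries */
};

/*
 * Models only the control flow of the quoted vm_page_grab_sleep:
 * NOWAIT returns false without sleeping, WAITFAIL sleeps and then
 * returns false, anything else sleeps and then returns true.
 */
struct sk_result
sk_grab_sleep(int allocflags)
{
	struct sk_result r = { false, false };

	if ((allocflags & SK_ALLOC_NOWAIT) != 0)
		return (r);

	/* The real code sleeps here until the page is no longer busy. */
	r.slept = true;

	if ((allocflags & SK_ALLOC_WAITFAIL) != 0)
		return (r);

	r.retry = true;
	return (r);
}
```

If this reading is right, a thread shown in pgnslp is in the sleep step,
so its caller did not pass the no-wait flag, and it stays there until
whatever holds the page busy releases it.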

. . .
[10:08:06] [01] [00:00:00] Building devel/boost-libs | boost-libs-1.84.0
. . .

# poudriere status -b
[main-CA7-default] [2024-03-21_06h23m31s] [parallel_build] Queued: 265 Built: 213 Failed: 0   Skipped: 0   Ignored: 0   Fetched: 0   Tobuild: 52   Time: 10:50:40
 ID  TOTAL                              ORIGIN   PKGNAME                            PHASE TIME     TMPFS      CPU% MEM%
[01] 00:42:40                 devel/boost-libs | boost-libs-1.84.0               starting 00:42:40 951.54 MiB          
. . .

Unfortunately:

A) The booted kernel is my personal build based on -mcpu=cortex-a76
   and LSE_ATOMICS . (It is in use on a RPi5 booted via EDK2.)

B) The booted world is a PkgBase world.

C) The poudriere jail's world directory tree is my personal armv7
   world build based on -mcpu=cortex-a7 .

All are based on: main-n268827-75464941dc17 . (Well, no PkgBase
commit identification/verification exists for world. I happened
to update PkgBase during a long lull in commits to main. In this
context, the boot-world seems unlikely to be involved here.)

The boot media is a U.2 Optane 960 GB drive used via a USB3 adaptor.

I've done bunches of builds in the (A)-(C) context on the RPi5
and have not seen this before, so the problem does not look to
be readily repeatable.

(Unfortunately, the purpose of the build was to find out how long
the particular build configuration took to finish building the
265 packages from scratch, for comparison to other builds.)

I may wait for the system to become fairly idle and then see about
forcing a crash dump. It may be a while before the poudriere bulk
run runs out of packages it can build without boost-libs .


Side note:
As far as I can tell, there is so far no specified way to identify
which commit a PkgBase world is based on. For a PkgBase kernel,
uname -apKU may well report the kernel's commit identification
correctly. (Hard to verify.)

===
Mark Millard
marklmi at yahoo.com