RE: Why does the process get killed because "a thread waited too long to allocate a page"?

From: Mark Millard <marklmi_at_yahoo.com>
Date: Thu, 10 Oct 2024 02:21:02 UTC
Yuri <yuri_at_FreeBSD.org> wrote on
Date: Wed, 09 Oct 2024 16:12:50 UTC :

> When I tried to build lang/rust in the 14i386 poudriere VM the compiler 
> got killed with this message in the kernel log:
> 
> 
> > Oct  9 05:21:11 yv kernel: pid 35188 (rustc), jid 1129, uid 65534, 
> > was killed: a thread waited too long to allocate a page
> 
> 
> 
> The same system has no problem building lang/rust in the 14amd64 VM.
> 
> 
> What does it mean "waited too long"? Why is the process killed when 
> something is slow?
> Shouldn't it just wait instead?


If you want to allow it to potentially wait forever,
you can use:

sysctl vm.pfault_oom_attempts=-1

(or the analogous setting in an appropriate *.conf
file that would later be processed).

You might end up with deadlock/livelock/. . .
if you do so. (I've not analyzed the details.)
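To make the setting persist across reboots, the standard place is /etc/sysctl.conf (accepting the deadlock/livelock risk just mentioned):

```shell
# /etc/sysctl.conf
# -1 means: retry page-fault allocations forever instead of
# triggering an OOM kill after attempts * wait seconds.
vm.pfault_oom_attempts=-1
```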


Details:

Looking around, sys/vm/vm_pageout.c has:

               case VM_OOM_MEM_PF:
                        reason = "a thread waited too long to allocate a page";
                        break;

# grep -r VM_OOM_MEM_PF /usr/main-src/sys/
/usr/main-src/sys/vm/vm_pageout.h:#define VM_OOM_MEM_PF 2
/usr/main-src/sys/vm/vm_fault.c: vm_pageout_oom(VM_OOM_MEM_PF);
/usr/main-src/sys/vm/vm_pageout.c: if (shortage == VM_OOM_MEM_PF &&
/usr/main-src/sys/vm/vm_pageout.c: if (shortage == VM_OOM_MEM || shortage == VM_OOM_MEM_PF)
/usr/main-src/sys/vm/vm_pageout.c: case VM_OOM_MEM_PF:

sys/vm/vm_fault.c :
(NOTE: the official code has its variant of the printf under an
"if (bootverbose)" but I locally removed that conditional.)

/*
 * Initiate page fault after timeout.  Returns true if caller should
 * do vm_waitpfault() after the call.
 */
static bool
vm_fault_allocate_oom(struct faultstate *fs)
{
        struct timeval now;
 
        vm_fault_unlock_and_deallocate(fs);
        if (vm_pfault_oom_attempts < 0)
                return (true);
        if (!fs->oom_started) {
                fs->oom_started = true;
                getmicrotime(&fs->oom_start_time);
                return (true);
        }
 
        getmicrotime(&now);
        timevalsub(&now, &fs->oom_start_time);
        if (now.tv_sec < vm_pfault_oom_attempts * vm_pfault_oom_wait)
                return (true);
 
        printf("vm_fault_allocate_oom: proc %d (%s) failed to alloc page on fault, starting OOM\n",
                curproc->p_pid, curproc->p_comm);
 
        vm_pageout_oom(VM_OOM_MEM_PF);
        fs->oom_started = false;
        return (false);
}


This is associated with vm.pfault_oom_attempts and
vm.pfault_oom_wait . An old comment in my
/boot/loader.conf is:

#
# For possibly insufficient swap/paging space
# (might run out), increase the pageout delay
# that leads to Out Of Memory killing of
# processes (showing defaults at the time):
#vm.pfault_oom_attempts= 3
#vm.pfault_oom_wait= 10
# (The multiplication gives the total, but there
# are other potential tradeoffs in the factors
# multiplied, even for nearly the same total.)

(Note: the "tradeoffs" comment is associated with:
sys/vm/vm_fault.c: vm_waitpfault(dset, vm_pfault_oom_wait * hz);
)

sys/vm/vm_pageout.c :

void
vm_pageout_oom(int shortage)
{
        const char *reason;
        struct proc *p, *bigproc;
        vm_offset_t size, bigsize;
        struct thread *td;
        struct vmspace *vm;
        int now;
        bool breakout;

        /*
         * For OOM requests originating from vm_fault(), there is a high
         * chance that a single large process faults simultaneously in
         * several threads.  Also, on an active system running many
         * processes of middle-size, like buildworld, all of them
         * could fault almost simultaneously as well.
         *
         * To avoid killing too many processes, rate-limit OOMs
         * initiated by vm_fault() time-outs on the waits for free
         * pages.
         */
        mtx_lock(&vm_oom_ratelim_mtx);
        now = ticks;
        if (shortage == VM_OOM_MEM_PF &&
            (u_int)(now - vm_oom_ratelim_last) < hz * vm_oom_pf_secs) {
                mtx_unlock(&vm_oom_ratelim_mtx);
                return;
        }
        vm_oom_ratelim_last = now;
        mtx_unlock(&vm_oom_ratelim_mtx);
. . .
                size = vmspace_swap_count(vm);
                if (shortage == VM_OOM_MEM || shortage == VM_OOM_MEM_PF)
                        size += vm_pageout_oom_pagecount(vm);
. . .
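The rate limiter above uses the wraparound-safe unsigned-tick comparison idiom: casting (now - last) to an unsigned type makes the check correct even after the signed tick counter wraps. A minimal sketch of that idiom (HZ and the window value are illustrative assumptions, not the kernel's actual figures):

```c
#include <stdbool.h>

#define HZ 1000                 /* ticks per second, like the kernel's hz */

static int vm_oom_pf_secs = 12; /* rate-limit window; value assumed */
static int vm_oom_ratelim_last; /* tick count of the last PF-triggered OOM */

/*
 * Return true if a page-fault-triggered OOM is allowed at tick 'now'.
 * The unsigned cast on (now - last) keeps the comparison correct
 * across signed-counter wraparound.
 */
static bool
oom_pf_allowed(int now)
{
        if ((unsigned)(now - vm_oom_ratelim_last) <
            (unsigned)(HZ * vm_oom_pf_secs))
                return (false); /* still inside the rate-limit window */
        vm_oom_ratelim_last = now;
        return (true);
}
```

So even when many threads of a large process (or many middle-sized processes) all time out at nearly the same instant, at most one OOM kill fires per window.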

So: time-based retries, giving up after roughly
vm.pfault_oom_attempts * vm.pfault_oom_wait seconds
overall (30 s with the defaults of 3 and 10), which
avoids potentially waiting forever when
0 <= vm.pfault_oom_attempts .


===
Mark Millard
marklmi at yahoo.com