Re: Can not build kernel on 1GB VM

From: Mark Millard <marklmi_at_yahoo.com>
Date: Fri, 15 Apr 2022 20:02:47 UTC
On 2022-Apr-15, at 11:40, Mark Millard <marklmi@yahoo.com> wrote:

> From: Michael Wayne <freebsd07_at_wayne47.com> 
> Date: Fri, 15 Apr 2022 13:49:53 -0400 :
> 
>> I have a VM with 1GB RAM running FreeBSD 12.1-RELEASE-p3
>> 
>> I'm trying to upgrade the machine to 12.3 and having swap failures.
>> 
>> This machine runs bird to advertise BGP, ssh and not much else so
>> the small amount of RAM is (usually) fine.
>> 
>> For a long time, there was a 1 GB swap file which handled the
>> occasional time when excess memory got used.
>> 
>> Machine needs a custom kernel for BGP, the conf file consists of:
>>   include GENERIC
>>   ident ROUTING
>>   options TCP_SIGNATURE
>> 
>> 
>> Today, while building the 12.3 kernel with:
>>   cd /usr/src
>>   sudo make toolchain
>>   sudo make buildkernel KERNCONF=ROUTING
>> the machine ran out of swap, with a bunch of messages like:
>>   Apr 15 12:11:26 g1 kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 240593, size: 4096
>>   Apr 15 12:11:35 g1 kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 236224, size: 16384
>>   Apr 15 12:11:37 g1 kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 245, size: 12288
>>   Apr 15 12:11:46 g1 kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 240593, size: 4096
>>   Apr 15 12:11:55 g1 kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 236224, size: 16384
>>   Apr 15 12:11:57 g1 kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 245, size: 12288
>> 
>> Thinking it was a swap space issue, I increased the swap to 4 GB and
>> tried again with the same results. Boot gave the kern.maxswzone message,
>> I ignored it as I had planned to change as soon as I completed the build.
>> 
>> So I pulled up top in a console window and watched swap during the
>> build.  About 400 MB of RAM was free and about 3 MB of swap was
>> used when the machine started linking the kernel:
>>   ctfmerge -L VERSION -g -o kernel.full ...
>> While this command was running, I saw swap usage go to ~5MB (so
>> just over 1%), then started seeing processes being killed due to
>> out of swap space.
> 
> The "out of swap space" message is usually a misnomer

I should have been explicit that the message is a misnomer
when it is part of an OOM kill notification.

There is a separate message about "out of swap space" that
is just a notification of that status. That message is not
a misnomer and need not imply that an OOM kill will happen
or has happened.

> and has
> been replaced in main [so: 14], stable/13, and releng/13.1:
> 
>                case VM_OOM_MEM:
>                        reason = "failed to reclaim memory";
>                        break;
>                case VM_OOM_MEM_PF:
>                        reason = "a thread waited too long to allocate a page";
>                        break;
> 
> (There is one more case that still has the misnomer but
> case VM_OOM_SWAPZ seems unlikely to actually happen.)
> 
> Given that you are getting the swap_pager: indefinite wait buffer
> notices I can not tell which of the two above is happening.
> 
>> So, how to proceed?
> 
> My /boot/loader.conf has the likes of:
> 
> # Delay when persistent low free RAM leads to
> # Out Of Memory killing of processes:
> vm.pageout_oom_seq=120
> #
> # For plenty of swap/paging space (will not
> # run out), avoid pageout delays leading to
> # Out Of Memory killing of processes:
> vm.pfault_oom_attempts=-1
> #
> # For possibly insufficient swap/paging space
> # (might run out), increase the pageout delay
> # that leads to Out Of Memory killing of
> # processes (showing defaults at the time):
> #vm.pfault_oom_attempts= 3
> #vm.pfault_oom_wait= 10
> # (The multiplication is the total but there
> # are other potential tradeoffs in the factors
> # multiplied, even for nearly the same total.)
> 
> The vm.pageout_oom_seq=120 delays VM_OOM_MEM.
> The vm.pfault_oom_attempts=-1 avoids VM_OOM_MEM_PF.
> 
> Note: vm.pfault_oom_attempts=-1 can lead to deadlock
> if you actually run out of swap, as I understand it.
> 
> You could try setting both vm.pfault_oom_attempts and
> vm.pfault_oom_wait but I've no specific suggested
> values for your context.
> 
> 
> Note: I do not recommend having so much swap that
> you get the kern.maxswzone message. I do not
> recommend adjusting kern.maxswzone as it competes
> with other kernel resources --unless you understand
> the tradeoffs in fair detail. (I do not understand
> them in much detail.)
> 

FYI: "swap_pager: indefinite wait buffer" is for
a swap read taking over 20 seconds (at least
in main [so: 14]):

        /*
         * Wait for the pages we want to complete.  VPO_SWAPINPROG is always
         * cleared on completion.  If an I/O error occurs, SWAPBLK_NONE
         * is set in the metadata for each page in the request.
         */
        VM_OBJECT_WLOCK(object);
        /* This could be implemented more efficiently with aflags */
        while ((ma[0]->oflags & VPO_SWAPINPROG) != 0) {
                ma[0]->oflags |= VPO_SWAPSLEEP;
                VM_CNT_INC(v_intrans);
                if (VM_OBJECT_SLEEP(object, &object->handle, PSWP,
                    "swread", hz * 20)) {
                        printf(
"swap_pager: indefinite wait buffer: bufobj: %p, blkno: %jd, size: %ld\n",
                            bp->b_bufobj, (intmax_t)bp->b_blkno, bp->b_bcount);
                }
        }
        VM_OBJECT_WUNLOCK(object);
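
For checking how often these notices show up and which swap
blocks they name, here is a minimal shell sketch. It assumes
kernel messages land in /var/log/messages (adjust for your
syslog setup); the function names are only illustrative:

```shell
#!/bin/sh
# Count "indefinite wait buffer" notices in a log file.
# The default path is an assumption about the local syslog setup.
count_swap_waits() {
    log="${1:-/var/log/messages}"
    # grep -c prints the number of matching lines (0 if none).
    grep -c 'swap_pager: indefinite wait buffer' "$log"
}

# List each reported block number with how many times it appeared.
swap_wait_blocks() {
    log="${1:-/var/log/messages}"
    grep 'swap_pager: indefinite wait buffer' "$log" |
        sed -n 's/.*blkno: \([0-9][0-9]*\),.*/\1/p' |
        sort -n | uniq -c
}
```

In the excerpt quoted earlier, each blkno shows up again 20
seconds after its first notice, which is consistent with the
same read still being outstanding when the hz * 20 sleep
times out again.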


Also, for reference:

# sysctl -d vm.pageout_oom_seq vm.pfault_oom_attempts vm.pfault_oom_wait
vm.pageout_oom_seq: back-to-back calls to oom detector to start OOM
vm.pfault_oom_attempts: Number of page allocation attempts in page fault handler before it triggers OOM handling
vm.pfault_oom_wait: Number of seconds to wait for free pages before retrying the page fault handler

The default for vm.pageout_oom_seq was 12 last I checked.
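
To make the "multiplication is the total" remark concrete,
here is a small shell sketch of the arithmetic. The function
name is hypothetical, and the result is only the approximate
upper bound on how long a page fault keeps retrying before
VM_OOM_MEM_PF handling, as I understand it:

```shell
#!/bin/sh
# Approximate total seconds a page fault waits for free pages
# before OOM handling: attempts * wait.
# A negative attempts value (such as -1) disables this trigger,
# so no bound applies.
pfault_oom_total_wait() {
    attempts="$1"
    wait_s="$2"
    if [ "$attempts" -lt 0 ]; then
        echo "unbounded"
    else
        echo $((attempts * wait_s))
    fi
}
```

With the defaults shown in the loader.conf comments (3 and
10) this gives 30. On a live system the current values could
be fed in with something like:

  pfault_oom_total_wait "$(sysctl -n vm.pfault_oom_attempts)" \
      "$(sysctl -n vm.pfault_oom_wait)"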


===
Mark Millard
marklmi at yahoo.com