Re: 14.0-CURRENT failed to reclaim memory error in RPi 3B build

From: Archimedes Gaviola <archimedes.gaviola_at_gmail.com>
Date: Mon, 21 Nov 2022 03:48:58 UTC
On Wed, Nov 9, 2022 at 10:15 AM Archimedes Gaviola <archimedes.gaviola@gmail.com> wrote:

>
>
> On Wed, Nov 9, 2022 at 1:37 AM Mark Millard <marklmi@yahoo.com> wrote:
>
>> On Nov 8, 2022, at 04:15, Ronald Klop <ronald-lists@klop.ws> wrote:
>>
>> > From: Warner Losh <imp@bsdimp.com>
>> > Date: Tuesday, 8 November 2022 04:28
>> > To: Archimedes Gaviola <archimedes.gaviola@gmail.com>
>> > . . .
>> > ...
>> > swap_pager: indefinite wait buffer: bufobj: 0, blkno: 256929, size: 4096
>> > swap_pager: indefinite wait buffer: bufobj: 0, blkno: 3628, size: 4096
>> > swap_pager: indefinite wait buffer: bufobj: 0, blkno: 255839, size: 40960
>> > pid 46153 (c++), jid 0, uid 0, was killed: a thread waited too long to allocate a page
>> > swap_pager: indefinite wait buffer: bufobj: 0, blkno: 255857, size: 28672
>> > swap_pager: indefinite wait buffer: bufobj: 0, blkno: 3634, size: 8192
>> > swap_pager: indefinite wait buffer: bufobj: 0, blkno: 256037, size: 4096
>> > swap_pager: indefinite wait buffer: bufobj: 0, blkno: 255320, size: 8192
>> >   This means that paging to the swap partition and/or swap file took
>> > too long (> 30 seconds... that's all that "indefinite" means). It also
>> > means that it can't write dirty pages to backing store so that their
>> > memory can be given to another process...
>> >   The typical reason is that the disk / flash is not responding to
>> > writes for some reason. You'll need to find out why... I'd look at
>> > TRIMs.
>> >   Or, if you can't change the disk, you need to put less memory
>> > pressure on it.
>> >   Warner
>> >
>> >
>> >
>> > NB: a way to put less memory pressure on it is to use -j2 or -j1
>> > instead of -j3 in your make command.
>> >
>>
>
> Hi Mark,
>
>
>> Extending Ronald's comment: If things are really taking this
>> long for the paging I/O, you might actually find, say, -j2
>> takes less elapsed time than -j3 because of the latencies
>> involved in -j3 causing more overall delay.
>>
>
> Yes, lowering N in the -jN parameter will be my next step. So far so
> good with -j3: the build is still running after 17 hours.
>
>
>>
>> vm.pfault_oom_attempts=-1 would still be appropriate for avoiding
>> I/O kills at any -jN: the smaller -jN just makes the issue less
>> likely, not impossible. (Again, presuming sufficient swap/paging
>> space if deadlock is to be well avoided.)
>>
>
> The build is currently in
> /usr/src/contrib/llvm-project/llvm/lib/*. I'm checking from time to time
> whether the error occurs again.
>
>
>> (I use NVMe or SSD USB media that do not get such long delays but
>> fit the power limitations of the context. I keep as little as I can
>> get away with on microSD card media in my context, and I also avoid
>> spinning rust. Thus I've only seen "indefinite wait buffer" or the
>> like long ago, before that was the case.)
>>
>
> Okay, thanks for sharing this. I'll keep it in mind in case I need
> these types of media soon.
>
> Thanks and best regards,
> Archimedes
>

Hi Mark,

To recap the kernel tunables, the changes are the following:

root@generic:~ # sysctl -a | grep oom
vm.pageout_oom_seq: 120
vm.pfault_oom_wait: 10
vm.pfault_oom_attempts: -1
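
For reference, one way to keep these values across reboots is to list them in /etc/sysctl.conf, which is applied at boot. The fragment below is only a sketch that restates the values reported above; setting them as tunables in /boot/loader.conf is an alternative, and this thread does not settle which file is preferable.

```
# /etc/sysctl.conf (sketch): same values as the sysctl output above
vm.pageout_oom_seq=120
vm.pfault_oom_wait=10
vm.pfault_oom_attempts=-1
```

A single value can also be changed immediately as root, for example with "sysctl vm.pfault_oom_attempts=-1".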

With the -j1 and -j2 options, the kernel and world builds both completed,
in 103 and 84 hours respectively. I could still see "swap_pager:
indefinite wait buffer: bufobj" messages, but they appear to be ignorable
in this case since the builds survived them. With the -j3 option, the
build failed partway through with the previous "failed to reclaim memory"
error, but this is no longer that relevant since -j1 and -j2 already
work. -j2 is my preferred choice for my RPi 3B build setup.
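
To check periodically whether these warnings recur during a long build, a small POSIX sh function like the one below could summarize the kernel messages. It is a hypothetical helper (the name count_vm_warnings is made up here), and it assumes the message formats quoted earlier in this thread: "swap_pager: indefinite wait buffer" for slow paging I/O and "was killed:" for the process kills.

```sh
# count_vm_warnings (hypothetical helper): read kernel messages on stdin
# and print how many slow-paging warnings and process kills they contain.
# Patterns assume the message formats quoted in this thread.
count_vm_warnings() {
  awk '
    /swap_pager: indefinite wait buffer/ { waits++ }  # paging I/O > ~30 s
    /was killed: /                      { kills++ }  # e.g. OOM-style kills
    END { printf "indefinite_waits=%d kills=%d\n", waits + 0, kills + 0 }
  '
}

# Usage: dmesg | count_vm_warnings
#    or: count_vm_warnings < /var/log/messages
```

Counting both patterns separately makes it easy to see that a build can log many "indefinite wait" lines and still finish, as long as the kill count stays at zero.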

Thanks and best regards,
Archimedes