Re: devel/llvm13 failed to reclaim memory on 8 GB Pi4 running -current [ZFS context: similar to UFS]

From: Mark Millard <marklmi_at_yahoo.com>
Date: Sat, 29 Jan 2022 03:20:31 UTC
On 2022-Jan-28, at 15:05, Mark Millard <marklmi@yahoo.com> wrote:

> On 2022-Jan-28, at 00:31, Mark Millard <marklmi@yahoo.com> wrote:
> 
>>> . . .
>> 
>> UFS context:
>> 
>> . . .;  load averages:   . . . MaxObs:   5.47,   4.99,   4.82
>> . . . threads:    . . ., 14 MaxObsRunning
>> . . .
>> Mem: . . ., 6457Mi MaxObsActive, 1263Mi MaxObsWired, 7830Mi MaxObs(Act+Wir+Lndry)
>> Swap: 8192Mi Total, 8192Mi Used, K Free, 100% Inuse, 8192Mi MaxObsUsed, 14758Mi MaxObs(Act+Lndry+SwapUsed), 16017Mi MaxObs(Act+Wir+Lndry+SwapUsed)
>> 
>> 
>> Console:
>> 
>> swap_pager: out of swap space
>> swp_pager_getswapspace(4): failed
>> swp_pager_getswapspace(1): failed
>> swp_pager_getswapspace(1): failed
>> swp_pager_getswapspace(2): failed
>> swp_pager_getswapspace(2): failed
>> swp_pager_getswapspace(4): failed
>> swp_pager_getswapspace(1): failed
>> swp_pager_getswapspace(9): failed
>> swp_pager_getswapspace(4): failed
>> swp_pager_getswapspace(7): failed
>> swp_pager_getswapspace(29): failed
>> swp_pager_getswapspace(9): failed
>> swp_pager_getswapspace(1): failed
>> swp_pager_getswapspace(2): failed
>> swp_pager_getswapspace(1): failed
>> swp_pager_getswapspace(4): failed
>> swp_pager_getswapspace(1): failed
>> swp_pager_getswapspace(10): failed
>> 
>> . . . Then some time with no messages . . .
>> 
>> vm_pageout_mightbe_oom: kill context: v_free_count: 7740, v_inactive_count: 1
>> Jan 27 23:01:07 CA72_UFS kernel: pid 57238 (c++), jid 3, uid 0, was killed: failed to reclaim memory
>> swp_pager_getswapspace(2): failed
>> 
>> 
>> Note: The "vm_pageout_mightbe_oom: kill context:"
>> notice is one of the few parts of an old reporting
>> patch Mark J. had supplied (long ago) that still
>> fits in the modern code (or that I was able to keep
>> updated enough to fit, anyway). It is another of the
>> personal updates that I keep in my source trees,
>> such as in /usr/main-src/ .
>> 
>> diff --git a/sys/vm/vm_pageout.c b/sys/vm/vm_pageout.c
>> index 36d5f3275800..f345e2d4a2d4 100644
>> --- a/sys/vm/vm_pageout.c
>> +++ b/sys/vm/vm_pageout.c
>> @@ -1828,6 +1828,8 @@ vm_pageout_mightbe_oom(struct vm_domain *vmd, int page_shortage,
>>        * start OOM.  Initiate the selection and signaling of the
>>        * victim.
>>        */
>> +       printf("vm_pageout_mightbe_oom: kill context: v_free_count: %u, v_inactive_count: %u\n",
>> +          vmd->vmd_free_count, vmd->vmd_pagequeues[PQ_INACTIVE].pq_cnt);
>>       vm_pageout_oom(VM_OOM_MEM);
>> 
>>       /*
>> 
>> 
>> Again, I'd used vm.pfault_oom_attempts=-1, which is
>> inappropriate for a context that can run out of swap
>> (although with UFS it did do a kill fairly soon):
>> 
>> # Delay when persistent low free RAM leads to
>> # Out Of Memory killing of processes:
>> vm.pageout_oom_seq=120
>> #
>> # For plenty of swap/paging space (will not
>> # run out), avoid pageout delays leading to
>> # Out Of Memory killing of processes:
>> vm.pfault_oom_attempts=-1
>> #
>> # For possibly insufficient swap/paging space
>> # (might run out), increase the pageout delay
>> # that leads to Out Of Memory killing of
>> # processes (showing defaults at the time):
>> #vm.pfault_oom_attempts= 3
>> #vm.pfault_oom_wait= 10
>> # (The multiplication is the total but there
>> # are other potential tradeoffs in the factors
>> # multiplied, even for nearly the same total.)
>> 
>> I'll change:
>> 
>> vm.pfault_oom_attempts
>> vm.pfault_oom_wait
>> 
>> and reboot --and start the bulk somewhat before
>> going to bed.
>> 
>> 
>> For reference:
>> 
>> [00:02:13] [01] [00:00:00] Building devel/llvm13 | llvm13-13.0.0_3
>> [07:37:05] [01] [07:34:52] Finished devel/llvm13 | llvm13-13.0.0_3: Failed: build
>> 
>> 
>> [ 65% 4728/7265] . . . flang/lib/Evaluate/fold-designator.cpp
>> [ 65% 4729/7265] . . . flang/lib/Evaluate/fold-integer.cpp
>> FAILED: tools/flang/lib/Evaluate/CMakeFiles/obj.FortranEvaluate.dir/fold-integer.cpp.o 
>> [ 65% 4729/7265] . . . flang/lib/Evaluate/fold-logical.cpp
>> [ 65% 4729/7265] . . . flang/lib/Evaluate/fold-complex.cpp
>> [ 65% 4729/7265] . . . flang/lib/Evaluate/fold-real.cpp
>> 
>> So the flang/lib/Evaluate/fold-integer.cpp one was the one killed.
>> 
>> Notably, the specific sources being compiled are different
>> than in the ZFS context report. But this might be because
>> of my killing ninja explicitly in the ZFS context, before
>> killing the running compilers.
>> 
>> Again, using the options to avoid building the Fortran
>> compiler probably avoids such memory use --if you do not
>> need the Fortran compiler.
> 
> 
> UFS context again, but this time using (instead of vm.pfault_oom_attempts=-1):
> 
> vm.pfault_oom_attempts= 3
> vm.pfault_oom_wait= 10
> 
> It reached swap-space-full:
> 
> . . .;  load averages:   . . . MaxObs:   5.42,   4.98,   4.80
> . . . threads:    . . ., 11 MaxObsRunning
> . . .
> Mem: . . ., 6482Mi MaxObsActive, 1275Mi MaxObsWired, 7832Mi MaxObs(Act+Wir+Lndry)
> Swap: 8192Mi Total, 8192Mi Used, K Free, 100% Inuse, 4096B In, 81920B Out, 8192Mi MaxObsUsed, 14733Mi MaxObs(Act+Lndry+SwapUsed), 16007Mi MaxObs(Act+Wir+Lndry+SwapUsed)
> 
> 
> swap_pager: out of swap space
> swp_pager_getswapspace(5): failed
> swp_pager_getswapspace(25): failed
> swp_pager_getswapspace(1): failed
> swp_pager_getswapspace(31): failed
> swp_pager_getswapspace(6): failed
> swp_pager_getswapspace(1): failed
> swp_pager_getswapspace(25): failed
> swp_pager_getswapspace(10): failed
> swp_pager_getswapspace(17): failed
> swp_pager_getswapspace(27): failed
> swp_pager_getswapspace(5): failed
> swp_pager_getswapspace(11): failed
> swp_pager_getswapspace(9): failed
> swp_pager_getswapspace(29): failed
> swp_pager_getswapspace(2): failed
> swp_pager_getswapspace(1): failed
> swp_pager_getswapspace(9): failed
> swp_pager_getswapspace(20): failed
> swp_pager_getswapspace(4): failed
> swp_pager_getswapspace(21): failed
> swp_pager_getswapspace(11): failed
> swp_pager_getswapspace(2): failed
> swp_pager_getswapspace(21): failed
> swp_pager_getswapspace(2): failed
> swp_pager_getswapspace(1): failed
> swp_pager_getswapspace(2): failed
> swp_pager_getswapspace(3): failed
> swp_pager_getswapspace(3): failed
> swp_pager_getswapspace(2): failed
> swp_pager_getswapspace(1): failed
> swp_pager_getswapspace(20): failed
> swp_pager_getswapspace(2): failed
> swp_pager_getswapspace(1): failed
> swp_pager_getswapspace(16): failed
> swp_pager_getswapspace(6): failed
> swap_pager: out of swap space
> swp_pager_getswapspace(4): failed
> swp_pager_getswapspace(9): failed
> swp_pager_getswapspace(17): failed
> swp_pager_getswapspace(30): failed
> swp_pager_getswapspace(1): failed
> 
> . . . Then some time with no messages . . .
> 
> vm_pageout_mightbe_oom: kill context: v_free_count: 7875, v_inactive_count: 1
> Jan 28 14:36:44 CA72_UFS kernel: pid 55178 (c++), jid 3, uid 0, was killed: failed to reclaim memory
> swp_pager_getswapspace(11): failed
> 
> 
> So, not all that much different from how the
> vm.pfault_oom_attempts=-1 example looked.
> 
> 
> [00:01:00] [01] [00:00:00] Building devel/llvm13 | llvm13-13.0.0_3
> [07:41:39] [01] [07:40:39] Finished devel/llvm13 | llvm13-13.0.0_3: Failed: build
> 
> Again it killed:
> 
> FAILED: tools/flang/lib/Evaluate/CMakeFiles/obj.FortranEvaluate.dir/fold-integer.cpp.o
> 
> So, basically the same stopping area as for the
> vm.pfault_oom_attempts=-1 example.
> 
> 
> I'll set things up for swap totaling 30 GiBytes, reboot,
> and start it again. This will hopefully let me see and
> report MaxObs??? figures for a successful build with
> RAM+SWAP of 38 GiBytes. So: more than 9 GiBytes per compiler
> instance (mean).

The analogous ZFS test with:

vm.pfault_oom_attempts= 3
vm.pfault_oom_wait= 10

got:

. . .;  load averages:   . . . MaxObs:   5.90,   5.07,   4.80
. . . threads:    . . ., 11 MaxObsRunning
. . .
Mem: . . ., 6006Mi MaxObsActive
. . .
Swap: 8192Mi Total, 8192Mi Used, 32768B Free, 99% Inuse, 28984Ki In, 4792Ki Out, 8192Mi MaxObsUsed, 14282Mi MaxObs(Act+Lndry+SwapUsed), 16009Mi MaxObs(Act+Wir+Lndry+SwapUsed)

(I got that slightly early, before the 100% showed up.)
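
For a rough sense of scale, the arithmetic behind those figures can be
sketched as below. This is only an illustration, not something used in the
test: it assumes the nominal 8 GiB of RAM (the kernel sees somewhat less)
and 4 parallel compiler instances, which is inferred from the earlier
"more than 9 GiBytes per compiler instance" remark for 38 GiBytes. The
variable names are made up for the example.

```python
# Rough memory-budget arithmetic for the ZFS figures reported above.
# Assumptions (not measured here): nominal 8 GiB RAM, 4 compiler jobs.
MIB_PER_GIB = 1024

ram_mib = 8 * MIB_PER_GIB   # nominal RAM of the 8 GB Pi4 (assumption)
swap_mib = 8192             # from "Swap: 8192Mi Total"
max_obs_mib = 16009         # from MaxObs(Act+Wir+Lndry+SwapUsed)
jobs = 4                    # assumed number of parallel compiler instances

total_mib = ram_mib + swap_mib
headroom_mib = total_mib - max_obs_mib
mean_per_job_gib = total_mib / jobs / MIB_PER_GIB

print(f"RAM+swap: {total_mib} MiB")
print(f"headroom at observed peak: {headroom_mib} MiB")
print(f"mean budget per job: {mean_per_job_gib:.1f} GiB")
```

Under those assumptions the observed peak sits within a few hundred MiBytes
of the nominal RAM+SWAP total, which fits with swap filling up.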


swap_pager: out of swap space
swp_pager_getswapspace(10): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(4): failed
swp_pager_getswapspace(16): failed
swp_pager_getswapspace(5): failed
swp_pager_getswapspace(2): failed
swp_pager_getswapspace(8): failed
swp_pager_getswapspace(12): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(32): failed
swp_pager_getswapspace(4): failed
swp_pager_getswapspace(9): failed
swp_pager_getswapspace(4): failed
swp_pager_getswapspace(17): failed
swp_pager_getswapspace(21): failed
swp_pager_getswapspace(10): failed
swp_pager_getswapspace(18): failed
swp_pager_getswapspace(6): failed
swp_pager_getswapspace(2): failed
swp_pager_getswapspace(14): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(5): failed
swp_pager_getswapspace(25): failed
swp_pager_getswapspace(12): failed
swp_pager_getswapspace(5): failed
swp_pager_getswapspace(7): failed
swp_pager_getswapspace(10): failed
swp_pager_getswapspace(3): failed
swp_pager_getswapspace(24): failed
swap_pager: out of swap space
swp_pager_getswapspace(11): failed
swap_pager: out of swap space
swp_pager_getswapspace(17): failed
swp_pager_getswapspace(5): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(32): failed
swp_pager_getswapspace(15): failed
swp_pager_getswapspace(19): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(25): failed
swp_pager_getswapspace(11): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(15): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(8): failed
swp_pager_getswapspace(31): failed
swp_pager_getswapspace(26): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(20): failed
swp_pager_getswapspace(4): failed
swp_pager_getswapspace(3): failed
swp_pager_getswapspace(3): failed
swp_pager_getswapspace(9): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(15): failed
swp_pager_getswapspace(3): failed
swp_pager_getswapspace(7): failed
swp_pager_getswapspace(8): failed
swp_pager_getswapspace(17): failed
swp_pager_getswapspace(2): failed
swp_pager_getswapspace(10): failed
swp_pager_getswapspace(6): failed
swp_pager_getswapspace(2): failed
swp_pager_getswapspace(11): failed
swp_pager_getswapspace(21): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(9): failed
swp_pager_getswapspace(32): failed
swp_pager_getswapspace(2): failed
swp_pager_getswapspace(32): failed
swp_pager_getswapspace(25): failed
swp_pager_getswapspace(21): failed
swp_pager_getswapspace(22): failed
swp_pager_getswapspace(14): failed
swp_pager_getswapspace(10): failed
swap_pager: out of swap space
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(28): failed
swp_pager_getswapspace(2): failed
swp_pager_getswapspace(13): failed
swp_pager_getswapspace(3): failed
swp_pager_getswapspace(31): failed
swp_pager_getswapspace(20): failed
swp_pager_getswapspace(2): failed
vm_pageout_mightbe_oom: kill context: v_free_count: 8186, v_inactive_count: 1
Jan 28 18:42:42 CA72_4c8G_ZFS kernel: pid 98734 (c++), jid 3, uid 0, was killed: failed to reclaim memory

[00:00:49] [01] [00:00:00] Building devel/llvm13 | llvm13-13.0.0_3
[08:06:09] [01] [08:05:20] Finished devel/llvm13 | llvm13-13.0.0_3: Failed: build

FAILED: tools/flang/lib/Evaluate/CMakeFiles/obj.FortranEvaluate.dir/fold-complex.cpp.o

and flang/lib/Evaluate/fold-integer.cpp was one of the compiles going on.
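
For comparing runs like the ones above, the console output can be
summarized with a small script. The following is a minimal sketch, not a
tool I used for this report. It assumes that the number in
swp_pager_getswapspace(N) is a request size in pages and that the page
size is 4 KiBytes; the summarize() helper and its result keys are
hypothetical names for this example.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiByte pages

# Patterns for the two kinds of console lines shown in this report.
FAIL_RE = re.compile(r"swp_pager_getswapspace\((\d+)\): failed")
KILL_RE = re.compile(r"v_free_count: (\d+), v_inactive_count: (\d+)")


def summarize(console_text):
    """Count failed swap-space requests and convert v_free_count to MiB."""
    sizes = [int(n) for n in FAIL_RE.findall(console_text)]
    summary = {
        "failed_requests": len(sizes),
        "largest_request": max(sizes, default=0),
        "free_mib_at_kill": None,  # stays None if no kill-context line
    }
    match = KILL_RE.search(console_text)
    if match:
        free_pages = int(match.group(1))
        summary["free_mib_at_kill"] = free_pages * PAGE_SIZE / (1024 * 1024)
    return summary


# A short excerpt in the same shape as the console output above.
sample = """\
swap_pager: out of swap space
swp_pager_getswapspace(10): failed
swp_pager_getswapspace(1): failed
swp_pager_getswapspace(32): failed
vm_pageout_mightbe_oom: kill context: v_free_count: 8186, v_inactive_count: 1
"""

result = summarize(sample)
print(result)
```

With the 4 KiByte assumption, v_free_count: 8186 works out to roughly
32 MiBytes free at the time of the kill.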

===
Mark Millard
marklmi at yahoo.com