Re: devel/llvm13 failed to reclaim memory on 8 GB Pi4 running -current [UFS success context for 4 cores, notes added]
Date: Sat, 29 Jan 2022 19:23:58 UTC
On 2022-Jan-29, at 03:59, Mark Millard <marklmi@yahoo.com> wrote:

> On 2022-Jan-28, at 19:20, Mark Millard <marklmi@yahoo.com> wrote:
>
>> On 2022-Jan-28, at 15:05, Mark Millard <marklmi@yahoo.com> wrote:
>>
>>> On 2022-Jan-28, at 00:31, Mark Millard <marklmi@yahoo.com> wrote:
>>>
>>>>> . . .
>>>>
>>>> UFS context:
>>>>
>>>> . . .; load averages: . . . MaxObs: 5.47, 4.99, 4.82
>>>> . . . threads: . . ., 14 MaxObsRunning
>>>> . . .
>>>> Mem: . . ., 6457Mi MaxObsActive, 1263Mi MaxObsWired, 7830Mi MaxObs(Act+Wir+Lndry)
>>>> Swap: 8192Mi Total, 8192Mi Used, K Free, 100% Inuse, 8192Mi MaxObsUsed, 14758Mi MaxObs(Act+Lndry+SwapUsed), 16017Mi MaxObs(Act+Wir+Lndry+SwapUsed)
>>>>
>>>>
>>>> Console:
>>>>
>>>> swap_pager: out of swap space
>>>> swp_pager_getswapspace(4): failed
>>>> swp_pager_getswapspace(1): failed
>>>> swp_pager_getswapspace(1): failed
>>>> swp_pager_getswapspace(2): failed
>>>> swp_pager_getswapspace(2): failed
>>>> swp_pager_getswapspace(4): failed
>>>> swp_pager_getswapspace(1): failed
>>>> swp_pager_getswapspace(9): failed
>>>> swp_pager_getswapspace(4): failed
>>>> swp_pager_getswapspace(7): failed
>>>> swp_pager_getswapspace(29): failed
>>>> swp_pager_getswapspace(9): failed
>>>> swp_pager_getswapspace(1): failed
>>>> swp_pager_getswapspace(2): failed
>>>> swp_pager_getswapspace(1): failed
>>>> swp_pager_getswapspace(4): failed
>>>> swp_pager_getswapspace(1): failed
>>>> swp_pager_getswapspace(10): failed
>>>>
>>>> . . . Then some time with no messages . . .
>>>>
>>>> vm_pageout_mightbe_oom: kill context: v_free_count: 7740, v_inactive_count: 1
>>>> Jan 27 23:01:07 CA72_UFS kernel: pid 57238 (c++), jid 3, uid 0, was killed: failed to reclaim memory
>>>> swp_pager_getswapspace(2): failed
>>>>
>>>>
>>>> Note: The "vm_pageout_mightbe_oom: kill context:"
>>>> notice is one of the few parts of an old reporting
>>>> patch Mark J. had supplied (long ago) that still
>>>> fits in the modern code (or that I was able to keep
>>>> updated enough to fit, anyway).
>>>> It is another of the personal updates that I keep
>>>> in my source trees, such as in /usr/main-src/ .
>>>>
>>>> diff --git a/sys/vm/vm_pageout.c b/sys/vm/vm_pageout.c
>>>> index 36d5f3275800..f345e2d4a2d4 100644
>>>> --- a/sys/vm/vm_pageout.c
>>>> +++ b/sys/vm/vm_pageout.c
>>>> @@ -1828,6 +1828,8 @@ vm_pageout_mightbe_oom(struct vm_domain *vmd, int page_shortage,
>>>>  	 * start OOM. Initiate the selection and signaling of the
>>>>  	 * victim.
>>>>  	 */
>>>> +	printf("vm_pageout_mightbe_oom: kill context: v_free_count: %u, v_inactive_count: %u\n",
>>>> +	    vmd->vmd_free_count, vmd->vmd_pagequeues[PQ_INACTIVE].pq_cnt);
>>>>  	vm_pageout_oom(VM_OOM_MEM);
>>>>
>>>>  	/*
>>>>
>>>>
>>>> Again, I'd used vm.pfault_oom_attempts inappropriately
>>>> for running out of swap (although with UFS it did do
>>>> a kill fairly soon):
>>>>
>>>> # Delay when persistent low free RAM leads to
>>>> # Out Of Memory killing of processes:
>>>> vm.pageout_oom_seq=120
>>>> #
>>>> # For plenty of swap/paging space (will not
>>>> # run out), avoid pageout delays leading to
>>>> # Out Of Memory killing of processes:
>>>> vm.pfault_oom_attempts=-1
>>>> #
>>>> # For possibly insufficient swap/paging space
>>>> # (might run out), increase the pageout delay
>>>> # that leads to Out Of Memory killing of
>>>> # processes (showing defaults at the time):
>>>> #vm.pfault_oom_attempts= 3
>>>> #vm.pfault_oom_wait= 10
>>>> # (The multiplication is the total but there
>>>> # are other potential tradeoffs in the factors
>>>> # multiplied, even for nearly the same total.)
>>>>
>>>> I'll change:
>>>>
>>>> vm.pfault_oom_attempts
>>>> vm.pfault_oom_wait
>>>>
>>>> and reboot --and start the bulk somewhat before
>>>> going to bed.
>>>>
>>>>
>>>> For reference:
>>>>
>>>> [00:02:13] [01] [00:00:00] Building devel/llvm13 | llvm13-13.0.0_3
>>>> [07:37:05] [01] [07:34:52] Finished devel/llvm13 | llvm13-13.0.0_3: Failed: build
>>>>
>>>>
>>>> [ 65% 4728/7265] . . . flang/lib/Evaluate/fold-designator.cpp
>>>> [ 65% 4729/7265] . . . flang/lib/Evaluate/fold-integer.cpp
>>>> FAILED: tools/flang/lib/Evaluate/CMakeFiles/obj.FortranEvaluate.dir/fold-integer.cpp.o
>>>> [ 65% 4729/7265] . . . flang/lib/Evaluate/fold-logical.cpp
>>>> [ 65% 4729/7265] . . . flang/lib/Evaluate/fold-complex.cpp
>>>> [ 65% 4729/7265] . . . flang/lib/Evaluate/fold-real.cpp
>>>>
>>>> So the flang/lib/Evaluate/fold-integer.cpp one was the one killed.
>>>>
>>>> Notably, the specific sources being compiled are different
>>>> than in the ZFS context report. But this might be because
>>>> of my killing ninja explicitly in the ZFS context, before
>>>> killing the running compilers.
>>>>
>>>> Again, using the options to avoid building the Fortran
>>>> compiler probably avoids such memory use --if you do not
>>>> need the Fortran compiler.
>>>
>>>
>>> UFS based on instead using (not vm.pfault_oom_attempts=-1):
>>>
>>> vm.pfault_oom_attempts= 3
>>> vm.pfault_oom_wait= 10
>>>
>>> It reached swap-space-full:
>>>
>>> . . .; load averages: . . . MaxObs: 5.42, 4.98, 4.80
>>> . . . threads: . . ., 11 MaxObsRunning
>>> . . .
>>> Mem: . . ., 6482Mi MaxObsActive, 1275Mi MaxObsWired, 7832Mi MaxObs(Act+Wir+Lndry)
>>> Swap: 8192Mi Total, 8192Mi Used, K Free, 100% Inuse, 4096B In, 81920B Out, 8192Mi MaxObsUsed, 14733Mi MaxObs(Act+Lndry+SwapUsed), 16007Mi MaxObs(Act+Wir+Lndry+SwapUsed)
>>>
>>>
>>> swap_pager: out of swap space
>>> swp_pager_getswapspace(5): failed
>>> swp_pager_getswapspace(25): failed
>>> swp_pager_getswapspace(1): failed
>>> swp_pager_getswapspace(31): failed
>>> swp_pager_getswapspace(6): failed
>>> swp_pager_getswapspace(1): failed
>>> swp_pager_getswapspace(25): failed
>>> swp_pager_getswapspace(10): failed
>>> swp_pager_getswapspace(17): failed
>>> swp_pager_getswapspace(27): failed
>>> swp_pager_getswapspace(5): failed
>>> swp_pager_getswapspace(11): failed
>>> swp_pager_getswapspace(9): failed
>>> swp_pager_getswapspace(29): failed
>>> swp_pager_getswapspace(2): failed
>>> swp_pager_getswapspace(1): failed
>>> swp_pager_getswapspace(9): failed
>>> swp_pager_getswapspace(20): failed
>>> swp_pager_getswapspace(4): failed
>>> swp_pager_getswapspace(21): failed
>>> swp_pager_getswapspace(11): failed
>>> swp_pager_getswapspace(2): failed
>>> swp_pager_getswapspace(21): failed
>>> swp_pager_getswapspace(2): failed
>>> swp_pager_getswapspace(1): failed
>>> swp_pager_getswapspace(2): failed
>>> swp_pager_getswapspace(3): failed
>>> swp_pager_getswapspace(3): failed
>>> swp_pager_getswapspace(2): failed
>>> swp_pager_getswapspace(1): failed
>>> swp_pager_getswapspace(20): failed
>>> swp_pager_getswapspace(2): failed
>>> swp_pager_getswapspace(1): failed
>>> swp_pager_getswapspace(16): failed
>>> swp_pager_getswapspace(6): failed
>>> swap_pager: out of swap space
>>> swp_pager_getswapspace(4): failed
>>> swp_pager_getswapspace(9): failed
>>> swp_pager_getswapspace(17): failed
>>> swp_pager_getswapspace(30): failed
>>> swp_pager_getswapspace(1): failed
>>>
>>> . . . Then some time with no messages . . .
>>>
>>> vm_pageout_mightbe_oom: kill context: v_free_count: 7875, v_inactive_count: 1
>>> Jan 28 14:36:44 CA72_UFS kernel: pid 55178 (c++), jid 3, uid 0, was killed: failed to reclaim memory
>>> swp_pager_getswapspace(11): failed
>>>
>>>
>>> So, not all that much different from how the
>>> vm.pfault_oom_attempts=-1 example looked.
>>>
>>>
>>> [00:01:00] [01] [00:00:00] Building devel/llvm13 | llvm13-13.0.0_3
>>> [07:41:39] [01] [07:40:39] Finished devel/llvm13 | llvm13-13.0.0_3: Failed: build
>>>
>>> Again it killed:
>>>
>>> FAILED: tools/flang/lib/Evaluate/CMakeFiles/obj.FortranEvaluate.dir/fold-integer.cpp.o
>>>
>>> So, basically the same stopping area as for the
>>> vm.pfault_oom_attempts=-1 example.
>>>
>>>
>>> I'll set things up for swap totaling to 30 GiBytes, reboot,
>>> and start it again. This will hopefully let me see and
>>> report MaxObs??? figures for a successful build when there
>>> is RAM+SWAP: 38 GiBytes. So: more than 9 GiBytes per compiler
>>> instance (mean).
>>
>> The analogous ZFS test with:
>>
>> vm.pfault_oom_attempts= 3
>> vm.pfault_oom_wait= 10
>>
>> got:
>>
>> . . .; load averages: . . . MaxObs: 5.90, 5.07, 4.80
>> . . . threads: . . ., 11 MaxObsRunning
>> . . .
>> Mem: . . ., 6006Mi MaxObsActive
>> . . .
>> Swap: 8192Mi Total, 8192Mi Used, 32768B Free, 99% Inuse, 28984Ki In, 4792Ki Out, 8192Mi MaxObsUsed, 14282Mi MaxObs(Act+Lndry+SwapUsed), 16009Mi MaxObs(Act+Wir+Lndry+SwapUsed)
>>
>> (I got that slightly early, before the 100% showed up.)
>>
>>
>> swap_pager: out of swap space
>> swp_pager_getswapspace(10): failed
>> swp_pager_getswapspace(1): failed
>> swp_pager_getswapspace(4): failed
>> swp_pager_getswapspace(16): failed
>> swp_pager_getswapspace(5): failed
>> swp_pager_getswapspace(2): failed
>> swp_pager_getswapspace(8): failed
>> swp_pager_getswapspace(12): failed
>> swp_pager_getswapspace(1): failed
>> swp_pager_getswapspace(32): failed
>> swp_pager_getswapspace(4): failed
>> swp_pager_getswapspace(9): failed
>> swp_pager_getswapspace(4): failed
>> swp_pager_getswapspace(17): failed
>> swp_pager_getswapspace(21): failed
>> swp_pager_getswapspace(10): failed
>> swp_pager_getswapspace(18): failed
>> swp_pager_getswapspace(6): failed
>> swp_pager_getswapspace(2): failed
>> swp_pager_getswapspace(14): failed
>> swp_pager_getswapspace(1): failed
>> swp_pager_getswapspace(5): failed
>> swp_pager_getswapspace(25): failed
>> swp_pager_getswapspace(12): failed
>> swp_pager_getswapspace(5): failed
>> swp_pager_getswapspace(7): failed
>> swp_pager_getswapspace(10): failed
>> swp_pager_getswapspace(3): failed
>> swp_pager_getswapspace(24): failed
>> swap_pager: out of swap space
>> swp_pager_getswapspace(11): failed
>> swap_pager: out of swap space
>> swp_pager_getswapspace(17): failed
>> swp_pager_getswapspace(5): failed
>> swp_pager_getswapspace(1): failed
>> swp_pager_getswapspace(32): failed
>> swp_pager_getswapspace(15): failed
>> swp_pager_getswapspace(19): failed
>> swp_pager_getswapspace(1): failed
>> swp_pager_getswapspace(25): failed
>> swp_pager_getswapspace(11): failed
>> swp_pager_getswapspace(1): failed
>> swp_pager_getswapspace(15): failed
>> swp_pager_getswapspace(1): failed
>> swp_pager_getswapspace(8): failed
>> swp_pager_getswapspace(31): failed
>> swp_pager_getswapspace(26): failed
>> swp_pager_getswapspace(1): failed
>> swp_pager_getswapspace(20): failed
>> swp_pager_getswapspace(4): failed
>> swp_pager_getswapspace(3): failed
>> swp_pager_getswapspace(3): failed
>> swp_pager_getswapspace(9): failed
>> swp_pager_getswapspace(1): failed
>> swp_pager_getswapspace(15): failed
>> swp_pager_getswapspace(3): failed
>> swp_pager_getswapspace(7): failed
>> swp_pager_getswapspace(8): failed
>> swp_pager_getswapspace(17): failed
>> swp_pager_getswapspace(2): failed
>> swp_pager_getswapspace(10): failed
>> swp_pager_getswapspace(6): failed
>> swp_pager_getswapspace(2): failed
>> swp_pager_getswapspace(11): failed
>> swp_pager_getswapspace(21): failed
>> swp_pager_getswapspace(1): failed
>> swp_pager_getswapspace(9): failed
>> swp_pager_getswapspace(32): failed
>> swp_pager_getswapspace(2): failed
>> swp_pager_getswapspace(32): failed
>> swp_pager_getswapspace(25): failed
>> swp_pager_getswapspace(21): failed
>> swp_pager_getswapspace(22): failed
>> swp_pager_getswapspace(14): failed
>> swp_pager_getswapspace(10): failed
>> swap_pager: out of swap space
>> swp_pager_getswapspace(1): failed
>> swp_pager_getswapspace(28): failed
>> swp_pager_getswapspace(2): failed
>> swp_pager_getswapspace(13): failed
>> swp_pager_getswapspace(3): failed
>> swp_pager_getswapspace(31): failed
>> swp_pager_getswapspace(20): failed
>> swp_pager_getswapspace(2): failed
>> vm_pageout_mightbe_oom: kill context: v_free_count: 8186, v_inactive_count: 1
>> Jan 28 18:42:42 CA72_4c8G_ZFS kernel: pid 98734 (c++), jid 3, uid 0, was killed: failed to reclaim memory
>>
>> [00:00:49] [01] [00:00:00] Building devel/llvm13 | llvm13-13.0.0_3
>> [08:06:09] [01] [08:05:20] Finished devel/llvm13 | llvm13-13.0.0_3: Failed: build
>>
>> FAILED: tools/flang/lib/Evaluate/CMakeFiles/obj.FortranEvaluate.dir/fold-complex.cpp.o
>>
>> and flang/lib/Evaluate/fold-integer.cpp was one of the compiles going on.
The below is about the success case for the 8 GiByte RPi4B:

> Finally, what a successful build of devel/llvm13 on
> UFS was like on the 8 GiByte RPi4B (overclocked,
> USB3 NVMe based SSD):
>
> [00:00:57] [01] [00:00:00] Building devel/llvm13 | llvm13-13.0.0_3
> [12:25:40] [01] [12:24:43] Finished devel/llvm13 | llvm13-13.0.0_3: Success
>
> where its Maximum Observed figures were:
>
> . . .; load averages: . . . MaxObs: 6.15, 5.71, 5.31
> . . . threads: . . ., 11 MaxObsRunning
> . . .
> Mem: . . ., 6465Mi MaxObsActive, 1355Mi MaxObsWired, 7832Mi MaxObs(Act+Wir+Lndry)
> Swap: . . ., 10429Mi MaxObsUsed, 16799Mi MaxObs(Act+Lndry+SwapUsed), 18072Mi MaxObs(Act+Wir+Lndry+SwapUsed)
>
> But 18072Mi MaxObs(Act+Wir+Lndry+SwapUsed) == 17.6484375 GiByte,
> so more than 17.6484375 GiByte for RAM+SWAP, depending on
> how much room for inactive and margin one chooses. Probably
> 20+ GiBytes, so 12+ GiBytes of swap for 8 GiBytes of RAM.
>
> (Reminder: maximum of sum <= sum of maximums.)

For folks that might read the above without a lot of
prior context . . .

I forgot to mention above that the RPi4B has 4 cores
and the poudriere ALLOW_PARALLEL_JOB= meant that there
were 4 jobs (processes) much of the time. (Nightly
cron related activity made the MaxObs load averages
bigger than the 4.? or 5.? that would otherwise have
shown up.)

Having notably more (or fewer) processes active for the
build need not use RAM+SWAP proportionally overall. The
20+ GiBytes figure for 4 active hardware threads in use
is somewhat context specific. So having 5+ GiBytes of
RAM+SWAP per hardware thread that is to be in use may
be significant overkill when there are notably more
hardware threads involved.

===
Mark Millard
marklmi at yahoo.com
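
For reference, the arithmetic behind the figures above can
be cross-checked with a minimal Python sketch. The variable
names are only illustrative; the MiByte values are the
MaxObs figures reported for the successful UFS build, and
the 20 GiByte target is the rough estimate from above, not
a measurement.

```python
# MaxObs figures (MiBytes) from the successful UFS build report.
MAXOBS_ACT_WIR_LNDRY_SWAP = 18072  # MaxObs(Act+Wir+Lndry+SwapUsed)
MAXOBS_ACT_WIR_LNDRY = 7832        # MaxObs(Act+Wir+Lndry)
MAXOBS_SWAP_USED = 10429           # MaxObsUsed (swap)

RAM_GIB = 8        # RPi4B RAM
HW_THREADS = 4     # RPi4B cores, all in use for the build
TARGET_GIB = 20    # assumed RAM+SWAP target (the "20+ GiBytes" estimate)


def mib_to_gib(mib):
    """Convert MiBytes to GiBytes."""
    return mib / 1024


# Peak observed RAM+SWAP demand, in GiBytes.
peak_gib = mib_to_gib(MAXOBS_ACT_WIR_LNDRY_SWAP)

# Swap needed to reach the target, and the target per hardware thread.
swap_gib = TARGET_GIB - RAM_GIB
per_thread_gib = TARGET_GIB / HW_THREADS

# "maximum of sum <= sum of maximums": the separately observed maxima
# need not occur at the same moment, so their sum is only an upper bound.
max_of_sum_ok = (MAXOBS_ACT_WIR_LNDRY_SWAP
                 <= MAXOBS_ACT_WIR_LNDRY + MAXOBS_SWAP_USED)

print(f"peak RAM+SWAP observed: {peak_gib} GiBytes")
print(f"swap for a {TARGET_GIB} GiByte target: {swap_gib} GiBytes")
print(f"per hardware thread: {per_thread_gib} GiBytes")
print(f"max of sum <= sum of maxes: {max_of_sum_ok}")
```

The per-thread figure only describes this 4-thread context;
as noted above, it need not scale proportionally to builds
with notably more (or fewer) hardware threads in use.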