Re: devel/llvm13 failed to reclaim memory on 8 GB Pi4 running -current [ZFS context: used the whole swap space]
Date: Fri, 28 Jan 2022 05:55:47 UTC
On 2022-Jan-27, at 17:43, Mark Millard <marklmi@yahoo.com> wrote: > On 2022-Jan-27, at 15:30, bob prohaska <fbsd@www.zefox.net> wrote: > >> On Thu, Jan 27, 2022 at 02:21:44PM -0800, Mark Millard wrote: >>> >>> Okay. I just started a poudriere bulk devel/llvm13 build >>> in a ZFS context: >>> >>> . . . >>> [00:00:37] Pkg: +BE_AMDGPU -BE_FREEBSD +BE_NATIVE -BE_STANDARD +BE_WASM +CLANG +DOCS +EXTRAS -FLANG +LIT +LLD +LLDB +MLIR -OPENMP -PYCLANG >>> [00:00:37] New: +BE_AMDGPU -BE_FREEBSD -BE_NATIVE +BE_STANDARD +BE_WASM +CLANG +DOCS +EXTRAS +FLANG +LIT +LLD +LLDB +MLIR +OPENMP +PYCLANG >>> . . . >>> [00:01:27] [01] [00:00:00] Building devel/llvm13 | llvm13-13.0.0_3 >>> >> >> Is this ARM hardware, or an emulator? > > 8 GiByte RPi4B, USB3 NVMe media with a ZFS partition. The content > is a slightly modified copy of the HoneyComb's PCIe slot Optane > media. > > The UFS-based 8 GiByte RPi4B is also based on copying from the > same Optane media, both for the system materials and various > ports/packages/pouriere related materials. (Not, necessarily, > other things.) > >> I've been using plain old make in /usr/ports/devel, >> might it be informative to try a poudriere build as well? > > The Pkg:, New:, and llvm13 lines I listed are poudriere(-devel) > output. I am doing my builds via poudriere. ALLOW_PARALLEL_JOBS= > and USE_TMPFS="data" in use. > > I have a context in which almost all prerequisites had already > been built. (The change in options lead to 2 very small ports > to build before devel/llvm13's started in a builder.) > > (You might not have a jail that already has the prerequisites.) > >> One would expect the added overhead to increase memory use. >> > > Well, from the context I started in, only devel/llvm13 is being > built once it starts. Once it gets to the build phase (after > dependencies and such are set up), there is not much overhead > because the only activity is the one builder and it is only > building llvm13 --via make in the builder. At the end there > would be extra activity as poudriere finishes up. During the > build phase, I only expect minor overhead from poudriere > monitoring the build logs and such. > > I expect that the mere fact that a poudriere jail is in use > for the builder to execute in does not contribute to > significantly increasing the system's memory use or changing > the system's memory use pattern. > > > There are some other differences my context. The instances of > main [so: 14] are non-debug builds (but with symbols). The > builds are optimized for the RPi4B (and others) via use of > -mcpu=cortex-a72 usage. My /usr/main-src/ does have some > personal changes in it. (Some messaging about the kills is > part of that.) > > The RPi4B's are using: > > over_voltage=6 > arm_freq=2000 > sdram_freq_min=3200 > force_turbo=1 > > (There are heat-sinks, fans, and good power supplies.) > > The media in use are USB3 1 TB Samsung Portable SSD T7 > Touch's. I'm unlikely to see "swap_pager: indefinite > wait buffer:" notices if the cause was based on the > media performance. (You have spinning rust, if I > remember right.) > > I do not have a monitoring script making a huge log file > during the build. So less is competing for media access > or leading to other overheads. (But, as I remember, > you have gotten the problem without having such a script > running.) ZFS context: Well, the ZFS example used up all the swap space, according to my patched top. This means that my setting of vm.pfault_oom_attempts is not appropriate for this context: # Delay when persistent low free RAM leads to # Out Of Memory killing of processes: vm.pageout_oom_seq=120 # # For plunty of swap/paging space (will not # run out), avoid pageout delays leading to # Out Of Memory killing of processes: vm.pfault_oom_attempts=-1 # # For possibly insufficient swap/paging space # (might run out), increase the pageout delay # that leads to Out Of Memory killing of # processes (showing defaults at the time): #vm.pfault_oom_attempts= 3 #vm.pfault_oom_wait= 10 # (The multiplication is the total but there # are other potential tradoffs in the factors # multiplied, even for nearly the same total.) I'll need to retest with something more like the commented out vm.pfault_oom_attempts and vm.pfault_oom_wait figures in order to see the intended handling of the test case. What are you using for each of: vm.pageout_oom_seq ? vm.pfault_oom_attempts ? vm.pfault_oom_wait ? For reference, for ZFS: last pid: 380; load averages: 1.50, 3.07, 3.93 MaxObs: 5.71, 4.92, 4.76 up 0+07:23:14 21:23:43 68 threads: 1 running, 65 sleeping, 2 waiting, 19 MaxObsRunning CPU: 13.3% user, 0.0% nice, 4.9% system, 0.9% interrupt, 80.8% idle Mem: 4912Mi Active, 167936B Inact, 1193Mi Laundry, 1536Mi Wired, 40960B Buf, 33860Ki Free, 6179Mi MaxObsActive, 6476Mi MaxObsWired, 7820Mi MaxObs(Act+Wir+Lndry) ARC: 777086Ki Total, 132156Ki MFU, 181164Ki MRU, 147456B Anon, 5994Ki Header, 457626Ki Other 59308Ki Compressed, 254381Ki Uncompressed, 4.29:1 Ratio Swap: 8192Mi Total, 8192Mi Used, K Free, 100% Inuse, 19572Ki In, 3436Ki Out, 8192Mi MaxObsUsed, 14458Mi MaxObs(Act+Lndry+SwapUsed), 15993Mi MaxObs(Act+Wir+Lndry+SwapUsed) Console: (Looks like I misremembered adjusting the "out of swap space" wording for the misnomer message.) swap_pager: out of swap space swp_pager_getswapspace(18): failed swap_pager: out of swap space swp_pager_getswapspace(1): failed swp_pager_getswapspace(1): failed swap_pager: out of swap space swp_pager_getswapspace(1): failed swp_pager_getswapspace(7): failed swp_pager_getswapspace(24): failed swp_pager_getswapspace(3): failed swp_pager_getswapspace(18): failed swp_pager_getswapspace(17): failed swp_pager_getswapspace(1): failed swp_pager_getswapspace(12): failed swp_pager_getswapspace(23): failed swp_pager_getswapspace(30): failed swp_pager_getswapspace(3): failed swp_pager_getswapspace(2): failed . . . Then a bunch of time with no messages . . . swp_pager_getswapspace(5): failed swp_pager_getswapspace(28): failed . . . Then a bunch of time with no messages . . . Top again: last pid: 382; load averages: 0.73, 1.00, 2.40 MaxObs: 5.71, 4.92, 4.76 up 0+07:31:26 21:31:55 70 threads: 1 running, 65 sleeping, 4 waiting, 19 MaxObsRunning CPU: 0.1% user, 0.0% nice, 5.6% system, 0.0% interrupt, 94.3% idle Mem: 3499Mi Active, 4096B Inact, 2612Mi Laundry, 1457Mi Wired, 40960B Buf, 34676Ki Free, 6179Mi MaxObsActive, 6476Mi MaxObsWired, 7820Mi MaxObs(Act+Wir+Lndry) ARC: 777154Ki Total, 135196Ki MFU, 178330Ki MRU, 5995Ki Header, 457631Ki Other 59520Ki Compressed, 254231Ki Uncompressed, 4.27:1 Ratio Swap: 8192Mi Total, 8192Mi Used, K Free, 100% Inuse, 409600B In, 4096B Out, 8192Mi MaxObsUsed, 14458Mi MaxObs(Act+Lndry+SwapUsed), 15993Mi MaxObs(Act+Wir+Lndry+SwapUsed) I then used top to kill ninja and the 4 large compiles that were going on. I'll change: vm.pfault_oom_attempts vm.pfault_oom_wait and reboot and start over. I expect that the ongoing UFS test will likely end up similarly and that similar adjustments and restarts will be needed because of actually running out of swap space. === Mark Millard marklmi at yahoo.com