Re: Armv7 (rpi2) getting stuck in buildworld for -current
Date: Wed, 17 May 2023 17:04:16 UTC
On May 17, 2023, at 09:00, bob prohaska <fbsd@www.zefox.net> wrote:

> On Sun, May 14, 2023 at 08:12:23PM -0700, Mark Millard wrote:
>>
>> I'm unsure if you have well avoided having any tmpfs based
>> space or the like that would compete for RAM and use some
>> of the RAM+SWAP. In the low RAM environments, I avoid such
>> competition and use UFS to exclusion.
>>
>
> No use of tmpfs, the line in /etc/fstab is commented out.
> I've commented out the changes to /boot/loader.conf related
> to virtual memory as well.

You mean all the names that start with "vm." that are assigned
in loader.conf?:

#vm.pageout_oom_seq="4096"
#vm.pfault_oom_attempts="120"
#vm.pfault_oom_wait="20"

I'd recommend at least:

vm.pageout_oom_seq=120

just to make sure that kills are less likely than with the
default value (12). The assignment only affects when kills
happen, not any other aspect of the virtual/RAM memory handling.

You might also consider:

vm.pfault_oom_attempts=-1

I'll note that vm.pageout_oom_seq is both a loader tunable and a
writable sysctl. So, if you have assignments in multiple places,
the one executed most recently is the one in effect. The same
applies to vm.pfault_oom_wait and vm.pfault_oom_attempts.

You do not mention /etc/sysctl.conf and its:

vm.swap_enabled=0
vm.swap_idle_enabled=0

I'd keep those as well.

> All were introduced in response
> to slow flash storage, it looks like they're not needed with
> mechanical disks and at least sometimes counterproductive.
> Since these changes there have been no communications losses.

Interesting. As far as I can tell, the only two that might have
contributed to that are/were:

vm.pfault_oom_attempts="120"
vm.pfault_oom_wait="20"

>> I'll note that causing swap space thrashing can make builds
>> take longer. "Thrashing" is not directly the space used but
>> the frequency/backlog of swap space I/O. I always avoided
>> configurations that thrashed for notable periods of time,
>> via my choice of -j, given that I'd already avoided RAM+SWAP
>> competition.
>> But thrashing is also tied to the likes of
>> spinning rust vs. various other media, for example, NVMe USB
>> media. It is probably generally easier to make spinning rust
>> thrash for notable periods. I'd also avoided spinning rust.
>
> I can't help but wonder if the dominant I/O bottleneck
> on a Pi2 or Pi3 isn't the usb subsystem.

There is not just one bottleneck. Spinning rust introduces
latencies that are large compared to the USB2 bus's latency
contributions: seek time, for example. (More about this later.)

The USB2 on the RPi[23]'s may well keep your spinning rust
below its maximum bandwidth. But that is a separate type of
bottleneck.

> With none-too-fast
> 5400 rpm mechanical disks there are no console warnings about
> swap, despite obvious memory pressure (high swap use, high
> idle percentage). Most of the time one thread is eventually
> given elevated priority and the overload is worked through.
>
> This morning a Pi3 was found seemingly jammed. All four threads
> were about 500MB in size, all had priority 20 with about 1% WCPU.
> No console messages warned of swap pressure, but the system was stalled.
> Occasionally one thread would get priority 21, but quickly reverted
> to 20 so the jam didn't clear. After poking around interactively
> reading man pages one thread got priority 135 and progress resumed.

Just sounds to me like it was I/O-bound thrashing, something
that could make builds take longer than using a smaller -jN
would, depending on the details.

Paging tends to be small transfers with random positioning,
leading to lots of seek time for spinning rust. When thrashing,
the system can spend notably more elapsed time seeking than
transferring data. A single paging transfer need not give a core
enough context to spend any notable time computing: it likely
has to wait for more transfers to establish a big enough working
set. Multiple thrashers tend to block each other from reaching
that state.
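To put rough numbers on the seek-vs-transfer point and on the smaller -jN point, here is a back-of-envelope Python sketch. The 5400 rpm disks and the roughly 500MB per build process come from the messages above, and 1 GiB is the RPi3's RAM. The 12 ms average seek time, the 30 MB/s effective USB2 rate, the one-4-KiB-page-per-transfer worst case and the 200 MiB reserve for the kernel and other processes are all assumptions for illustration, not measurements; the function names are hypothetical.

```python
# Back-of-envelope model of random 4 KiB paging I/O to a 5400 rpm disk,
# plus a crude -jN sizing estimate. Figures marked ASSUMED are illustrative
# guesses, not measurements from the machines in this thread.

RPM = 5400             # from the thread: 5400 rpm mechanical disks
AVG_SEEK_MS = 12.0     # ASSUMED average seek time
USB2_MB_PER_S = 30.0   # ASSUMED effective USB2 throughput (10**6 bytes/s)
PAGE_BYTES = 4096      # ASSUMED worst case: one 4 KiB page per transfer


def avg_rotational_latency_ms(rpm: float) -> float:
    """On average the head waits half a revolution for the sector."""
    ms_per_rev = 60_000.0 / rpm
    return ms_per_rev / 2.0


def transfer_ms(nbytes: int, mb_per_s: float) -> float:
    """Time to move nbytes once the head is already positioned."""
    return nbytes / (mb_per_s * 1_000_000) * 1000.0


def positioning_fraction(rpm: float, seek_ms: float,
                         mb_per_s: float, nbytes: int) -> float:
    """Fraction of one random transfer spent positioning, not moving data."""
    positioning = seek_ms + avg_rotational_latency_ms(rpm)
    total = positioning + transfer_ms(nbytes, mb_per_s)
    return positioning / total


def jobs_that_fit(ram_mib: int, reserved_mib: int, per_job_mib: int) -> int:
    """Largest -jN whose estimated combined size fits in RAM (at least 1)."""
    return max(1, (ram_mib - reserved_mib) // per_job_mib)


if __name__ == "__main__":
    per_io = (AVG_SEEK_MS + avg_rotational_latency_ms(RPM)
              + transfer_ms(PAGE_BYTES, USB2_MB_PER_S))
    frac = positioning_fraction(RPM, AVG_SEEK_MS, USB2_MB_PER_S, PAGE_BYTES)
    print(f"time per random 4 KiB transfer: {per_io:.2f} ms")
    print(f"fraction spent positioning:     {frac:.1%}")
    print(f"random 4 KiB transfers/second:  {1000.0 / per_io:.0f}")
    # Four ~500MB processes were reported on a 1 GiB RPi3: about 2000MB
    # wanted vs. 1024 MiB of RAM. The 200 MiB reserve is ASSUMED.
    print("-jN that fits in RAM:", jobs_that_fit(1024, 200, 500))
```

Under these assumptions each random 4 KiB transfer takes about 17.7 ms, over 99% of it spent seeking and waiting for rotation, so only about 57 such transfers complete per second, and only -j1 keeps the estimated working set in RAM. Real paging often clusters pages into larger transfers, which shifts the ratio, so treat this as an order-of-magnitude illustration only.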
> For the moment it appears that, at least when using mechanical
> disks, no adjustments to the VM configuration are needed on
> either Pi2 or Pi3. Random user interaction via keyboard seems
> helpful to break priority ties when swap use becomes intense.

I'd call the evidence for the "random user interaction via
keyboard" inference weak. It is a kind of context in which it
is difficult to have good evidence about what would have been
different without the manual activity. I expect that if you had
waited, the result would have been similar. But the evidence for
that is weak as well.

> Might it be possible for a script to detect thrashing and stimulate
> similar behavior?

I doubt the utility of that vs. just using a smaller -jN for the
build, leading to less time spent thrashing the spinning rust.

===
Mark Millard
marklmi at yahoo.com