Re: More swap trouble with armv7, was Re: -current on armv7 stuck with flashing disk light
Date: Tue, 04 Jul 2023 21:22:18 UTC
On Jul 4, 2023, at 12:07, bob prohaska <fbsd@www.zefox.net> wrote:

> On Tue, Jun 27, 2023 at 10:16:57AM -0700, bob prohaska wrote:
>> On Tue, Jun 27, 2023 at 09:59:40AM -0700, Mark Millard wrote:
>>>>
>>>> If you want to identify system hangs, please
>>>> put back:
>>>>
>>>> vm.swap_enabled=0
>>>> vm.swap_idle_enabled=0
>>>>
>>
>> They're reinstated now, but I don't want to disturb the system
>> while it seems to be building world acceptably.
>>
> Reinstating
> vm.swap_enabled=0
> vm.swap_idle_enabled=0
>
> and limiting buildworld to -j3 allows buildworld to complete successfully in 1 GB of swap.
>
> Meanwhile, attempts to compile sysutils/usbtop using poudriere still cause swap exhaustion
> while compiling /devel/llvm15 even with 2 GB of swap allocated.

What parallelism settings did poudriere have for the
devel/llvm15 build attempt? Have you tried allowing less
parallelism (if anything lower than what you have tried
is possible)?

What options are enabled vs. disabled for devel/llvm15 ?

BE_STANDARD vs. BE_FREEBSD vs. BE_NATIVE ? BE_NATIVE probably
helps limit resource use the most if it happens to be
sufficient. BE_FREEBSD would be in the middle of the 3
options for this issue.

Is MLIR enabled? If having it disabled is sufficient, it being
disabled should help reduce resource use. Similarly for
FLANG. (Building FLANG requires MLIR, so having MLIR disabled
implies FLANG needing to also be disabled.)

> The messages are
> Jul 4 11:18:48 www kernel: pid 1074 (getty), jid 0, uid 0, was killed: out of swap space

In my view the "out of swap space" is still a misnomer
for this context, but at least the following messages are more
specific to the actual internal data-structure(s) problem(s).

My understanding is that the data structures can have fragmentation
issues. For fragmentation issues, prior history since booting
might contribute, and building just after a reboot may end up
with less fragmentation. (Unknown if sufficiently less.)
Also, over-allocating the swap partition (by not having
kern.maxswzone appropriately matching) probably makes
"swap blk zone exhausted" more likely. It is one of the
reasons I avoid using swap partitioning with a total size
that generates the message about possible mistuning.

> swap blk zone exhausted, increase kern.maxswzone

Have you ever gotten the above line before? I was unaware of
any examples of it showing up.

> swblk zone ok

I'll note that there is another potential message pair for
"swap pctrie zone exhausted"/"swpctrie zone ok" that you
have not reported getting. Have you ever seen the
"swap pctrie zone exhausted" notice? (Just curiosity on
my part.)

> IIRC the "increase kern.maxswzone" is unhelpful, if not impossible. The
> "swblk zone ok" seems new.

Are you using the default kern.maxswzone for your context?
What is its value?

Did you get the notice about possible mistuning for your
combination of swap partition sizing and kern.maxswzone
value? Or did "swap blk zone exhausted" happen even without
that notice happening?

> From the gstat output near peak swap use the system wasn't I/O bound,

The "swap blk zone" contains an in-kernel-RAM data structure
that is involved in managing the swap space usage.

> the disk was less than 25% busy at the time of the first OOMA kill.

"swap blk zone" can end up with fragmentation issues, where
the total available is only made up of a bunch of tiny chunks
and nothing large can be handled as a unit any more. (A
general description of "fragmented".)
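
As a generic illustration of that description of "fragmented"
(this is not the kernel's swblk allocator; the free-map layout
and the function names are hypothetical), a small Python sketch
of how the total free amount can be large while no single
contiguous run is big enough for a request:

```python
def free_runs(free_map):
    """Return the length of each maximal run of free (True) slots."""
    runs = []
    current = 0
    for is_free in free_map:
        if is_free:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def can_allocate(free_map, size):
    """True if some contiguous free run can hold `size` slots."""
    return any(run >= size for run in free_runs(free_map))


# Hypothetical 16-slot maps, each with 8 free slots in total.
fragmented = [i % 2 == 0 for i in range(16)]  # free slots alternate with used ones
compact = [True] * 8 + [False] * 8            # all free slots are adjacent

for name, free_map in (("fragmented", fragmented), ("compact", compact)):
    runs = free_runs(free_map)
    print(name, "total free:", sum(runs), "largest run:", max(runs),
          "fits 4 slots:", can_allocate(free_map, 4))
```

Both maps report the same total, but only the compact one can
satisfy a multi-slot request as a unit.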
> Eventually it was possible to log in on the serial console and run top:
>
> 33 processes: 1 running, 29 sleeping, 3 zombie
> CPU: 0.0% user, 0.0% nice, 10.6% system, 0.2% interrupt, 89.2% idle
> Mem: 139M Active, 8256K Inact, 252M Laundry, 221M Wired, 98M Buf, 292M Free
> Swap: 2048M Total, 1291M Used, 756M Free, 63% Inuse
>
>   PID JID USERNAME THR PRI NICE  SIZE   RES STATE  C TIME  WCPU COMMAND
> 40719   0 root       1  20  -20    0B 8192B swzonx 0 0:12 9.15% cron
> 40717   0 root       1  20  -20    0B 8192B swzonx 0 0:34 9.08% sh
> 40709   0 root       1  20  -20    0B 8192B swzonx 0 0:38 9.01% sshd
> 40720   0 root       1  20  -20    0B 8192B swzonx 3 0:13 7.47% sh

Unfortunately the swzonx text is truncated. There is actually:

pause("swzonxb", 10);

for swblk zone and:

pause("swzonxp", 10);

for swap pctrie zone. top's display leaves it unclear which
was involved.

> 40721   0 bob        1  20    0 6608K 2600K CPU1   1 0:00 0.32% top
> 25761   0 bob        1  20    0   14M 6136K select 0 0:02 0.03% sshd
> 25852   0 root       1  20    0 4668K 1648K ttyin  1 0:01 0.03% tip
>  1237   0 root       1  20    0 5820K 1540K wait   1 0:12 0.00% sh
> 25381   0 root       1  23    0   14M 5868K select 1 0:01 0.00% sshd
>  1030   0 root       1  24    0   13M 2416K vmbckw 1 0:00 0.00% sshd
> 12715   0 root       1  68    0 5820K 1660K wait   0 0:00 0.00% sh
> 12710   0 root       1  20    0 5820K 1556K piperd 1 0:00 0.00% sh
>   929   0 root       1  20    0 5356K 1256K select 3 0:00 0.00% syslogd
>  1014   0 root       1  20    0 5124K 1356K nanslp 2 0:00 0.00% cron
> 25770   0 bob        1  36    0 6844K 3116K pause  1 0:00 0.00% tcsh
> 25794   0 bob        1  24    0 5380K 2188K wait   2 0:00 0.00% su
> 39626   0 root       1  20    0 5424K 2404K wait   2 0:00 0.00% login
> 40635   0 bob        1  20    0 6824K 3272K pause  1 0:00 0.00% tcsh
> 25820   0 root       1  21    0 5608K 2204K wait   0 0:00 0.00% sh
> 25851   0 root       1  20    0 4668K 1656K ttyin  3 0:00 0.00% tip
> 40454   0 root       1  24    0 4636K 1780K ttyin  3 0:00 0.00% getty
>
> I'll let it go for a while to see if poudriere notices it's failed and cleans up.
>
> At the moment /boot/loader.conf contains
>
> # Configure USB OTG; see usb_template(4).
> hw.usb.template=3
> umodem_load="YES"
> # Disable the beastie menu and color
> beastie_disable="YES"
> loader_color="NO"
> vm.pageout_oom_seq="4096"
> vm.pfault_oom_attempts="3"
> vm.pfault_oom_attempts="120"

2 assignments to the same thing in a row? The 2nd ends up
controlling the value.

> vm.pfault_oom_wait="20"

So you are allowing it 120 * 20 sec == 2400 sec (in other words,
40 minutes of retrying every 20 seconds) to handle a page fault.
That time scale may have contributed to why it failed first
for "swap blk zone exhausted" instead of more usual types of
OOM cause: How many page faults had active 40 minute intervals
at the time?

You may be just moving around where a problem shows up, not
avoiding a failure overall.

> kern.cam.boot_delay="20000"
> vfs.ffs.dotrimcons="1"
> vfs.root_mount_always_wait="1"
> filemon_load="YES"
>
> /usr/local/etc/poudriere.conf contains
> USE_TMPFS=no
> NOHANG_TIME=28800
> MAX_EXECUTION_TIME_EXTRACT=14400
> MAX_EXECUTION_TIME_INSTALL=14400
> MAX_EXECUTION_TIME_PACKAGE=432000
> ALLOW_MAKE_JOBS=yes
> MAX_JOBS_NUMBER=2

I do not remember there being a MAX_JOBS_NUMBER in the
infrastructure. So I will ignore that line. It probably
should be deleted.

> MAKE_JOBS_NUMBER=2
>
> Do these settings look reasonable?

ALLOW_MAKE_JOBS/MAKE_JOBS_NUMBER is not independent of what
is being built. There is no global, single answer to
"looks reasonable" for them.

However, MAKE_JOBS_NUMBER is in the wrong file. It is from/for
make, not from/for poudriere directly. (But there is a way
for poudriere to contribute such to make.) For example
(from a grep):

/usr/local/etc/poudriere.d/make.conf:MAKE_JOBS_NUMBER=2

( MAKE_JOBS_NUMBER_LIMIT is the same for where it goes. )

You might need to use MAKE_JOBS_NUMBER=1 or to not assign
to ALLOW_MAKE_JOBS to have a chance to have the devel/llvm15
build fit if you have already turned off options that avoid
using resources for building what you do not need.

===
Mark Millard
marklmi at yahoo.com
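
Going back to the loader.conf discussion above, the retry-window
arithmetic can be sketched in Python. This is only an
illustration: parse_loader_conf is a hypothetical, simplified
parser (not the real loader's), and it assumes the behavior
described in the message, that a later assignment to the same
name replaces an earlier one.

```python
def parse_loader_conf(text):
    """Very simplified name="value" parser; a later assignment overwrites an earlier one."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # naive comment stripping (assumption)
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().strip('"')
    return settings


# The three relevant lines quoted from /boot/loader.conf.
LOADER_CONF = '''
vm.pfault_oom_attempts="3"
vm.pfault_oom_attempts="120"
vm.pfault_oom_wait="20"
'''

settings = parse_loader_conf(LOADER_CONF)
attempts = int(settings["vm.pfault_oom_attempts"])  # second assignment is the one kept
wait_s = int(settings["vm.pfault_oom_wait"])
window_s = attempts * wait_s

print(f"{attempts} attempts * {wait_s} s = {window_s} s ({window_s // 60} min)")
```

With the quoted values this works out to the 2400 sec (40 minute)
window mentioned above.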