Re: More swap trouble with armv7, was Re: -current on armv7 stuck with flashing disk light
Date: Tue, 04 Jul 2023 21:51:24 UTC
[I continued to type MAX_JOBS_NUMBER where MAKE_JOBS_NUMBER
should have been what I typed.]

On Jul 4, 2023, at 14:22, Mark Millard <marklmi@yahoo.com> wrote:

> On Jul 4, 2023, at 12:07, bob prohaska <fbsd@www.zefox.net> wrote:
>
>> On Tue, Jun 27, 2023 at 10:16:57AM -0700, bob prohaska wrote:
>>> On Tue, Jun 27, 2023 at 09:59:40AM -0700, Mark Millard wrote:
>>>>>
>>>>> If you want to identify system hangs, please
>>>>> put back:
>>>>>
>>>>> vm.swap_enabled=0
>>>>> vm.swap_idle_enabled=0
>>>>>
>>>
>>> They're reinstated now, but I don't want to disturb the system
>>> while it seems to be building world acceptably.
>>>
>> Reinstating
>> vm.swap_enabled=0
>> vm.swap_idle_enabled=0
>>
>> and limiting buildworld to -j3 allows buildworld to complete successfully in 1 GB of swap.
>>
>> Meanwhile, attempts to compile sysutils/usbtop using poudriere still cause swap exhaustion
>> while compiling /devel/llvm15 even with 2 GB of swap allocated.
>
> What sort of parallelism settings in poudriere for the
> devel/llvm15 build attempt? Have you tried allowing
> less parallelism (if there is a less for what you have
> tried)?
>
> What options are enabled vs. disabled for devel/llvm15 ?
>
> BE_STANDARD vs. BE_FREEBSD vs. BE_NATIVE ?
>
> BE_NATIVE probably helps limit resource use the most if it
> happens to be sufficient. BE_FREEBSD would be in the
> middle of the 3 options for this issue.
>
> Is MLIR enabled? If having it disabled is sufficient, it
> being disabled should help avoid as much resource use.
> Similarly for FLANG. (Building FLANG requires MLIR, so
> having MLIR disabled implies FLANG needing to also be
> disabled.)
>
>> The messages are
>> Jul 4 11:18:48 www kernel: pid 1074 (getty), jid 0, uid 0, was killed: out of swap space
>
> In my view the "out of swap space" is still a misleading
> misnomer for this context, but at least the following
> messages are more specific to the actual internal
> data-structure(s) problem(s).
> My understanding is that
> the data structures can have fragmentation issues.
>
> For fragmentation issues, prior history since booting
> might contribute, and building just after a reboot may
> end up with less fragmentation. (Unknown if sufficiently
> less.)
>
> Also, over allocating the swap partition (by not having
> kern.maxswzone appropriately matching) likely makes
> "swap blk zone exhausted" more likely. It is one of the
> reasons I avoid using swap partitioning with a total
> size that generates the message about possible
> mistuning.
>
>> swap blk zone exhausted, increase kern.maxswzone
>
> Have you ever gotten the above line before? I was
> unaware of any examples of it showing up.
>
>> swblk zone ok
>
> I'll note that there is another potential message
> pair for "swap pctrie zone exhausted"/"swpctrie zone ok"
> that you have not reported getting.
>
> Have you ever seen the "swap pctrie zone exhausted"
> notice? (Just curiosity on my part.)
>
>> IIRC the "increase kern.maxswzone" is unhelpful, if not impossible. The
>> "swblk zone ok" seems new.
>
> Are you using the default kern.maxswzone for your context?
> What is its value?
>
> Did you get the notice about possible mistuning for your
> combination of swap partition sizing and kern.maxswzone
> value? Or did "swap blk zone" happen even without that
> notice happening?
>
>> From the gstat output near peak swap use the system wasn't I/O bound,
>
> The "swap blk zone" contains an in-kernel-RAM data
> structure that is involved in managing the swap space
> usage.
>
>> the disk was less than 25% busy at the time of the first OOMA kill.
>
> "swap blk zone" can end up with fragmentation issues, where
> the total available is only made up of a bunch of tiny chunks
> and nothing large can be handled as a unit any more. (A general
> description of "fragmented".)
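The general meaning of "fragmented" used above can be shown with a toy model. This is only a sketch: the 16-slot list and the helper names (largest_free_run, can_allocate) are invented for illustration and say nothing about how the kernel's swblk zone is actually laid out or allocated.

```python
def largest_free_run(slots):
    """Return the length of the longest run of consecutive free slots.

    `slots` is a list of booleans: True means in use, False means free.
    """
    best = run = 0
    for used in slots:
        run = 0 if used else run + 1
        best = max(best, run)
    return best


def can_allocate(slots, size):
    """A request for `size` contiguous slots needs one free run that large."""
    return largest_free_run(slots) >= size


# Hypothetical zone of 16 slots with every other slot in use:
# half the zone is free, but only as isolated 1-slot holes.
fragmented = [i % 2 == 0 for i in range(16)]

# Same amount in use, packed at the front: the free half is one 8-slot run.
compact = [i < 8 for i in range(16)]

free_fragmented = fragmented.count(False)
free_compact = compact.count(False)

print(free_fragmented, largest_free_run(fragmented), can_allocate(fragmented, 2))
print(free_compact, largest_free_run(compact), can_allocate(compact, 8))
```

Both layouts report the same total free, yet only the compact one can satisfy a request larger than a single slot. That is the sense in which the total available can look fine while nothing large can be handled as a unit.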
>
>> Eventually it was possible to log in on the serial console and run top:
>>
>> 33 processes: 1 running, 29 sleeping, 3 zombie
>> CPU: 0.0% user, 0.0% nice, 10.6% system, 0.2% interrupt, 89.2% idle
>> Mem: 139M Active, 8256K Inact, 252M Laundry, 221M Wired, 98M Buf, 292M Free
>> Swap: 2048M Total, 1291M Used, 756M Free, 63% Inuse
>>
>>   PID JID USERNAME THR PRI NICE   SIZE    RES STATE  C TIME   WCPU COMMAND
>> 40719   0 root       1  20  -20     0B  8192B swzonx 0 0:12  9.15% cron
>> 40717   0 root       1  20  -20     0B  8192B swzonx 0 0:34  9.08% sh
>> 40709   0 root       1  20  -20     0B  8192B swzonx 0 0:38  9.01% sshd
>> 40720   0 root       1  20  -20     0B  8192B swzonx 3 0:13  7.47% sh
>
> Unfortunately the swzonx text is truncated. There is
> actually:
>
> pause("swzonxb", 10); for swblk zone
> and:
> pause("swzonxp", 10); for swap pctrie zone
>
> top's display leaves it unclear which was involved.
>
>> 40721   0 bob        1  20    0  6608K  2600K CPU1   1 0:00  0.32% top
>> 25761   0 bob        1  20    0    14M  6136K select 0 0:02  0.03% sshd
>> 25852   0 root       1  20    0  4668K  1648K ttyin  1 0:01  0.03% tip
>>  1237   0 root       1  20    0  5820K  1540K wait   1 0:12  0.00% sh
>> 25381   0 root       1  23    0    14M  5868K select 1 0:01  0.00% sshd
>>  1030   0 root       1  24    0    13M  2416K vmbckw 1 0:00  0.00% sshd
>> 12715   0 root       1  68    0  5820K  1660K wait   0 0:00  0.00% sh
>> 12710   0 root       1  20    0  5820K  1556K piperd 1 0:00  0.00% sh
>>   929   0 root       1  20    0  5356K  1256K select 3 0:00  0.00% syslogd
>>  1014   0 root       1  20    0  5124K  1356K nanslp 2 0:00  0.00% cron
>> 25770   0 bob        1  36    0  6844K  3116K pause  1 0:00  0.00% tcsh
>> 25794   0 bob        1  24    0  5380K  2188K wait   2 0:00  0.00% su
>> 39626   0 root       1  20    0  5424K  2404K wait   2 0:00  0.00% login
>> 40635   0 bob        1  20    0  6824K  3272K pause  1 0:00  0.00% tcsh
>> 25820   0 root       1  21    0  5608K  2204K wait   0 0:00  0.00% sh
>> 25851   0 root       1  20    0  4668K  1656K ttyin  3 0:00  0.00% tip
>> 40454   0 root       1  24    0  4636K  1780K ttyin  3 0:00  0.00% getty
>>
>> I'll let it go for a while to see if poudriere notices it's failed and cleans up.
>>
>> At the moment /boot/loader.conf contains
>>
>> # Configure USB OTG; see usb_template(4).
>> hw.usb.template=3
>> umodem_load="YES"
>> # Disable the beastie menu and color
>> beastie_disable="YES"
>> loader_color="NO"
>> vm.pageout_oom_seq="4096"
>> vm.pfault_oom_attempts="3"
>> vm.pfault_oom_attempts="120"
>
> 2 assignments to the same thing in a row?
> The 2nd ends up controlling the value.
>
>> vm.pfault_oom_wait="20"
>
> So you are allowing it 120 * 20 sec == 2400 sec
> (in other words, 40 minutes of retrying every 20
> seconds) to handle a page fault.
>
> That time scale may have contributed to why it
> failed first for "swap blk zone exhausted"
> instead of more usual types of OOM cause:
> How many page faults had active 40 minute
> intervals at the time?
>
> You may be just moving around where a problem
> shows up, not leading to lack of a failure
> overall.
>
>> kern.cam.boot_delay="20000"
>> vfs.ffs.dotrimcons="1"
>> vfs.root_mount_always_wait="1"
>> filemon_load="YES"
>>
>> /usr/local/etc/poudriere.conf contains
>> USE_TMPFS=no
>> NOHANG_TIME=28800
>> MAX_EXECUTION_TIME_EXTRACT=14400
>> MAX_EXECUTION_TIME_INSTALL=14400
>> MAX_EXECUTION_TIME_PACKAGE=432000
>> ALLOW_MAKE_JOBS=yes
>> MAX_JOBS_NUMBER=2
>
> I do not remember there being a MAX_JOBS_NUMBER in
> the infrastructure. So I will ignore that line. It
> probably should be deleted.
>
>> MAKE_JOBS_NUMBER=2
>>
>> Do these settings look reasonable?
>
> ALLOW_MAKE_JOBS/MAX_JOBS_NUMBER is not independent
> of what is being built. There is no global, single
> answer to "looks reasonable" for them.

Sorry: ALLOW_MAKE_JOBS/MAKE_JOBS_NUMBER

> However, MAX_JOBS_NUMBER is in the wrong file.

Sorry: MAKE_JOBS_NUMBER

> It is from/for make, not from/for poudriere
> directly. (But there is a way for poudriere
> to contribute such to make.)
>
> For example (from a grep):
>
> /usr/local/etc/poudriere.d/make.conf:MAKE_JOBS_NUMBER=2
>
> ( MAKE_JOBS_NUMBER_LIMIT is the same for where it
> goes.
> )
>
> You might need to use MAX_JOBS_NUMBER=1 or

Sorry yet again: MAKE_JOBS_NUMBER

> to not assign to ALLOW_MAKE_JOBS to have a
> chance to have the devel/llvm15 build fit
> if you have already turned off options that
> avoid using resources for building what you
> do not need.
>

===
Mark Millard
marklmi at yahoo.com
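The page-fault retry arithmetic from the /boot/loader.conf discussion (the later of two assignments wins, then attempts times wait) can be checked with a short sketch. The parse_tunables helper is hypothetical and only mimics the "2nd ends up controlling the value" behavior described in the message. It is not the loader's actual parser.

```python
def parse_tunables(lines):
    """Parse name="value" lines; a later assignment to a name overrides an earlier one."""
    tunables = {}
    for line in lines:
        line = line.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        tunables[name.strip()] = value.strip().strip('"')
    return tunables


# The vm.* lines quoted from /boot/loader.conf in the message.
loader_conf = [
    'vm.pageout_oom_seq="4096"',
    'vm.pfault_oom_attempts="3"',
    'vm.pfault_oom_attempts="120"',
    'vm.pfault_oom_wait="20"',
]

t = parse_tunables(loader_conf)
attempts = int(t["vm.pfault_oom_attempts"])   # the second assignment (120) wins
wait_seconds = int(t["vm.pfault_oom_wait"])
window_seconds = attempts * wait_seconds
window_minutes = window_seconds / 60

print(attempts, window_seconds, window_minutes)
```

With the quoted settings this gives 120 attempts and a 2400 second (40 minute) window per page fault, matching the figures in the message. Had the first assignment (3) been the effective one, the window would be 60 seconds.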