From: Mark Millard
Date: Tue, 4 Jul 2023 14:51:24 -0700
Subject: Re: More swap trouble with armv7, was Re: -current on armv7 stuck with flashing disk light
To: bob prohaska
Cc: FreeBSD Mailing List, freebsd-arm@freebsd.org
Message-Id: <286ABDA5-BB1A-47C1-A187-168FFD86A441@yahoo.com>
In-Reply-To: <9A15D619-3274-44AC-B7E1-A1D6C7D334F2@yahoo.com>
References: <066FD282-1637-448C-99FF-BA62718386F0@yahoo.com> <9A15D619-3274-44AC-B7E1-A1D6C7D334F2@yahoo.com>
List-Id: Porting software to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-ports

[I kept typing MAX_JOBS_NUMBER where I should have typed
MAKE_JOBS_NUMBER.]

On Jul 4, 2023, at 14:22, Mark Millard wrote:

> On Jul 4, 2023, at 12:07, bob prohaska wrote:
>
>> On Tue, Jun 27, 2023 at 10:16:57AM -0700, bob prohaska wrote:
>>> On Tue, Jun 27, 2023 at 09:59:40AM -0700, Mark Millard wrote:
>>>>>
>>>>> If you want to identify system hangs, please
>>>>> put back:
>>>>>
>>>>> vm.swap_enabled=0
>>>>> vm.swap_idle_enabled=0
>>>>>
>>>
>>> They're reinstated now, but I don't want to disturb the system
>>> while it seems to be building world acceptably.
>>>
>> Reinstating
>> vm.swap_enabled=0
>> vm.swap_idle_enabled=0
>>
>> and limiting buildworld to -j3 allows buildworld to complete successfully in 1 GB of swap.
>>
>> Meanwhile, attempts to compile sysutils/usbtop using poudriere still cause swap exhaustion
>> while compiling devel/llvm15 even with 2 GB of swap allocated.
>
> What sort of parallelism settings in poudriere for the
> devel/llvm15 build attempt? Have you tried allowing
> less parallelism (if there is a less for what you have
> tried)?
>
> What options are enabled vs. disabled for devel/llvm15 ?
>
> BE_STANDARD vs. BE_FREEBSD vs. BE_NATIVE ?
>
> BE_NATIVE probably helps limit resource use the most if it
> happens to be sufficient. BE_FREEBSD would be in the
> middle of the 3 options for this issue.
>
> Is MLIR enabled? If having it disabled is sufficient, it
> being disabled should help avoid as much resource use.
> Similarly for FLANG. (Building FLANG requires MLIR, so
> having MLIR disabled implies FLANG needing to also be
> disabled.)
>
>> The messages are
>> Jul 4 11:18:48 www kernel: pid 1074 (getty), jid 0, uid 0, was killed: out of swap space
>
> In my view the "out of swap space" is still a misleading
> misnomer for this context, but at least the following
> messages are more specific to the actual internal
> data-structure(s) problem(s). My understanding is that
> the data structures can have fragmentation issues.
>
> For fragmentation issues, prior history since booting
> might contribute, and building just after a reboot may
> end up with less fragmentation. (Unknown if sufficiently
> less.)
>
> Also, over allocating the swap partition (by not having
> kern.maxswzone appropriately matching) likely makes
> "swap blk zone exhausted" more likely. It is one of the
> reasons I avoid using swap partitioning with a total
> size that generates the message about possible
> mistuning.
>
>> swap blk zone exhausted, increase kern.maxswzone
>
> Have you ever gotten the above line before? I was
> unaware of any examples of it showing up.
>
>> swblk zone ok
>
> I'll note that there is another potential message
> pair for "swap pctrie zone exhausted"/"swpctrie zone ok"
> that you have not reported getting.
>
> Have you ever seen the "swap pctrie zone exhausted"
> notice? (Just curiosity on my part.)
>
>> IIRC the "increase kern.maxswzone" is unhelpful, if not impossible. The
>> "swblk zone ok" seems new.
>
> Are you using the default kern.maxswzone for your context?
> What is its value?
>
> Did you get the notice about possible mistuning for your
> combination of swap partition sizing and kern.maxswzone
> value? Or did "swap blk zone" happen even without that
> notice happening?
>
>> From the gstat output near peak swap use the system wasn't I/O bound,
>
> The "swap blk zone" contains an in-kernel-RAM data
> structure that is involved in managing the swap space
> usage.
>
>> the disk was less than 25% busy at the time of the first OOMA kill.
>
> "swap blk zone" can end up with fragmentation issues, where
> the total available is only made up of a bunch of tiny chunks
> and nothing large can be handled as a unit any more. (A general
> description of "fragmented".)
>
>> Eventually it was possible to log in on the serial console and run top:
>>
>> 33 processes: 1 running, 29 sleeping, 3 zombie
>> CPU: 0.0% user, 0.0% nice, 10.6% system, 0.2% interrupt, 89.2% idle
>> Mem: 139M Active, 8256K Inact, 252M Laundry, 221M Wired, 98M Buf, 292M Free
>> Swap: 2048M Total, 1291M Used, 756M Free, 63% Inuse
>>
>>   PID JID USERNAME THR PRI NICE   SIZE    RES STATE  C  TIME   WCPU COMMAND
>> 40719   0 root       1  20  -20     0B  8192B swzonx 0  0:12  9.15% cron
>> 40717   0 root       1  20  -20     0B  8192B swzonx 0  0:34  9.08% sh
>> 40709   0 root       1  20  -20     0B  8192B swzonx 0  0:38  9.01% sshd
>> 40720   0 root       1  20  -20     0B  8192B swzonx 3  0:13  7.47% sh
>
> Unfortunately the swzonx text is truncated. There is
> actually:
>
> pause("swzonxb", 10); for swblk zone
> and:
> pause("swzonxp", 10); for swap pctrie zone
>
> top's display leaves it unclear which was involved.
>
>> 40721   0 bob        1  20    0  6608K  2600K CPU1   1  0:00  0.32% top
>> 25761   0 bob        1  20    0    14M  6136K select 0  0:02  0.03% sshd
>> 25852   0 root       1  20    0  4668K  1648K ttyin  1  0:01  0.03% tip
>>  1237   0 root       1  20    0  5820K  1540K wait   1  0:12  0.00% sh
>> 25381   0 root       1  23    0    14M  5868K select 1  0:01  0.00% sshd
>>  1030   0 root       1  24    0    13M  2416K vmbckw 1  0:00  0.00% sshd
>> 12715   0 root       1  68    0  5820K  1660K wait   0  0:00  0.00% sh
>> 12710   0 root       1  20    0  5820K  1556K piperd 1  0:00  0.00% sh
>>   929   0 root       1  20    0  5356K  1256K select 3  0:00  0.00% syslogd
>>  1014   0 root       1  20    0  5124K  1356K nanslp 2  0:00  0.00% cron
>> 25770   0 bob        1  36    0  6844K  3116K pause  1  0:00  0.00% tcsh
>> 25794   0 bob        1  24    0  5380K  2188K wait   2  0:00  0.00% su
>> 39626   0 root       1  20    0  5424K  2404K wait   2  0:00  0.00% login
>> 40635   0 bob        1  20    0  6824K  3272K pause  1  0:00  0.00% tcsh
>> 25820   0 root       1  21    0  5608K  2204K wait   0  0:00  0.00% sh
>> 25851   0 root       1  20    0  4668K  1656K ttyin  3  0:00  0.00% tip
>> 40454   0 root       1  24    0  4636K  1780K ttyin  3  0:00  0.00% getty
>>
>> I'll let it go for a while to see if poudriere notices it's failed and cleans up.
>>
>> At the moment /boot/loader.conf contains
>>
>> # Configure USB OTG; see usb_template(4).
>> hw.usb.template=3
>> umodem_load="YES"
>> # Disable the beastie menu and color
>> beastie_disable="YES"
>> loader_color="NO"
>> vm.pageout_oom_seq="4096"
>> vm.pfault_oom_attempts="3"
>> vm.pfault_oom_attempts="120"
>
> 2 assignments to the same thing in a row?
> The 2nd ends up controlling the value.
>
>> vm.pfault_oom_wait="20"
>
> So you are allowing it 120 * 20 sec == 2400 sec
> (in other words, 40 minutes of retrying every 20
> seconds) to handle a page fault.
>
> That time scale may have contributed to why it
> failed first for "swap blk zone exhausted"
> instead of more usual types of OOM cause:
> How many page faults had active 40 minute
> intervals at the time?
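
As a rough illustration of the arithmetic above, here is a small Python
sketch. It is only a sketch under stated assumptions: parse_loader_conf
and pfault_retry_window_seconds are hypothetical helper names (not
FreeBSD tools), and the parsing is far simpler than what the real loader
does. It only shows the second assignment overriding the first and the
resulting 2400 sec figure.

```python
# Hypothetical helpers for illustration only; not part of FreeBSD.

def parse_loader_conf(text):
    """Return {name: value}; a later assignment overrides an earlier one."""
    settings = {}
    for raw in text.splitlines():
        # Simplification: treat everything after '#' as a comment.
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().strip('"')
    return settings


def pfault_retry_window_seconds(settings):
    """attempts * wait, the product computed in the message above."""
    attempts = int(settings["vm.pfault_oom_attempts"])
    wait = int(settings["vm.pfault_oom_wait"])
    return attempts * wait


# The relevant lines quoted from /boot/loader.conf above.
LOADER_CONF = '''\
vm.pageout_oom_seq="4096"
vm.pfault_oom_attempts="3"
vm.pfault_oom_attempts="120"
vm.pfault_oom_wait="20"
'''

settings = parse_loader_conf(LOADER_CONF)
window = pfault_retry_window_seconds(settings)

print("effective vm.pfault_oom_attempts:", settings["vm.pfault_oom_attempts"])
print("retry window:", window, "sec =", window // 60, "min")
```

Dropping the first vm.pfault_oom_attempts line from the text would not
change the result, which is why only one of the two assignments is
worth keeping in the file.
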
>
> You may be just moving around where a problem
> shows up, not leading to lack of a failure
> overall.
>
>> kern.cam.boot_delay="20000"
>> vfs.ffs.dotrimcons="1"
>> vfs.root_mount_always_wait="1"
>> filemon_load="YES"
>>
>> /usr/local/etc/poudriere.conf contains
>> USE_TMPFS=no
>> NOHANG_TIME=28800
>> MAX_EXECUTION_TIME_EXTRACT=14400
>> MAX_EXECUTION_TIME_INSTALL=14400
>> MAX_EXECUTION_TIME_PACKAGE=432000
>> ALLOW_MAKE_JOBS=yes
>> MAX_JOBS_NUMBER=2
>
> I do not remember there being a MAX_JOBS_NUMBER in
> the infrastructure. So I will ignore that line. It
> probably should be deleted.
>
>> MAKE_JOBS_NUMBER=2
>>
>> Do these settings look reasonable?
>
> ALLOW_MAKE_JOBS/MAX_JOBS_NUMBER is not independent
> of what is being built. There is no global, single
> answer to "looks reasonable" for them.

Sorry: ALLOW_MAKE_JOBS/MAKE_JOBS_NUMBER

> However, MAX_JOBS_NUMBER is in the wrong file.

Sorry: MAKE_JOBS_NUMBER

> It is from/for make, not from/for poudriere
> directly. (But there is a way for poudriere
> to contribute such to make.)
>
> For example (from a grep):
>
> /usr/local/etc/poudriere.d/make.conf:MAKE_JOBS_NUMBER=2
>
> ( MAKE_JOBS_NUMBER_LIMIT is the same for where it
> goes. )
>
> You might need to use MAX_JOBS_NUMBER=1 or

Sorry yet again: MAKE_JOBS_NUMBER

> to not assign to ALLOW_MAKE_JOBS to have a
> chance to have the devel/llvm15 build fit
> if you have already turned off options that
> avoid using resources for building what you
> do not need.
>

===
Mark Millard
marklmi at yahoo.com