Subject: Re: pid xxxx (progname), jid 0, uid 0, was killed: failed to reclaim memory
From: Mark Millard <marklmi@yahoo.com>
Date: Tue, 2 May 2023 03:20:03 -0700
To: Daniel
Cc: freebsd-arm@freebsd.org
List-Id: Porting FreeBSD to ARM processors
List-Archive: https://lists.freebsd.org/archives/freebsd-arm

On May 2, 2023, at 01:35, Daniel wrote:

> On 5/2/23 09:25, Mark Millard wrote:
>
>> On May 1, 2023, at 23:55, Daniel wrote:
>>
>>> I noticed that on my aarch64 boards I recently get a lot of
>>> 'pid xxxx (progname), jid 0, uid 0, was killed: failed to reclaim
>>> memory' messages in syslog.
>>>
>>> This happens on a rockpro64 as well as a raspberry pi 3b, both
>>> running 13.2. Neither board is near its memory capacity (more like
>>> less than 50% used).
>>
>> Are you counting RAM+SWAP as "memory capacity"? Just RAM?
>
> With memory capacity I mean just RAM. Take this vmstat from the pi
> as an example:
>
> # vmstat
>  procs    memory      page                      disks    faults        cpu
>  r b w    avm   fre   flt  re  pi  po   fr  sr mm0 md0    in    sy    cs us sy id
>  0 0 0   864M  511M  2.2K   0  15   0 2.4K  72   0   0 19730  1.8K  6.1K  4  3 93

When was that command run?

A) Somewhat or just before the process was killed?

B) After the notice was displayed about the kill (or, at least,
after the process was killed)?

If (B), it is too late to see the memory usage conditions that led to
the kill: the kill had already freed RAM. One needs to be monitoring
the memory usage pattern/sequence that leads up to the kill. If one
looks too early, the conditions that lead to the kill need not have
happened yet.

It may be more reliable to get an idea by monitoring free memory over
time via, say, top, spanning from somewhat before the problem occurs
to after the problem. "systat -vmstat" is another display that can be
used for monitoring.
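If no one is around to watch top, a small logging loop can capture the
lead-up for later inspection. A minimal sketch (the log path and the
one-second interval are arbitrary choices of mine; the vm.stats
counters are counts of pages, not bytes):

#!/bin/sh
# Sketch: record free/active/wired page counts once a second so the
# memory-use pattern leading up to a kill can be inspected afterwards.
LOG=/var/log/memwatch.log    # example location only
while :; do
    date "+%Y-%m-%dT%H:%M:%S" >> "$LOG"
    sysctl vm.stats.vm.v_free_count \
           vm.stats.vm.v_active_count \
           vm.stats.vm.v_wire_count >> "$LOG"
    sleep 1
done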
A question can be: is ZFS in use or not?

> Yes, there is a memdisk (as unionfs overlay for the way too
> frequently dying and now read-only sdcard) but it's barely used:
>
> # df -h
> Filesystem    Size    Used   Avail Capacity  Mounted on
> [...]
> /dev/md0      496M    4.6M    451M     1%    /rwroot

Off the top of my head, I do not know whether it is the Size or the
Used figure above that better indicates the (virtual) memory space
use.

> Still I see processes being killed with the above message.
>
> On the rockpro64 I had a fairly huge swap (4G) on an nvme that never
> really got filled (~500megs maybe).

Swap usage is not directly relevant. The kills can happen with no swap
in use (despite swap space having been configured), based on one or
more processes that stay runnable and that keep sufficiently large
working sets active.

Large swap spaces (ones that avoid the warning about possible
mistuning) are something like 3.6 or so times the RAM. Even that can
be too small for tmpfs use in poudriere, for the likes of poudriere's
USE_TMPFS=all : rust can use over 10 GiBytes of file space in its
build.
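Current (not peak) swap configuration and use can be checked at any
time; these are standard FreeBSD tools, listed here just as a
reminder:

# swapinfo -h            # per-device size/used/available, human-readable
# pstat -s               # an equivalent view of the swap devices
# sysctl vm.swap_total   # total configured swap, in bytes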
> I'll try your suggestions below, thanks!
>
> Do you know of any recent changes to memory mgmt or oom conditions
> that might trigger this?

No. But I also have no good understanding of the complete workload on
the systems that get the notice.

> I've been running this setup (slapd, radiusd, smtpd) for quite some
> time on the pi now without any problems, before going to 13.2.

I'm not familiar with slapd, radiusd, or the like.

> Thanks!

>> The message is strictly about maintaining a certain amount of free
>> RAM. It turns out swap does not automatically avoid the issue for
>> all contexts.
>>
>> QUOTE from back in 2022-Dec for another context with the problem:
>> (I've not reworked the wording to your context but the points
>> will probably be clear anyway.)
>>
>> This is the FreeBSD kernel complaining about the configuration
>> not well matching the RPi3B+ workload. In essence, it was unable
>> to achieve its targeted minimum amount of free RAM in the sort of
>> time frame (really: effort) it is configured for. Depending on
>> what you do, the FreeBSD defaults do not work well for 1 GiByte
>> of RAM. Swap space alone is insufficient because FreeBSD does
>> not swap out processes that stay runnable. Just one process that
>> stays runnable, using a working set as large as what fits in RAM
>> for overall operation, will lead to such "failed to reclaim
>> memory" kills.
>>
>> But, if you are getting this, you will almost certainly need
>> a non-trivial swap space anyway.
>>
>> I have a starting point to recommend, configuring some
>> settings. As I've no detailed clue for your context,
>> I'll just provide the general description.
>>
>> A) I recommend a swap space something like shown in
>> the below (from gpart show output):
>>
>> =>        40  1953525088  da0  GPT  (932G)
>>           40      532480    1  efi  (260M)
>>       532520        2008       - free -  (1.0M)
>>       534528     7340032    2  freebsd-swap  (3.5G)
>>          . . .
>>     67643392  1740636160    5  freebsd-ufs  (830G)
>>   1808279552   145245576       - free -  (69G)
>>
>> This size (3.5 GiBytes or so) is somewhat below where FreeBSD
>> starts to complain about potential mistuning from a large swap
>> space, given the 1 GiByte of RAM. (I boot the same boot media on
>> a variety of machines and have other swap partitions to match up
>> with RAM sizes. But I omitted showing them.)
>>
>> It is easy to have things like buildworld or building ports end
>> up with individual processes that are temporarily bigger than the
>> 1 GiByte of RAM. Getting multiple cores going can also lead to
>> not fitting and needing to page.
>>
>> I'll note that I normally use USB3 NVMe media that also works
>> with USB2 ports. My alternate is USB3 SSD media that works with
>> USB2 ports. I avoid spinning rust and microsd cards. This limits
>> what I can usefully comment on for some aspects of configuration
>> related to the alternatives.
>>
>> B) /boot/loader.conf content:
>>
>> #
>> # Delay when persistent low free RAM leads to
>> # Out Of Memory killing of processes:
>> vm.pageout_oom_seq=120
>> #
>> # For plenty of swap/paging space (will not
>> # run out), avoid pageout delays leading to
>> # Out Of Memory killing of processes:
>> vm.pfault_oom_attempts=-1
>> #
>> # For possibly insufficient swap/paging space
>> # (might run out), increase the pageout delay
>> # that leads to Out Of Memory killing of
>> # processes (showing defaults at the time):
>> #vm.pfault_oom_attempts=3
>> #vm.pfault_oom_wait=10
>> # (The multiplication is the total but there
>> # are other potential tradeoffs in the factors
>> # multiplied, even for nearly the same total.)
>>
>> If use of vm.pfault_oom_attempts=-1 is going to be inappropriate,
>> I do not have background with figuring out a good combination of
>> settings for vm.pfault_oom_attempts and vm.pfault_oom_wait .
>>
>> I'll note that vm.pageout_oom_seq is not a time -- more like how
>> many insufficient tries to reclaim RAM happen in sequence before
>> an OOM kill is started (effort). 120 is 10 times the default.
>> While nothing disables such criteria, larger figures can be used
>> if needed. (I've never had to but others have.)
>>
>> C) /etc/sysctl.conf content:
>>
>> #
>> # Together this pair avoids swapping out the process kernel stacks.
>> # This avoids one way for processes that interact with the system
>> # to end up hung:
>> vm.swap_enabled=0
>> vm.swap_idle_enabled=0
>>
>> D) I strictly avoid having tmpfs compete for RAM in this kind of
>> context: tmpfs use just makes "failed to reclaim memory" kills
>> harder to avoid. (Various folks have run into this despite having
>> vastly more RAM than an RPi3B+.) So my
>> /usr/local/etc/poudriere.conf has:
>>
>> USE_TMPFS=no
>>
>> There are examples, like building rust, where anything but "no"
>> or "data" leads to huge 10 GiByte+ tmpfs spaces for poudriere's
>> build activity. Not a good match to an RPi3B+ .
>>
>> That is it for the recommendations of a starting point
>> configuration.
>>
>> With such measures, I've been able to have poudriere with -j4 but
>> also using ALLOW_MAKE_JOBS= without using the likes of
>> MAKE_JOBS_NUMBER limiting it. (So the load average could be
>> around 16 a fair amount of the time but still not get "failed to
>> reclaim memory" kills.)
>>
>> Note: I'm not claiming above that -j4 is the best setting to use
>> from, say, an elapsed time point of view for my poudriere bulk
>> activity.
>>
>> END QUOTE
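An aside on trying the figures above: if I remember right, both
vm.pageout_oom_seq and vm.pfault_oom_attempts are also writable on a
running system, so candidate values can be tested before making them
permanent via /boot/loader.conf:

# sysctl vm.pageout_oom_seq vm.pfault_oom_attempts vm.pfault_oom_wait
# sysctl vm.pageout_oom_seq=120
# sysctl vm.pfault_oom_attempts=-1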
>>> It started on the rockpro64 first, where I did a bit of fiddling,
>>> e.g. enabling/disabling swap, replacing zfs with ufs, etc. Nothing
>>> helped in the end. I thought the board might be defective but now
>>> I start seeing the same thing on the raspi as well.
>>>
>>> Any ideas what this could be or how to debug this further?

===
Mark Millard
marklmi at yahoo.com