List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current
Subject: Re: Chasing OOM Issues - good sysctl metrics to use?
From: Jan Mikkelsen
To: Mark Millard
Cc: Pete Wright, freebsd-current
Date: Tue, 10 May 2022 17:47:06 +0200

On 10 May 2022, at 10:01, Mark Millard wrote:
>
> On 2022-Apr-29, at 13:57, Mark Millard wrote:
>
>> On 2022-Apr-29, at 13:41, Pete Wright wrote:
>>>
>>>> . . .
>>>
>>> d'oh - went out for lunch and workstation locked up.
>>> i *knew* i shouldn't have said anything lol.
>>
>> Any interesting console messages ( or dmesg -a or /var/log/messages )?
>>
>
> I've been doing some testing of a patch by tijl at FreeBSD.org
> and have reproduced both hang-ups (ZFS/ARC context) and kills
> (UFS/noARC and ZFS/ARC) for "was killed: failed to reclaim
> memory", both with and without the patch. This is with only a
> tiny fraction of the swap partition(s) enabled being put to
> use. So far, the testing was deliberately with
> vm.pageout_oom_seq=12 (the default value). My testing has been
> with main [so: 14].
>
> But I also learned how to avoid the hang-ups that I got --but
> it costs making kills more likely/quicker, other things being
> equal.
>
> I discovered that the hang-ups that I got were from all the
> processes that I interact with the system via ending up with
> the process's kernel threads swapped out and were not being
> swapped in. (including sshd, so no new ssh connections). In
> some contexts I only had escaping into the kernel debugger
> available, not even ^T would work. Other times ^T did work.
>
> So, when I'm willing to risk kills in order to maintain
> the ability to interact normally, I now use in
> /etc/sysctl.conf :
>
> vm.swap_enabled=0

I have been looking at an OOM related issue. Ignoring the actual leak, the problem leads to a process being killed because the system was out of memory. This is fine. After that, however, the system console was black with a single block cursor and the console keyboard was unresponsive. Caps lock and num lock didn’t toggle their lights when pressed.

Using an ssh session, the system looked fine. USB events for the keyboard being disconnected and reconnected appeared but the keyboard stayed unresponsive.

Setting vm.swap_enabled=0, as you did above, resolved this problem. After the process was killed a perfectly normal console returned.
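For anyone wanting to try the same setting, here is a minimal sketch of making it persistent without adding duplicate entries. The helper name persist_sysctl is hypothetical and not from this thread; it only assumes a sysctl.conf-style file of name=value lines.

```sh
#!/bin/sh
# Hypothetical helper (not from the thread): append "name=value" to a
# sysctl.conf-style file unless the file already contains "name=".
persist_sysctl() {
    name=$1
    value=$2
    conf=${3:-/etc/sysctl.conf}
    # grep -F matches the text literally; -q suppresses output.
    # Note this also matches a commented-out entry for the same name.
    if grep -qF "${name}=" "$conf" 2>/dev/null; then
        return 0
    fi
    printf '%s=%s\n' "$name" "$value" >> "$conf"
}

# On the FreeBSD machine itself (as root) one would then run:
#   sysctl vm.swap_enabled=0            # apply to the running system
#   persist_sysctl vm.swap_enabled 0    # keep it across reboots
```

Since the sysctl is writable at runtime, the first command takes effect immediately; the helper only covers the /etc/sysctl.conf part.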
The interesting thing is that this test system is configured with no swap space. This is on 13.1-RC5.

> This disables swapping out of process kernel stacks. It
> is just with that option removed for gaining free RAM, there
> are fewer options tried before a kill is initiated. It is not a
> loader-time tunable but is writable, thus the
> /etc/sysctl.conf placement.

Is that really what it does? From a quick look at the code in vm/vm_swapout.c, it seems a little more complex.

Regards,

Jan M.