Date: Sat, 23 Apr 2022 10:26:18 -0700
From: Pete Wright <pete@nomadlogic.org>
To: Mark Millard
Cc: freebsd-current
Subject: Re: Chasing OOM Issues - good sysctl metrics to use?

On 4/22/22 18:46, Mark Millard wrote:
> On 2022-Apr-22, at 16:42, Pete Wright wrote:
>
>> On 4/21/22 21:18, Mark Millard wrote:
>>> Messages in the console out would be appropriate
>>> to report. Messages might also be available via
>>> the following at appropriate times:
>> That is what is frustrating. I will get notification that the
>> processes were killed:
>> Apr 22 09:55:15 topanga kernel: pid 76242 (chrome), jid 0, uid 1001, was killed: failed to reclaim memory
>> Apr 22 09:55:19 topanga kernel: pid 76288 (chrome), jid 0, uid 1001, was killed: failed to reclaim memory
>> Apr 22 09:55:20 topanga kernel: pid 76259 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
>> Apr 22 09:55:22 topanga kernel: pid 76252 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
>> Apr 22 09:55:23 topanga kernel: pid 76267 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
>> Apr 22 09:55:24 topanga kernel: pid 76234 (chrome), jid 0, uid 1001, was killed: failed to reclaim memory
>> Apr 22 09:55:26 topanga kernel: pid 76275 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
> Those messages are not reporting being out of swap
> as such. They are reporting sustained low free RAM
> despite a number of less drastic attempts to gain
> back free RAM (to above some threshold).
>
> FreeBSD does not swap out the kernel stacks for
> processes that stay in a runnable state: it just
> continues to page. Thus just one large process
> that has a huge working set of active pages can
> lead to OOM kills in a context where no other set
> of processes would be enough to gain the free
> RAM required. Such contexts are not really a
> swap issue.

Thank you for this clarification/explanation - that totally makes sense!
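For the archives: to actually watch that "sustained low free RAM"
condition while reproducing this, something like the following should
work (my guess at a useful set of counters, so adjust as needed; the
values are page counts, multiply by hw.pagesize for bytes):

# sysctl vm.stats.vm.v_free_count vm.stats.vm.v_inactive_count \
    vm.stats.vm.v_laundry_count vm.stats.vm.v_active_count \
    vm.stats.vm.v_wire_count

Polling that from a shell loop every few seconds while the browsers are
running should show the free count collapsing before the kills start.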
> Based on there being only 1 "killed:" reason,
> I have a suggestion that should allow delaying
> such kills for a long time. That in turn may
> help with investigating without actually
> suffering the kills during the activity: more
> time with low free RAM to observe.

Great idea, thank you! And thanks for the example settings and
descriptions as well.

> But those are large but finite activities. If
> you want to leave something running for days,
> weeks, months, or whatever that produces the
> sustained low free RAM conditions, the problem
> will eventually happen. Ultimately one may have
> to exit and restart such processes once in a
> while, exiting enough of them to give a little
> time with sufficient free RAM.

Perfect - since this is a workstation, my run-time for these processes
is probably a week: I update my system and pkgs over the weekend, then
dogfood CURRENT during the work week.

>> Yes, I have 2GB of swap that resides on an NVMe device.
> I assume a partition style. Otherwise there are other
> issues involved --that likely should be avoided by
> switching to partition style.

So I kinda lied - initially I had just a 2G swap, but I added a second
20G swap a while ago to have enough space to capture some cores while
testing drm-kmod work. Based on this comment I am going to only use the
20G file-backed swap and see how that goes.

This is my current fstab entry for the file-backed swap:

md99 none swap sw,file=/root/swap1,late 0 0

>>> ZFS (so with ARC)? UFS? Both?
>> I am using ZFS and am setting vfs.zfs.arc.max to 10G. I have also
>> experienced this crash with that set to the default unlimited value.
> I use ZFS on systems with at least 8 GiBytes of RAM,
> but I've never tuned ZFS. So I'm not much help for
> that side of things.

Since we started this thread I've gone ahead and removed the
vfs.zfs.arc.max setting since it's cruft at this point. I initially
added it to test a configuration I deployed to a server hosting a bunch
of VMs.

> I'm hoping that vm.pageout_oom_seq=120 (or more) makes it
> so you do not have to have identified everything up front
> and can explore easier.
>
> Note that vm.pageout_oom_seq is both a loader tunable
> and a writeable runtime tunable:
>
> # sysctl -T vm.pageout_oom_seq
> vm.pageout_oom_seq: 120
> # sysctl -W vm.pageout_oom_seq
> vm.pageout_oom_seq: 120
>
> So you can use it to extend the time when the
> machine is already running.
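Noting down how I'd persist that across reboots, in case it helps
anyone else reading along (this is my reading of loader.conf(5) and
sysctl.conf(5), so double-check the syntax):

# /boot/loader.conf -- set as a loader tunable at boot
vm.pageout_oom_seq="120"

# /etc/sysctl.conf -- applied by rc(8) during boot; same effect as
# running "sysctl vm.pageout_oom_seq=120" at runtime
vm.pageout_oom_seq=120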
Fantastic. Thanks again for taking your time and sharing your knowledge
and experience with me, Mark!

These types of journeys are why I run CURRENT on my daily driver - it
really helps me better understand the OS so that I can be a better
admin on the "real" servers I run for work. It's also just fun to learn
stuff too, heh.

-p

--
Pete Wright
pete@nomadlogic.org
@nomadlogicLA