List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current
Subject: Re: Chasing OOM Issues - good sysctl metrics to use?
From: Mark Millard
Date: Tue, 10 May 2022 20:31:35 -0700
To: Jan Mikkelsen, Pete Wright
Cc: freebsd-current

On 2022-May-10, at 17:49, Mark Millard wrote:

> On 2022-May-10, at 11:49, Mark Millard wrote:
>
>> On 2022-May-10, at 08:47, Jan Mikkelsen wrote:
>>
>>> On 10 May 2022, at 10:01, Mark Millard wrote:
>>>>
>>>> On 2022-Apr-29, at 13:57, Mark Millard wrote:
>>>>
>>>>> On 2022-Apr-29, at 13:41, Pete Wright wrote:
>>>>>>
>>>>>>> . . .
>>>>>>
>>>>>> d'oh - went out for lunch and workstation locked up. i *knew* i shouldn't have said anything lol.
>>>>>
>>>>> Any interesting console messages ( or dmesg -a or /var/log/messages )?
>>>>>
>>>>
>>>> I've been doing some testing of a patch by tijl at FreeBSD.org
>>>> and have reproduced both hang-ups (ZFS/ARC context) and kills
>>>> (UFS/noARC and ZFS/ARC) for "was killed: failed to reclaim
>>>> memory", both with and without the patch. This is with only a
>>>> tiny fraction of the enabled swap partition(s) being put to
>>>> use. So far, the testing was deliberately with
>>>> vm.pageout_oom_seq=12 (the default value). My testing has been
>>>> with main [so: 14].
>>>>
>>>> But I also learned how to avoid the hang-ups that I got --but
>>>> it costs making kills more likely/quicker, other things being
>>>> equal.
>>>>
>>>> I discovered that the hang-ups that I got were from all the
>>>> processes that I use to interact with the system ending up
>>>> with the process's kernel threads swapped out and not being
>>>> swapped in (including sshd, so no new ssh connections). In
>>>> some contexts I only had escaping into the kernel debugger
>>>> available, not even ^T would work. Other times ^T did work.
>>>>
>>>> So, when I'm willing to risk kills in order to maintain
>>>> the ability to interact normally, I now use in
>>>> /etc/sysctl.conf :
>>>>
>>>> vm.swap_enabled=0
>>>
>>> I have been looking at an OOM related issue. Ignoring the actual leak, the problem leads to a process being killed because the system was out of memory. This is fine. After that, however, the system console was black with a single block cursor and the console keyboard was unresponsive. Caps lock and num lock didn’t toggle their lights when pressed.
>>>
>>> Using an ssh session, the system looked fine. USB events for the keyboard being disconnected and reconnected appeared but the keyboard stayed unresponsive.
>>>
>>> Setting vm.swap_enabled=0, as you did above, resolved this problem. After the process was killed a perfectly normal console returned.
>>>
>>> The interesting thing is that this test system is configured with no swap space.
>>>
>>> This is on 13.1-RC5.
>>>
>>>> This disables swapping out of process kernel stacks. It
>>>> is just that, with that option removed for gaining free RAM,
>>>> there are fewer options tried before a kill is initiated. It
>>>> is not a loader-time tunable but is writable, thus the
>>>> /etc/sysctl.conf placement.
>>>
>>> Is that really what it does? From a quick look at the code in vm/vm_swapout.c, it seems a little more complex.
>>
>> I was going by its description:
>>
>> # sysctl -d vm.swap_enabled
>> vm.swap_enabled: Enable entire process swapout
>>
>> Based on the below, it appears that the description
>> presumes vm.swap_idle_enabled==0 (the default). In
>> my context vm.swap_idle_enabled==0 . Looks like I
>> should also list:
>>
>> vm.swap_idle_enabled=0
>>
>> in my /etc/sysctl.conf with a reminder comment that the
>> pair of =0's are required for avoiding the observed
>> hang-ups.
>>
>>
>> The analysis goes like . . .
>>
>> I see in the code that vm.swap_enabled !=0 causes
>> VM_SWAP_NORMAL :
>>
>> void
>> vm_swapout_run(void)
>> {
>>
>>         if (vm_swap_enabled)
>>                 vm_req_vmdaemon(VM_SWAP_NORMAL);
>> }
>>
>> and that in turn leads vm_daemon to:
>>
>>                 if (swapout_flags != 0) {
>>                         /*
>>                          * Drain the per-CPU page queue batches as a deadlock
>>                          * avoidance measure.
>>                          */
>>                         if ((swapout_flags & VM_SWAP_NORMAL) != 0)
>>                                 vm_page_pqbatch_drain();
>>                         swapout_procs(swapout_flags);
>>                 }
>>
>> Note: vm.swap_idle_enabled==0 && vm.swap_enabled==0 ends
>> up with swapout_flags==0. vm.swap_idle. . . defaults seem
>> to be (in my context):
>>
>> # sysctl -a | grep swap_idle
>> vm.swap_idle_threshold2: 10
>> vm.swap_idle_threshold1: 2
>> vm.swap_idle_enabled: 0
>>
>> For reference:
>>
>> /*
>>  * Idle process swapout -- run once per second when pagedaemons are
>>  * reclaiming pages.
>>  */
>> void
>> vm_swapout_run_idle(void)
>> {
>>         static long lsec;
>>
>>         if (!vm_swap_idle_enabled || time_second == lsec)
>>                 return;
>>         vm_req_vmdaemon(VM_SWAP_IDLE);
>>         lsec = time_second;
>> }
>>
>> [So vm.swap_idle_enabled==0 avoids VM_SWAP_IDLE status.]
>>
>> static void
>> vm_req_vmdaemon(int req)
>> {
>>         static int lastrun = 0;
>>
>>         mtx_lock(&vm_daemon_mtx);
>>         vm_pageout_req_swapout |= req;
>>         if ((ticks > (lastrun + hz)) || (ticks < lastrun)) {
>>                 wakeup(&vm_daemon_needed);
>>                 lastrun = ticks;
>>         }
>>         mtx_unlock(&vm_daemon_mtx);
>> }
>>
>> [So VM_SWAP_IDLE and VM_SWAP_NORMAL are independent bits
>> in vm_pageout_req_swapout.]
>>
>> vm_daemon does:
>>
>>                 mtx_lock(&vm_daemon_mtx);
>>                 msleep(&vm_daemon_needed, &vm_daemon_mtx, PPAUSE, "psleep",
>>                     vm_daemon_timeout);
>>                 swapout_flags = vm_pageout_req_swapout;
>>                 vm_pageout_req_swapout = 0;
>>                 mtx_unlock(&vm_daemon_mtx);
>>
>> So vm_pageout_req_swapout is regenerated after that
>> each time.
>>
>> I'll not show the code for vm.swap_idle_enabled!=0 .
>>
>
> Well, with continued experiments I got an example of
> a hangup for which looking via the db> prompt did not
> show any swapping out of process kernel stacks
> ( vm.swap_enabled=0 was the context, so expected ).
> The environment was ZFS (so with ARC).
>
> But this was testing with vm.pageout_oom_seq=120 instead
> of the default vm.pageout_oom_seq=12 . It may be that,
> left to sit long enough, things would have unhung (external
> perspective).
>
> It is part of what I'm experimenting with so we will see.
>

Looks like I might have overreacted, in that for my current
tests there can be brief periods of delayed response, but
things respond in a little bit. Definitely not like the
hang-ups I was getting with vm.swap_enabled=1 .

===
Mark Millard
marklmi at yahoo.com
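
For reference, a minimal userland sketch of the flag logic traced in the
quoted analysis above. It is a model only, not the kernel code: the bit
values given to VM_SWAP_NORMAL and VM_SWAP_IDLE, the helper names
requested_swapout_flags() and swapout_would_run(), and the plain int
parameters standing in for the two sysctls are assumptions made for
illustration. The once-per-second rate limiting and the vm_daemon_mtx
locking are deliberately left out.

```c
/*
 * Userland model of how vm.swap_enabled and vm.swap_idle_enabled
 * feed the swapout request bits. Not kernel code.
 */

/* Assumed bit values; only their independence matters here. */
#define VM_SWAP_NORMAL	1
#define VM_SWAP_IDLE	2

/* The two requests must not share bits for the OR-ing to work. */
_Static_assert((VM_SWAP_NORMAL & VM_SWAP_IDLE) == 0,
    "swapout request bits must be independent");

/*
 * Hypothetical helper: the bits that vm_swapout_run() and
 * vm_swapout_run_idle() together would OR into the request word
 * for the given sysctl settings.
 */
int
requested_swapout_flags(int swap_enabled, int swap_idle_enabled)
{
	int flags = 0;

	if (swap_enabled)
		flags |= VM_SWAP_NORMAL;	/* vm_swapout_run() path */
	if (swap_idle_enabled)
		flags |= VM_SWAP_IDLE;		/* vm_swapout_run_idle() path */
	return (flags);
}

/*
 * Hypothetical helper: mirrors the "if (swapout_flags != 0)" test
 * that guards swapout_procs() in the quoted vm_daemon fragment.
 */
int
swapout_would_run(int swapout_flags)
{
	return (swapout_flags != 0);
}
```

In this model, requested_swapout_flags(0, 0) is the only combination that
yields 0, so it is the only one for which swapout_would_run() is false.
That is the case the pair of =0 settings in /etc/sysctl.conf is meant to
select.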