From: Mark Millard
Date: Tue, 10 May 2022 11:49:46 -0700
To: Jan Mikkelsen
Cc: Pete Wright, freebsd-current
Subject: Re: Chasing OOM Issues - good sysctl metrics to use?
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

On
2022-May-10, at 08:47, Jan Mikkelsen wrote:

> On 10 May 2022, at 10:01, Mark Millard wrote:
>>
>> On 2022-Apr-29, at 13:57, Mark Millard wrote:
>>
>>> On 2022-Apr-29, at 13:41, Pete Wright wrote:
>>>>
>>>>> . . .
>>>>
>>>> d'oh - went out for lunch and workstation locked up. i *knew* i shouldn't have said anything lol.
>>>
>>> Any interesting console messages ( or dmesg -a or /var/log/messages )?
>>>
>>
>> I've been doing some testing of a patch by tijl at FreeBSD.org
>> and have reproduced both hang-ups (ZFS/ARC context) and kills
>> (UFS/noARC and ZFS/ARC) for "was killed: failed to reclaim
>> memory", both with and without the patch. This is with only a
>> tiny fraction of the swap partition(s) enabled being put to
>> use. So far, the testing was deliberately with
>> vm.pageout_oom_seq=12 (the default value). My testing has been
>> with main [so: 14].
>>
>> But I also learned how to avoid the hang-ups that I got --but
>> it costs making kills more likely/quicker, other things being
>> equal.
>>
>> I discovered that the hang-ups that I got were from all the
>> processes that I interact with the system via ending up with
>> the process's kernel threads swapped out and were not being
>> swapped in. (including sshd, so no new ssh connections). In
>> some contexts I only had escaping into the kernel debugger
>> available, not even ^T would work. Other times ^T did work.
>>
>> So, when I'm willing to risk kills in order to maintain
>> the ability to interact normally, I now use in
>> /etc/sysctl.conf :
>>
>> vm.swap_enabled=0
>
> I have been looking at an OOM related issue. Ignoring the actual leak, the problem leads to a process being killed because the system was out of memory. This is fine. After that, however, the system console was black with a single block cursor and the console keyboard was unresponsive. Caps lock and num lock didn’t toggle their lights when pressed.
>
> Using an ssh session, the system looked fine. USB events for the keyboard being disconnected and reconnected appeared but the keyboard stayed unresponsive.
>
> Setting vm.swap_enabled=0, as you did above, resolved this problem. After the process was killed a perfectly normal console returned.
>
> The interesting thing is that this test system is configured with no swap space.
>
> This is on 13.1-RC5.
>
>> This disables swapping out of process kernel stacks. It
>> is just with that option removed for gaining free RAM, there
>> are fewer options tried before a kill is initiated. It is not a
>> loader-time tunable but is writable, thus the
>> /etc/sysctl.conf placement.
>
> Is that really what it does? From a quick look at the code in vm/vm_swapout.c, it seems a little more complex.

I was going by its description:

# sysctl -d vm.swap_enabled
vm.swap_enabled: Enable entire process swapout

Based on the below, it appears that the description
presumes vm.swap_idle_enabled==0 (the default). In
my context vm.swap_idle_enabled==0 . Looks like I
should also list:

vm.swap_idle_enabled=0

in my /etc/sysctl.conf with a reminder comment that the
pair of =0's are required for avoiding the observed
hang-ups.

The analysis goes like . . .

I see in the code that vm.swap_enabled!=0 causes
VM_SWAP_NORMAL to be requested:

void
vm_swapout_run(void)
{

	if (vm_swap_enabled)
		vm_req_vmdaemon(VM_SWAP_NORMAL);
}

and that in turn leads vm_daemon to:

		if (swapout_flags != 0) {
			/*
			 * Drain the per-CPU page queue batches as a deadlock
			 * avoidance measure.
			 */
			if ((swapout_flags & VM_SWAP_NORMAL) != 0)
				vm_page_pqbatch_drain();
			swapout_procs(swapout_flags);
		}

Note: vm.swap_idle_enabled==0 && vm.swap_enabled==0 ends
up with swapout_flags==0.

vm.swap_idle. . .
defaults seem to be (in my context):

# sysctl -a | grep swap_idle
vm.swap_idle_threshold2: 10
vm.swap_idle_threshold1: 2
vm.swap_idle_enabled: 0

For reference:

/*
 * Idle process swapout -- run once per second when pagedaemons are
 * reclaiming pages.
 */
void
vm_swapout_run_idle(void)
{
	static long lsec;

	if (!vm_swap_idle_enabled || time_second == lsec)
		return;
	vm_req_vmdaemon(VM_SWAP_IDLE);
	lsec = time_second;
}

[So vm.swap_idle_enabled==0 avoids VM_SWAP_IDLE status.]

static void
vm_req_vmdaemon(int req)
{
	static int lastrun = 0;

	mtx_lock(&vm_daemon_mtx);
	vm_pageout_req_swapout |= req;
	if ((ticks > (lastrun + hz)) || (ticks < lastrun)) {
		wakeup(&vm_daemon_needed);
		lastrun = ticks;
	}
	mtx_unlock(&vm_daemon_mtx);
}

[So VM_SWAP_IDLE and VM_SWAP_NORMAL are independent bits
in vm_pageout_req_swapout.]

vm_daemon does:

		mtx_lock(&vm_daemon_mtx);
		msleep(&vm_daemon_needed, &vm_daemon_mtx, PPAUSE, "psleep",
		    vm_daemon_timeout);
		swapout_flags = vm_pageout_req_swapout;
		vm_pageout_req_swapout = 0;
		mtx_unlock(&vm_daemon_mtx);

So vm_pageout_req_swapout is regenerated after that each time.

I'll not show the code for vm.swap_idle_enabled!=0 .

===
Mark Millard
marklmi at yahoo.com
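
P.S. For anyone wanting to try the same thing, the pair of settings discussed above might look roughly like this in /etc/sysctl.conf. This is only a sketch: the two variable names and values are the ones from this message, but the comment wording is illustrative and not copied from an actual file.

```
# Keep BOTH of these at 0. Going by vm/vm_swapout.c, swapout_flags
# only ends up 0 (no swapout_procs() call) when neither
# VM_SWAP_NORMAL nor VM_SWAP_IDLE is requested.
# The cost: one less way to free RAM before an OOM kill starts.
vm.swap_enabled=0
vm.swap_idle_enabled=0
```

Both are writable at run time, so the same values can also be set on a live system with sysctl(8), for example "sysctl vm.swap_enabled=0", before committing them to the file.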