Date: Sun, 2 Jun 2024 19:57:04 -0400
From: Mark Johnston
To: freebsd-arch@freebsd.org
Subject: removing support for kernel stack swapping

FreeBSD will, when free pages are scarce, try to swap out the kernel stacks (typically 16KB per thread) of sleeping user threads.
I'm told that this mechanism was first implemented in BSD for the VAX port and that stabilizing it was quite an endeavour.

This feature has wide-ranging implications for code in the kernel. For instance, if a thread allocates a structure on its stack, links it into some data structure visible to other threads, and goes to sleep, it must use PHOLD to ensure that the stack doesn't get swapped out while sleeping. A missing PHOLD can thus result in a kernel panic, and this kind of mistake is very easy to make and hard to catch without thorough stress testing. The kernel stack allocator also requires a fair bit of code to implement this feature, and we've had multiple bugs in that area, especially in relation to NUMA support. Moreover, this feature will leave threads swapped out after the system has recovered, resulting in high scheduling latency once they're ready to run again.

In a very stressed system, we might free up something like 1MB of RAM using this mechanism. I argue that this is not worth it on modern systems: it isn't going to make the difference between a graceful recovery from memory pressure and a catatonic state which forces a reboot. The complexity it adds, and the bugs that result, outweigh that small gain.

At the BSDCan devsummit I proposed removing support for kernel stack swapping and got only positive feedback. Does anyone here have any comments or objections?
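
For readers less familiar with the PHOLD requirement described above, here is a rough userspace model of the rule. It is only a sketch and not kernel code: struct toy_proc and every toy_* function are hypothetical names made up for illustration, and the only thing they are meant to capture is the assumption that a held process is skipped by the swapper, so a stack-allocated structure that other threads can reach stays resident while its owner sleeps.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for per-process state; NOT the kernel's struct proc. */
struct toy_proc {
	int  hold_count;    /* models the count raised by PHOLD, dropped by PRELE */
	bool stack_in_mem;  /* true while the kernel stack is resident */
};

/* Models PHOLD: pin the stack in memory until the matching release. */
static void
toy_phold(struct toy_proc *p)
{
	p->hold_count++;
}

/* Models PRELE: drop one hold. */
static void
toy_prele(struct toy_proc *p)
{
	assert(p->hold_count > 0);
	p->hold_count--;
}

/*
 * Models the swapper under memory pressure: a sleeping process's stack
 * is evicted only if nobody holds it. Returns true if it was evicted.
 */
static bool
toy_try_swapout(struct toy_proc *p)
{
	if (p->hold_count > 0 || !p->stack_in_mem)
		return (false);
	p->stack_in_mem = false;
	return (true);
}

/*
 * Models another thread following a pointer to a structure that lives
 * on the sleeping thread's stack. In a real kernel, the "false" case
 * is the panic caused by a missing PHOLD.
 */
static bool
toy_peer_access_ok(const struct toy_proc *p)
{
	return (p->stack_in_mem);
}
```

In this model, a caller that does toy_phold() before sleeping and toy_prele() after waking keeps toy_peer_access_ok() true for the whole sleep, whereas a caller that forgets the hold can have its stack evicted by toy_try_swapout() while a peer still has a pointer into it.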