From: Warner Losh
Date: Sun, 2 Jun 2024 20:05:06 -0400
Subject: Re: removing support for kernel stack swapping
To: Mark Johnston
Cc: "freebsd-arch@freebsd.org"
List-Id: Discussion related to FreeBSD architecture
List-Archive: https://lists.freebsd.org/archives/freebsd-arch

On Sun, Jun 2, 2024, 5:57 PM Mark Johnston <markj@freebsd.org> wrote:

> FreeBSD will, when free pages are scarce, try to swap out the kernel
> stacks (typically 16KB per thread) of sleeping user threads.  I'm told
> that this mechanism was first implemented in BSD for the VAX port and
> that stabilizing it was quite an endeavour.
>
> This feature has wide-ranging implications for code in the kernel.  For
> instance, if a thread allocates a structure on its stack, links it into
> some data structure visible to other threads, and goes to sleep, it must
> use PHOLD to ensure that the stack doesn't get swapped out while
> sleeping.  A missing PHOLD can thus result in a kernel panic, but this
> kind of mistake is very easy to make and hard to catch without thorough
> stress testing.  The kernel stack allocator also requires a fair bit of
> code to implement this feature, and we've had multiple bugs in that
> area, especially in relation to NUMA support.  Moreover, this feature
> will leave threads swapped out after the system has recovered, resulting
> in high scheduling latency once they're ready to run again.
>
> In a very stressed system, it's possible that we can free up something
> like 1MB of RAM using this mechanism.  I argue that this mechanism is
> not worth it on modern systems: it isn't going to make the difference
> between a graceful recovery from memory pressure and a catatonic state
> which forces a reboot.  The complexity and resulting bugs it induces is
> not worth it.
>

+1.

The smallest bootable system for me is like 256MB, and in a system like
that it might save 256k given the number of threads typical in a system
like that...

Warner

> At the BSDCan devsummit I proposed removing support for kernel stack
> swapping and got only positive feedback.  Does anyone here have any
> comments or objections?
>


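
For a rough sense of scale, the figures quoted in this thread can be checked with a back-of-the-envelope sketch. It assumes the 16KB per-thread kernel stack size mentioned above; the function names are hypothetical and purely illustrative, not anything from the FreeBSD tree, and the thread counts are derived from the quoted savings rather than measured.

```python
# Back-of-the-envelope check of the numbers in this thread.
# Assumption: 16KB of kernel stack per thread, as quoted above.
KSTACK_BYTES = 16 * 1024


def stacks_for_savings(saved_bytes, kstack_bytes=KSTACK_BYTES):
    """Kernel stacks that would have to be swapped out to free saved_bytes."""
    return -(-saved_bytes // kstack_bytes)  # ceiling division


def fraction_of_ram(saved_bytes, ram_bytes):
    """Share of total RAM that the freed stacks represent."""
    return saved_bytes / ram_bytes


# "it might save 256k" on a 256MB system
small_stacks = stacks_for_savings(256 * 1024)
small_share = fraction_of_ram(256 * 1024, 256 * 1024 * 1024)

# "something like 1MB of RAM" on a very stressed system
large_stacks = stacks_for_savings(1024 * 1024)

print(f"256KB saved = {small_stacks} swapped-out stacks, "
      f"{small_share:.4%} of a 256MB machine")
print(f"1MB saved   = {large_stacks} swapped-out stacks")
```

Under that assumption, saving 256KB corresponds to 16 sleeping threads with their stacks swapped out, about a thousandth of a 256MB machine, and the 1MB figure corresponds to 64 such threads.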