Re: removing support for kernel stack swapping

From: Ruslan Bukin <br_at_bsdpad.com>
Date: Mon, 03 Jun 2024 09:30:29 UTC

On Sun, Jun 02, 2024 at 08:05:06PM -0400, Warner Losh wrote:
> On Sun, Jun 2, 2024, 5:57 PM Mark Johnston <markj@freebsd.org> wrote:
> 
> > FreeBSD will, when free pages are scarce, try to swap out the kernel
> > stacks (typically 16KB per thread) of sleeping user threads.  I'm told
> > that this mechanism was first implemented in BSD for the VAX port and
> > that stabilizing it was quite an endeavour.
> >
> > This feature has wide-ranging implications for code in the kernel.  For
> > instance, if a thread allocates a structure on its stack, links it into
> > some data structure visible to other threads, and goes to sleep, it must
> > use PHOLD to ensure that the stack doesn't get swapped out while
> > sleeping.  A missing PHOLD can thus result in a kernel panic, but this
> > kind of mistake is very easy to make and hard to catch without thorough
> > stress testing.  The kernel stack allocator also requires a fair bit of
> > code to implement this feature, and we've had multiple bugs in that
> > area, especially in relation to NUMA support.  Moreover, this feature
> > will leave threads swapped out after the system has recovered, resulting
> > in high scheduling latency once they're ready to run again.
> >
> > In a very stressed system, it's possible that we can free up something
> > like 1MB of RAM using this mechanism.  I argue that this mechanism is
> > not worth it on modern systems: it isn't going to make the difference
> > between a graceful recovery from memory pressure and a catatonic state
> > which forces a reboot.  The complexity and resulting bugs it induces is
> > not worth it.
> >
> 
> 
> +1.
> 
> The smallest bootable system for me is like 256MB, and in a system like
> that it might save 256k given the number of threads typical in a system
> like that...
> 
> Warner
> 

I managed to boot on 10 MB of on-chip static RAM (no DDR at all), including a few MB of mdroot. But now I mostly use DDR2/3, where there is no way to get less than 32 MB, so 1 MB is not a problem at all.

Ruslan