removing support for kernel stack swapping


From: Mark Johnston <markj_at_freebsd.org>
Date: Sun, 02 Jun 2024 23:57:04 UTC

FreeBSD will, when free pages are scarce, try to swap out the kernel
stacks (typically 16KB per thread) of sleeping user threads.  I'm told
that this mechanism was first implemented in BSD for the VAX port and
that stabilizing it was quite an endeavour.

This feature has wide-ranging implications for code in the kernel.  For
instance, if a thread allocates a structure on its stack, links it into
some data structure visible to other threads, and goes to sleep, it must
use PHOLD to ensure that the stack doesn't get swapped out while
sleeping.  A missing PHOLD can thus result in a kernel panic, but this
kind of mistake is very easy to make and hard to catch without thorough
stress testing.  The kernel stack allocator also requires a fair bit of
code to implement this feature, and we've had multiple bugs in that
area, especially in relation to NUMA support.  Moreover, this feature
will leave threads swapped out after the system has recovered, resulting
in high scheduling latency once they're ready to run again.
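
The hold-before-publish rule described above can be illustrated with a
small userland model.  This is only a sketch under stated assumptions:
struct proc_model, model_hold(), model_rele(), stack_swappable() and
publish_and_sleep() are hypothetical stand-ins invented for illustration,
not kernel definitions.  In the kernel the corresponding macros are
PHOLD(p) and PRELE(p), and the sleep would be a real sleep rather than a
single check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a process; not the kernel's struct proc. */
struct proc_model {
	int hold_count;		/* outstanding holds pinning the stack */
};

/* A structure that lives on the sleeping thread's stack. */
struct waiter {
	struct waiter *next;	/* linkage visible to other threads */
	int done;
};

static struct waiter *wait_queue;	/* shared list head */

/* Stand-in for PHOLD(): forbid swapping the stack out. */
static void
model_hold(struct proc_model *p)
{
	p->hold_count++;
}

/* Stand-in for PRELE(): drop one hold. */
static void
model_rele(struct proc_model *p)
{
	assert(p->hold_count > 0);
	p->hold_count--;
}

/* In this model the swapper may take the stack only with no holds. */
static bool
stack_swappable(const struct proc_model *p)
{
	return (p->hold_count == 0);
}

/*
 * Publish an on-stack waiter, pretend to sleep, then unlink it.
 * Returns whether the stack was eligible for swapping while other
 * threads could still reach &w through wait_queue.
 */
static bool
publish_and_sleep(struct proc_model *p, bool use_hold)
{
	struct waiter w = { .next = NULL, .done = 0 };
	bool swappable_while_linked;

	if (use_hold)
		model_hold(p);		/* pin the stack before publishing &w */
	w.next = wait_queue;
	wait_queue = &w;

	/* A real thread would sleep here until someone sets w.done. */
	swappable_while_linked = stack_swappable(p);

	wait_queue = w.next;		/* unlink before the frame goes away */
	if (use_hold)
		model_rele(p);		/* the stack may be swapped again */
	return (swappable_while_linked);
}
```

With use_hold set, the function reports that the stack was never
swappable while the waiter was linked.  Without it, the stack could have
been swapped out while another thread still held a pointer into it, which
is the missing-PHOLD mistake described above.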

In a very stressed system, it's possible that we can free up something
like 1MB of RAM using this mechanism.  I argue that this is not worth it
on modern systems: it isn't going to make the difference between a
graceful recovery from memory pressure and a catatonic state which
forces a reboot.  The complexity and the resulting bugs are too high a
price for such a small gain.

At the BSDCan devsummit I proposed removing support for kernel stack
swapping and got only positive feedback.  Does anyone here have any
comments or objections?