Re: removing support for kernel stack swapping

From: Rodney W. Grimes <freebsd-rwg_at_gndrsh.dnsmgr.net>
Date: Mon, 03 Jun 2024 14:19:32 UTC

> FreeBSD will, when free pages are scarce, try to swap out the kernel
> stacks (typically 16KB per thread) of sleeping user threads.  I'm told
> that this mechanism was first implemented in BSD for the VAX port and
> that stabilizing it was quite an endeavour.
> 
> This feature has wide-ranging implications for code in the kernel.  For
> instance, if a thread allocates a structure on its stack, links it into
> some data structure visible to other threads, and goes to sleep, it must
> use PHOLD to ensure that the stack doesn't get swapped out while
> sleeping.  A missing PHOLD can thus result in a kernel panic, but this
> kind of mistake is very easy to make and hard to catch without thorough
> stress testing.  The kernel stack allocator also requires a fair bit of
> code to implement this feature, and we've had multiple bugs in that
> area, especially in relation to NUMA support.  Moreover, this feature
> will leave threads swapped out after the system has recovered, resulting
> in high scheduling latency once they're ready to run again.
> 
> In a very stressed system, it's possible that we can free up something
> like 1MB of RAM using this mechanism.  I argue that this mechanism is
> not worth it on modern systems: it isn't going to make the difference
> between a graceful recovery from memory pressure and a catatonic state
> which forces a reboot.  The complexity and resulting bugs it induces are
> not worth it.
> 
> At the BSDCan devsummit I proposed removing support for kernel stack
> swapping and got only positive feedback.  Does anyone here have any
> comments or objections?

My experience has been that any time memory pressure gets bad enough
that we start swapping out the kernel stacks of idle processes, the
system is pretty much useless anyway, so yes, please remove this.

As for tiny systems with <256MB of RAM, your swap is going to be on
something horribly slow like TF/SD or eMMC, and you probably don't want
your system swapping at all; either add memory or reorganize the workload.

-- 
Rod Grimes                                                 rgrimes@freebsd.org