Re: removing support for kernel stack swapping

From: Jessica Clarke <jrtc27_at_freebsd.org>
Date: Mon, 03 Jun 2024 21:41:22 UTC
On 3 Jun 2024, at 22:39, Konstantin Belousov <kostikbel@gmail.com> wrote:
> 
> On Mon, Jun 03, 2024 at 10:15:15PM +0100, Jessica Clarke wrote:
>> On 3 Jun 2024, at 22:11, Konstantin Belousov <kostikbel@gmail.com> wrote:
>>> 
>>> On Sun, Jun 02, 2024 at 07:57:04PM -0400, Mark Johnston wrote:
>>>> FreeBSD will, when free pages are scarce, try to swap out the kernel
>>>> stacks (typically 16KB per thread) of sleeping user threads.  I'm told
>>>> that this mechanism was first implemented in BSD for the VAX port and
>>>> that stabilizing it was quite an endeavour.
>>>> 
>>>> This feature has wide-ranging implications for code in the kernel.  For
>>>> instance, if a thread allocates a structure on its stack, links it into
>>>> some data structure visible to other threads, and goes to sleep, it must
>>>> use PHOLD to ensure that the stack doesn't get swapped out while
>>>> sleeping.  A missing PHOLD can thus result in a kernel panic, but this
>>>> kind of mistake is very easy to make and hard to catch without thorough
>>>> stress testing.  The kernel stack allocator also requires a fair bit of
>>>> code to implement this feature, and we've had multiple bugs in that
>>>> area, especially in relation to NUMA support.  Moreover, this feature
>>>> will leave threads swapped out after the system has recovered, resulting
>>>> in high scheduling latency once they're ready to run again.
>>>> 
>>>> In a very stressed system, it's possible that we can free up something
>>>> like 1MB of RAM using this mechanism.  I argue that this mechanism is
>>>> not worth it on modern systems: it isn't going to make the difference
>>>> between a graceful recovery from memory pressure and a catatonic state
>>>> which forces a reboot.  The complexity and resulting bugs it induces are
>>>> not worth it.
>>> On amd64, 1MB of physical memory for stacks is consumed by 64k threads,
>> 
>> To avoid any confusion, you mean 64 kthreads here, right? At least that
>> makes sense for the story and the maths.
> I mean 65535 threads (each of which must have a kernel stack).

At 16 KiB each that would be 1 GiB total, not 1 MiB?
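
For reference, a quick sketch of the arithmetic in both directions, assuming
the typical 16 KiB per-thread kernel stack quoted above (the names below are
illustrative only, not anything from the tree):

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# Assumption: 16 KiB of kernel stack per thread, the "typical" figure
# quoted earlier in the thread; the real size varies by platform and config.
KSTACK_BYTES = 16 * KIB


def stack_bytes(nthreads):
    """Total kernel stack memory consumed by nthreads threads."""
    return nthreads * KSTACK_BYTES


def threads_for(total_bytes):
    """Number of threads whose kernel stacks add up to total_bytes."""
    return total_bytes // KSTACK_BYTES


# 64k threads at 16 KiB each
print(stack_bytes(64 * 1024) // GIB, "GiB")
# how many threads it takes to consume 1 MiB of stacks
print(threads_for(1 * MIB), "threads")
```

So under that assumption 1 MiB corresponds to 64 threads, and 64k threads
corresponds to 1 GiB.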

Jess