Subject: Re: removing support for kernel stack swapping
From: Jessica Clarke
Date: Mon, 3 Jun 2024 22:41:22 +0100
To: Konstantin Belousov
Cc: Mark Johnston, freebsd-arch
List-Id: Discussion related to FreeBSD architecture
List-Archive: https://lists.freebsd.org/archives/freebsd-arch

On 3 Jun 2024, at 22:39, Konstantin Belousov wrote:
>
> On Mon, Jun 03, 2024 at 10:15:15PM +0100, Jessica Clarke wrote:
>> On 3 Jun 2024, at 22:11, Konstantin Belousov wrote:
>>>
>>> On Sun, Jun 02, 2024 at 07:57:04PM -0400, Mark Johnston wrote:
>>>> FreeBSD will, when free pages are scarce, try to swap out the kernel
>>>> stacks (typically 16KB per thread) of sleeping user threads. I'm told
>>>> that this mechanism was first implemented in BSD for the VAX port and
>>>> that stabilizing it was quite an endeavour.
>>>>
>>>> This feature has wide-ranging implications for code in the kernel.
>>>> For instance, if a thread allocates a structure on its stack, links it
>>>> into some data structure visible to other threads, and goes to sleep, it
>>>> must use PHOLD to ensure that the stack doesn't get swapped out while
>>>> sleeping. A missing PHOLD can thus result in a kernel panic, but this
>>>> kind of mistake is very easy to make and hard to catch without thorough
>>>> stress testing. The kernel stack allocator also requires a fair bit of
>>>> code to implement this feature, and we've had multiple bugs in that
>>>> area, especially in relation to NUMA support. Moreover, this feature
>>>> will leave threads swapped out after the system has recovered, resulting
>>>> in high scheduling latency once they're ready to run again.
>>>>
>>>> In a very stressed system, it's possible that we can free up something
>>>> like 1MB of RAM using this mechanism. I argue that this mechanism is
>>>> not worth it on modern systems: it isn't going to make the difference
>>>> between a graceful recovery from memory pressure and a catatonic state
>>>> which forces a reboot. The complexity and resulting bugs it induces is
>>>> not worth it.
>>> On amd64, 1MB of physical memory for stacks is consumed by 64k threads,
>>
>> To avoid any confusion, you mean 64 kthreads here, right? At least that
>> makes sense for the story and the maths.
> I mean 65535 threads (each of which must have kernel stack).

At 16 KiB each that would be 1 GiB total, not 1 MiB?

Jess
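
A quick check of the arithmetic in this exchange, as a sketch only. It assumes the 16 KiB per-thread kernel stack size that Mark's message gives as typical; the names KSTACK_BYTES and kstack_total_bytes are illustrative and are not FreeBSD identifiers.

```python
# Assumption: 16 KiB per kernel stack, the "typical" figure quoted above.
KSTACK_BYTES = 16 * 1024

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def kstack_total_bytes(nthreads: int) -> int:
    """Total kernel stack memory, in bytes, for nthreads threads."""
    return nthreads * KSTACK_BYTES


# Compare the two readings of "64k threads" from the thread.
for nthreads in (64, 65535):
    total = kstack_total_bytes(nthreads)
    print(f"{nthreads} threads: {total / MIB:.2f} MiB")
```

Under that assumption, 64 threads account for exactly 1 MiB of kernel stack, while 65535 threads account for just under 1 GiB.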