Subject: Re: removing support for kernel stack swapping
From: Jessica Clarke
Date: Mon, 3 Jun 2024 22:15:15 +0100
To: Konstantin Belousov
Cc: Mark Johnston, freebsd-arch
List-Archive: https://lists.freebsd.org/archives/freebsd-arch

On 3 Jun 2024, at 22:11, Konstantin Belousov wrote:
>
> On Sun, Jun 02, 2024 at 07:57:04PM -0400, Mark Johnston wrote:
>> FreeBSD will, when free pages are scarce, try to swap out the kernel
>> stacks (typically 16KB per thread) of sleeping user threads. I'm told
>> that this mechanism was first implemented in BSD for the VAX port and
>> that stabilizing it was quite an endeavour.
>>
>> This feature has wide-ranging implications for code in the kernel.
>> For instance, if a thread allocates a structure on its stack, links it
>> into some data structure visible to other threads, and goes to sleep, it
>> must use PHOLD to ensure that the stack doesn't get swapped out while
>> sleeping. A missing PHOLD can thus result in a kernel panic, but this
>> kind of mistake is very easy to make and hard to catch without thorough
>> stress testing. The kernel stack allocator also requires a fair bit of
>> code to implement this feature, and we've had multiple bugs in that
>> area, especially in relation to NUMA support. Moreover, this feature
>> will leave threads swapped out after the system has recovered, resulting
>> in high scheduling latency once they're ready to run again.
>>
>> In a very stressed system, it's possible that we can free up something
>> like 1MB of RAM using this mechanism. I argue that this mechanism is
>> not worth it on modern systems: it isn't going to make the difference
>> between a graceful recovery from memory pressure and a catatonic state
>> which forces a reboot. The complexity and resulting bugs it induces are
>> not worth it.
> On amd64, 1MB of physical memory for stacks is consumed by 64k threads,

To avoid any confusion, you mean 64 kthreads here, right? At least that
makes sense for the story and the maths.

Jess

> which is not a very stressed system. I remember that a very long time ago
> Peter ran tests with several hundred thousand threads, which is a more
> realistic high load, e.g. from typical Java code (at least it was so
> several years ago).
>
> For a kernel stack to be swapped out, the thread normally must sleep for
> at least 10 seconds, so the latency before it next runs should not be
> too important.
>
> Having 1MB of essentially free memory is nice for system survival.
> Being able to swap out the pcb as well could be useful, IMO.
>
>>
>> At the BSDCan devsummit I proposed removing support for kernel stack
>> swapping and got only positive feedback.
>> Does anyone here have any comments or objections?