From: John Baldwin <jhb@FreeBSD.org>
Date: Fri, 25 Oct 2024 15:54:32 -0400
Subject: Re: Building kernels with FPU support?
To: gnn, Arch
List-Id: Discussion related to FreeBSD architecture
List-Archive: https://lists.freebsd.org/archives/freebsd-arch

On 10/23/24 10:38, gnn wrote:
> Howdy,
>
> I am wondering if anyone has tried, lately, to see what effect building with FPU support has on overall system performance.
> I've been working with a kernel module that needs this (for reasons I'll not go into now) and it occurred to me that the perceived performance overhead that caused us to only do fixed point in the kernel may no longer be significant. I note that Linux has an option to build their kernel with FPU support.
>
> And yes, I know that we have the ability to selectively deal with the FPU, from the calls outlined in Section 9 for fpu, but I'm asking the more general question of "does it matter?" and "if so, how much?"

Enabling vector instructions "in general" in the kernel means that every trap would need to save the floating point state. Basically, struct trapframe would need to hold all the vector/FP register state in addition to the GPRs. You would also need to save/restore it when switching threads in the kernel. In essence, the per-pcb state we have now would stay, but it would hold in-kernel state, and userspace state would end up in the trapframe from userspace.

This would probably be quite expensive. Saving and restoring FPU state is not cheap, and we would now be doing that on every entry to and exit from the kernel (so extra overhead on system calls, faults, and interrupts).

It would also probably blow out kernel stack usage quite a bit. The XSAVE region on modern x86 processors is already close to 2k and is only growing. That would mean a substantially larger trapframe and require larger kstacks as a result.

To mitigate the latter you could perhaps try to only use FP in the kernel "top-half" and not use it in bottom-half interrupt code. I worry a bit about clearly demarcating bottom-half code so that it still compiles without FP, but as long as you disable FP access for nested faults you'd find any inconsistencies there rather quickly in the form of panics.

Certainly it would be a fair bit of work to prototype to see what happens. Another thing you could try is to only save a subset of register state for traps (e.g. just FXSAVE on x86 would mean you can use SSE and FP, but not AVX, which might be enough for many of the use cases in the kernel while not blowing out quite as much stack space).

-- 
John Baldwin