From: David Chisnall
Date: Tue, 10 Sep 2024 13:59:02 +0100
Subject: Re: BPF64: proposal of platform-independent hardware-friendly backwards-compatible eBPF alternative
To: Vadim Goncharov
Cc: Poul-Henning Kamp, tcpdump-workers@lists.tcpdump.org, freebsd-arch@freebsd.org, freebsd-hackers@freebsd.org, freebsd-net@freebsd.org, tech-net@netbsd.org, Alexander Nasonov
Message-Id: <4D84AF55-51C7-4C2B-94F7-D486A29E8821@FreeBSD.org>
In-Reply-To: <20240910144557.4d95052a@nuclight.lan>
List-Id: Discussion related to FreeBSD architecture
List-Archive: https://lists.freebsd.org/archives/freebsd-arch

On 10 Sep 2024, at 12:45, Vadim Goncharov <vadimnuclight@gmail.com> wrote:

> It's easy for your Lua code (or whatever) code to hang kernel by
> infinite loop. Or crash it by access on arbitrary pointer. That's why
> original BPF has no backward jumps and memory access, and eBPF's
> nightmare verifier walks all code paths and check pointers.

I’m not convinced by the second: Lua has a GC’d heap, so to crash the kernel through an arbitrary pointer you’d need to expose FFI functions that do unsafe things, and that’s equally a problem for eBPF.

The first is not a problem. The Lua interpreter has a bytecode limit: you can define a bounded number of bytecodes that it will execute. The problem comes from the standard library. Things like string.gmatch can have high-order polynomial complexity, so it’s possible for a Lua program that executes a small number of bytecodes to create a string that takes a vast amount of time to match on. Again, this is also a problem for eBPF if you expose a similar function; the solution is not to expose functions with large data-dependent runtimes to untrusted scripts.
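
To make the budget argument concrete, here is a minimal sketch in Python rather than Lua. It is a toy interpreter, not the Lua VM or the ZFS channel program runtime; the opcodes, the `run` function and the `quadratic_match` helper are all hypothetical names invented for illustration. It shows that a per-instruction budget stops an infinite loop, while a single call into a native helper with data-dependent cost does a large amount of work that the budget never sees.

```python
class BudgetExceeded(Exception):
    """Raised when a program uses up its instruction budget."""


def quadratic_match(text):
    """Hypothetical native helper with a deliberately naive O(n^2) scan.

    It stands in for a library routine whose cost depends on its input,
    and returns the number of inner steps it performed.
    """
    steps = 0
    for i in range(len(text)):
        for _ in range(i, len(text)):
            steps += 1
    return steps


def run(program, budget):
    """Interpret a toy program, charging one unit per instruction.

    Returns (instructions_executed, native_steps). A native helper runs
    to completion inside one instruction, so its work is never charged.
    """
    acc = ""
    pc = 0
    executed = 0
    native_steps = 0
    while pc < len(program):
        if executed >= budget:
            raise BudgetExceeded(f"stopped after {executed} instructions")
        op, arg = program[pc]
        executed += 1
        if op == "push":          # load a constant string
            acc = arg
            pc += 1
        elif op == "repeat":      # build a long string in one instruction
            acc = acc * arg
            pc += 1
        elif op == "jump":        # unconditional (possibly backward) jump
            pc = arg
        elif op == "call_match":  # call the expensive native helper
            native_steps += quadratic_match(acc)
            pc += 1
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return executed, native_steps


# Three instructions, about half a million steps of uncharged native work.
expensive = [("push", "a"), ("repeat", 1000), ("call_match", None)]
executed, native_steps = run(expensive, budget=1000)
```

With the same budget, `run([("jump", 0)], budget=1000)` raises `BudgetExceeded`, which is the case the bytecode limit handles well. The `expensive` program finishes within budget, which is the case that has to be handled by choosing which helpers to expose.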

More generally, there are a lot of problems with interpreting or JITing untrusted code in the kernel in *any* runtime. Speculative execution makes it easy to use such runtimes as primitives to leak kernel secrets, either via timing of the programs themselves, by using the JIT to generate gadgets, or by leaking data via cache priming.

Both eBPF and Lua have these problems.

The thing I would like to see for our current use of semi-trusted Lua in the kernel (ZFS channel programs) is a way of exposing them (under /dev/something) as file descriptors and modifying the ioctls that run them to take a file descriptor argument. I would like to separate the two operations:

 - Load a channel program.
 - Run a channel program.

In the post-Spectre world, the former remains a privileged operation. Even though Linux pretends it isn’t, allowing arbitrary (even arbitrary but constrained) code to run in the kernel’s address space is a problem. Invoking such code, however, should follow the same rules as everything else. A trusted entity should be able to load a pile of Lua / eBPF / BPF64 / whatever programs into the kernel and then set up permissions so that sandboxed programs (and jails) can use a defined subset of them.
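
The load/run split described above can be modelled roughly as follows. This is only a sketch in Python of the permission model, not a proposed kernel interface: `ProgramRegistry`, its methods, the integer handles standing in for file descriptors, and the `"root"` / `"jail1"` principals are all assumptions made for illustration.

```python
import itertools


class ProgramRegistry:
    """Toy model of a kernel-side table of loaded programs."""

    def __init__(self):
        self._programs = {}  # handle -> callable standing in for a program
        self._grants = {}    # handle -> principals allowed to run it
        self._next_handle = itertools.count(3)  # descriptor-like integers

    def load(self, caller, program):
        """Load a program and return a handle. Privileged operation."""
        if caller != "root":
            raise PermissionError("loading requires privilege")
        handle = next(self._next_handle)
        self._programs[handle] = program
        self._grants[handle] = set()
        return handle

    def grant(self, caller, handle, principal):
        """Allow a sandboxed principal to run one loaded program."""
        if caller != "root":
            raise PermissionError("granting requires privilege")
        self._grants[handle].add(principal)

    def run(self, caller, handle, *args):
        """Run an already-loaded program if the caller holds a grant."""
        if handle not in self._programs:
            raise KeyError(f"no program loaded at handle {handle}")
        if caller != "root" and caller not in self._grants[handle]:
            raise PermissionError("caller may not run this program")
        return self._programs[handle](*args)


registry = ProgramRegistry()
double = registry.load("root", lambda x: x * 2)      # trusted entity loads
internal = registry.load("root", lambda: "internal")
registry.grant("root", double, "jail1")              # jail gets a subset
result = registry.run("jail1", double, 21)           # allowed, returns 42
```

In this model `registry.run("jail1", internal)` and `registry.load("jail1", ...)` both raise `PermissionError`: the jail can invoke only the subset it was granted and can never introduce new code.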

David
