Date: Tue, 10 Sep 2024 16:44:47 +0300
From: Vadim Goncharov
To: David Chisnall
Cc: Poul-Henning Kamp, tcpdump-workers@lists.tcpdump.org, freebsd-arch@freebsd.org, freebsd-hackers@freebsd.org, freebsd-net@freebsd.org, tech-net@netbsd.org, Alexander Nasonov
Subject: Re: BPF64: proposal of platform-independent hardware-friendly backwards-compatible eBPF alternative
On Tue, 10 Sep 2024 13:59:02 +0100
David Chisnall wrote:

> On 10 Sep 2024, at 12:45, Vadim Goncharov wrote:
> >
> > It's easy for your Lua code (or whatever) code to hang kernel by
> > infinite loop. Or crash it by access on arbitrary pointer. That's
> > why original BPF has no backward jumps and memory access, and eBPF's
> > nightmare verifier walks all code paths and check pointers.
>
> I’m not convinced by the second: Lua has a GC’d heap, you’d need to
> expose FFI things to it that did unsafe things, and that’s equally a
> problem for eBPF.

Not quite. For eBPF (and BPF64) FFI alone is not enough: there must be
special wrappers, or even functions written from scratch with the
restricted environment in mind. Lua, of course, does not have such a
thing - its standard library would have to be reimplemented.

> The first is not a problem. The Lua interpreter has a bytecode
> limit. You can define a bounded number of bytecodes that it will
> execute. The problem comes from the standard library. Things like
> string.gmatch can have high-order polynomial complexity and so it’s
> possible for a Lua program that executes a small number of bytecodes
> to create a string that takes a vast amount of time to match on.
> Again, this is also a problem for eBPF if you expose a similar
> function, the solution is to not expose functions with large
> data-dependent runtimes to untrusted script.

In BPF64 some safety belts are intended - e.g. execution time is checked
at CALL/RET, and if the limit is exceeded, the program is marked unsafe
and disabled.

> More generally, there are a lot of problems with interpreting or
> JITing untrusted code in the kernel in *any* runtime. Speculative
> execution makes it easy to use these as primitives to leak kernel
> secrets, either via timing of the programs themselves, using the JIT
> to generate gadgets, or by leaking data via cache priming.
>
> Both eBPF and Lua have these problems.
> [...]
> - Run a channel program.
>
> In the post-Spectre world, the former remains a privileged operation.
> Even though Linux pretends it isn’t, allowing arbitrary (even
> arbitrary constrained) code to run in the kernel’s address space is a
> problem. Invoking such code; however, should follow the same rules
> as everything else. A trusted entity should be able to load a pile
> of Lua / eBPF / BPF64 / whatever programs into the kernel and then
> set up permissions so that sandboxed programs (and jails) can use a
> defined subset of them.

I am not an experienced assembler user and don't understand how Spectre
works - that's why I've written this RFC letter even before the spec is
finished - but isn't Spectre an x86-specific thing? BPF64 has more
registers and primarily targets RISC architectures, if we're speaking of
JIT. For BPF64 I made the stack separate, as a register window,
precisely to mitigate ROP and its gadgets.

And BPF64 is meant as a backwards-compatible extension of the existing
BPF, that is, its primary form is a bytecode interpreter (for(;;)
switch/case), with JIT only second - thus e.g. JIT can be disabled for
non-root users in case of doubt. eBPF can't do this - at execution time
it always exists as native machine code; the bytecode is only for the
verifier stage.
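The interpreter-first design and the CALL/RET safety belt described above can be illustrated with a minimal C sketch. Everything in it is a hypothetical stand-in, not the BPF64 encoding or API (the spec is unfinished): the opcodes, `struct insn`, `struct prog` and `prog_run()` are invented for illustration, and a step counter replaces wall-clock time so the result is deterministic. The sketch also assumes classic BPF's forward-only jumps. Under that assumption, the only way to spend a lot of time is to fan out calls, which is why the check sits at CALL/RET.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for illustration only; NOT the BPF64 encoding. */
enum op { OP_LDI, OP_ADD, OP_JA, OP_JZ, OP_CALL, OP_RET };

struct insn {
	enum op op;
	int64_t k;	/* immediate, or absolute target for jumps/calls */
};

struct prog {
	const struct insn *code;
	size_t len;
	int disabled;	/* set once the safety belt trips */
};

#define CALL_DEPTH_MAX 16

/*
 * Bytecode interpreter in for(;;) + switch/case form.
 * Returns 0 and stores the accumulator in *res on success, or -1 if the
 * program is (or has just been) marked unsafe and disabled.
 * "Time" is a step counter here, an assumption made for determinism.
 */
static int
prog_run(struct prog *p, uint64_t budget, int64_t *res)
{
	size_t retstack[CALL_DEPTH_MAX];	/* private; program cannot address it */
	size_t pc = 0, sp = 0;
	uint64_t steps = 0;
	int64_t acc = 0;

	assert(p != NULL && res != NULL);
	if (p->disabled)
		return (-1);
	for (;;) {
		if (pc >= p->len)
			goto unsafe;
		const struct insn *in = &p->code[pc++];
		steps++;
		switch (in->op) {
		case OP_LDI:
			acc = in->k;
			break;
		case OP_ADD:
			acc += in->k;
			break;
		case OP_JA:
		case OP_JZ:
			/* Forward-only jumps, as in classic BPF. */
			if ((size_t)in->k < pc)
				goto unsafe;
			if (in->op == OP_JA || acc == 0)
				pc = (size_t)in->k;
			break;
		case OP_CALL:
		case OP_RET:
			/* Safety belt: the budget is checked only here. */
			if (steps > budget)
				goto unsafe;
			if (in->op == OP_CALL) {
				if (sp == CALL_DEPTH_MAX)
					goto unsafe;
				retstack[sp++] = pc;
				pc = (size_t)in->k;
			} else if (sp == 0) {
				*res = acc;	/* top-level RET ends the program */
				return (0);
			} else {
				pc = retstack[--sp];
			}
			break;
		default:
			goto unsafe;
		}
	}
unsafe:
	p->disabled = 1;	/* mark unsafe; later runs are refused */
	return (-1);
}
```

Because jumps only move forward, at most `len` instructions can run between two consecutive CALL/RET checks, so the budget can be overshot by no more than one straight-line segment. A caller would do something like `prog_run(&p, 100, &r)`; once it returns -1, `p.disabled` is set and later runs are refused without executing anything.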
^^ that's the fallback if you say "safe JIT is impossible", but maybe you
have advice on how to design the architecture so it can still be done
safely? BPF64 looks like a doable improvement for us, at a much lower
resource investment than even *porting* eBPF to *BSD.

> The thing I would like to see for our current use of semi-trusted Lua
> in the kernel (ZFS channel programs) is a way of exposing them (under
> /dev/something) as file descriptors and modifying the ioctls that run
> them to take a file descriptor argument. I would like to separate
> the two operations:
>
> - Load a channel program.

I hadn't heard of these, so I looked at zfs-program(8): I see no reason
why they are called "channel" programs (just to please some old farts?),
and even the reason for them to run in the kernel at all, for things
achievable with the same userland utilities, seems doubtful.

-- 
WBR, @nuclight