Date: Mon, 22 Jan 2024 16:54:52 -0600
From: "Robert R. Russell"
To: freebsd-hackers@freebsd.org
Subject: Re: The Case for Rust (in the base system)
Message-ID: <20240122165452.13733a66@venus.private.rrbrussell.com>
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers

On Mon, 22 Jan 2024 10:13:30 +0000 David Chisnall wrote:

> On 21 Jan 2024, at 16:04, Alan Somers wrote:
> >
> > Perhaps it will.
> > But like David Chisnall, I'm afraid that if
> > FreeBSD never modernizes, then it itself will go out of fashion by
> > the 2040s.
>
> Apparently I’m participating in this thread already. I’m getting
> over a nasty cold and my head is full of cotton wool, so apologies in
> advance if this is more rambling than normal:
>
> I hope it’s no surprise to anyone that I am in favour of languages
> that give stronger guarantees to programmers and let you think more
> about the problems. I can’t imagine going back to writing anything
> non-trivial in a language without RAII or a rich set of generic
> collections.
>
> To give a bit of personal background: In my previous role, I was one
> of the coauthors of the internal strategy document that argued for
> safe languages at Microsoft. Our rough recommendation was:
>
> - No new C code. There are *always* better options.
> - C++ code should follow the Core Guidelines and use static
>   analysis. New C++ code is acceptable in projects that are already
>   C/C++ and need to incrementally improve.
> - Rust in new projects that need a systems programming language.
> - Managed languages anywhere where a systems language is not needed
>   (i.e. most places).
>
> Between modern C++ with static analysers and Rust, there was a small
> safety delta. The recommendation was primarily based on a
> human-factors decision: it’s far easier to prevent people from
> committing code that doesn’t compile than it is to prevent them from
> committing code that raises static analysis warnings. If a project
> isn’t doing pre-merge static analysis, it’s basically impossible.
> Between using modern C++ (even just smart pointers and ranges) and C,
> there is an enormous safety delta.
>
> The unstable Rust ecosystem was less of an issue for Microsoft
> because they had a large compiler team and were happy to maintain
> security back-ports of any critical crates.
> The same software supply
> chain things applied for Rust as everything else: no random pulling
> from Cargo, dependencies need to be cloned internally and run through
> a load of compliance things. That’s probably the only sensible way
> of interacting with the Rust ecosystem.
>
> For userspace, I’d love to see FreeBSD more actively support the
> cap-std project in Rust, which makes it incredibly easy to write Rust
> programs that play nicely with Capsicum.
>
> It’s unclear to me that now is the right time to support Rust in the
> base system, because there’s still a lot of churn. Facebook has
> effectively forked Rust because their (huge) Rust codebase doesn’t
> build with newer compilers. If you’re Microsoft or Facebook,
> maintaining an old Rust compiler for a few years and back-porting
> things to work with that language snapshot is a cost that may be
> worth paying. I don’t think the FreeBSD project has the resources to
> do so. A limited set of dependencies may work.
>
> There are a few caveats about Rust:
>
> First, it’s quite hard to find competent Rust developers. Here are
> the OpenHub stats on new F/OSS code being written in Rust, C, and C++:
>
> https://openhub.net/languages/compare?language_name%5B%5D=c&language_name%5B%5D=cpp&language_name%5B%5D=rust&language_name%5B%5D=-1&language_name%5B%5D=-1&measure=loc_changed
>
> C++ has been slowly trending up, and C down, for the last decade.
> Rust is trending up a lot, but it’s starting from zero and there’s
> still a lot more C or C++ code being written than Rust. It’s now
> easier to hire systems programmers to write C++ than C, and easier to
> hire either than to hire good Rust programmers.
> This tradeoff may be
> very different for an open source project because there are a lot of
> *very* enthusiastic Rust developers and attracting a dozen or two of
> them to contribute would be a huge win. People tend to be less
> enthusiastic about C or C++.
>
> Most of the new kernels written in the last 20 years have been C++,
> most of the new kernels written in the last four years have been
> Rust. Make of that what you will.
>
> Neither Rust nor C++ guarantee safety. C++ can always escape to bare
> pointers (it’s code smell, but it’s sometimes unavoidable). Rust has
> unsafe and requires it for any data structure that isn’t a tree
> (either directly or via some existing code such as the RC / ARC
> traits). One of our concerns was the degree to which the different
> uses of unsafe in various Rust crates compose. There was a paper a
> couple of years ago that found a lot of vulnerabilities from this
> composition. I don’t personally have a great deal of faith that
> unique ownership at an object level with a load of heuristics about
> when it’s safe to alias is the right long-term model. Verona went a
> very different way and I hope Rust may be able to retrofit our ideas
> at some point.
>
> One project that I worked with, for example, was bitten by the fact
> that unsafe in Rust means ‘I promise to follow all of the Rust rules,
> you just can’t mechanically check them’. It read a value from an
> MMIO register into a variable typed as an enumeration. Outside of
> the unsafe block, it then checked that the value was in range. Rust
> enumerations are type safe and so the compiler helpfully elided this
> check. Moving the check into the unsafe block fixed it, but ran
> counter to the generic ‘put as little in unsafe blocks as humanly
> possible’ advice that is given for Rust programmers.
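
A minimal sketch of the MMIO pitfall described above. It is not the code
from that project, and it shows a different way out than moving the check
into the unsafe block: load the register as a plain integer, for which
every bit pattern is valid, and only build the enumeration after the
range check has passed. The names DeviceState and read_state, the three
state values, and the local variable standing in for a register are all
hypothetical.

```rust
use std::convert::TryFrom;
use std::ptr;

/// Hypothetical device state encoded in a 32-bit register.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
pub enum DeviceState {
    Idle = 0,
    Busy = 1,
    Fault = 2,
}

impl TryFrom<u32> for DeviceState {
    /// The rejected raw value is handed back to the caller.
    type Error = u32;

    fn try_from(raw: u32) -> Result<Self, Self::Error> {
        // The check runs on an integer, so the compiler has no grounds
        // to assume the value is already a valid DeviceState.
        match raw {
            0 => Ok(DeviceState::Idle),
            1 => Ok(DeviceState::Busy),
            2 => Ok(DeviceState::Fault),
            other => Err(other),
        }
    }
}

/// Reads the register as an integer and validates it afterwards.
///
/// # Safety
/// `reg` must be valid for a volatile read of a `u32`.
pub unsafe fn read_state(reg: *const u32) -> Result<DeviceState, u32> {
    // Only the raw load needs unsafe; no enum value exists yet.
    let raw = unsafe { ptr::read_volatile(reg) };
    DeviceState::try_from(raw)
}

fn main() {
    // Stand-in for an MMIO register so the sketch runs on a host.
    let fake_register: u32 = 7;
    let state = unsafe { read_state(&fake_register) };
    println!("{:?}", state); // prints "Err(7)"
}
```

Reading the register directly as a DeviceState would create an invalid
enum value for any bit pattern other than 0, 1 or 2, which is undefined
behaviour in Rust and is what allows a later range check to be removed.
With the integer read, the unsafe block stays one line long and the
validation lives in safe code.
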
>
> When I looked at a couple of hobbyist kernels written in Rust, they
> had trivial security vulnerabilities due to not sanitising system
> call arguments. This was depressing because both Rust and C++ make
> it trivial to wrap userspace pointers in a smart pointer type that
> does the checks automatically.
>
> In snmalloc, for example, we use C++ templates to express the
> lifecycle of memory throughout its allocation flow. This would also
> be possible in Rust, but isn’t free in either language: you have to
> use the tools provided, but the outcome is that we can statically
> check a lot of properties at compile time.
>
> With one of my other hats, I am the maintainer of an RTOS that is
> written in C++ and runs on a platform where the hardware enforces
> spatial and temporal memory safety. To date, I don’t believe we’ve
> had any bugs that would have been prevented by Rust. All of the
> memory-safety bugs (we have had some, and we catch them fairly easily
> because they lead to traps and so are easy to add tests for) have
> been in code that’s doing intrinsically unsafe things (memory
> allocators, for example). We use C++20, with moderately heavy use of
> concepts. We have a ring buffer implementation that uses a mixture
> of static_asserts and templates to verify the wrapping behaviour at
> compile time and that’s just one example of a place where we do a lot
> of compile-time checks that are impossible in C.
>
> I’d also like to clear up a few misunderstandings about C++:
>
> - The Itanium C++ ABI has been stable for 20+ years. C++ shared
>   libraries compiled with clang and linked against those compiled with
>   GCC (or vice versa), or different versions of the same compiler has
>   been standard practice for a long time. Both libstdc++ and libc++
>   use inner namespaces for the standard-library types and so allow
>   something like symbol versioning but exposed at the language level.
>   You can see ABI breaks if one library uses a newer version of a type
>   and the other an older one, but that’s why we only bump those forward
>   on major releases: C++ DSOs compiled for FreeBSD 13 may not link with
>   binaries compiled for FreeBSD 14.
>
> - Command-line argument parsing and JSON are not part of the C++
>   standard library, but there are de-facto standards. Nlohmann JSON[1]
>   and CLI11[2] are widely used (it’s been a long time since I’ve seen a
>   project that used anything else) and have very easy-to-use
>   interfaces. I believe (I am a member of the C++ standards committee,
>   but I only recently joined and have not participated in discussions
>   around this) that a big part of the reason it isn’t in the core
>   specification is that there is a de-facto standard and there’s little
>   urgency in adding it to the core.
>
> Finally, one of the key things that we found was that a lot of
> projects used C/C++ out of inertia. They don’t have peak memory or
> sub-millisecond-latency constraints and could easily be written in a
> managed language, often even in an interpreted one. We have Lua in
> the base system. I’d love to see a richer set of things exposed to
> Lua. I played a bit with a kqueue wrapper using Sol2[3] that lets
> you write Lua coroutines and have them implicitly yield on blocking
> operations.
>
> I’d love to see a generic process manager in the base system that
> subsumes devd and inetd written in Lua, with C++ wrappers around
> pdfork (ideally pdvfork, but it doesn’t exist yet) and friends,
> exposed via sol2. The code in C++ is dealing directly with low-level
> system interfaces and would not be safer in Rust, but all of the
> parsing and control-plane logic can live in a safe GC’d language.
> You can run a lot of Lua code in the time it takes one fork call to
> execute.
>
> If we exposed type info from dynamic sysctls generically (I think
> there’s a project working on this?) then things like sysstat could be
> written in Lua. I was experimenting with Dear ImGui for this, since
> it had back ends that rendered in X11, Wayland, in a terminal, or
> remotely over a websocket. Unfortunately, the latter two were never
> merged and are probably unmaintained (the author is also the person
> behind llama.cpp and so probably isn’t going to work on it for a
> while). Being able to run management tools in a terminal and click
> on a URL to open them in the web browser would be amazing, but
> doesn’t require a new systems programming language.
>
> I’d love to see a default that anything intended to run with elevated
> privilege is written in Lua.
>
> David
>
> [1] https://github.com/nlohmann/json
> [2] https://github.com/CLIUtils/CLI11
> [3] https://sol2.readthedocs.io/

If you had to estimate, what is the cost of enforcing better C++ code?

I am not familiar with Lua, and most of my experience with Lua-like
languages has included dynamic code injection as an attack vector. Is
it feasible to protect Lua from that problem in the use case you
propose?