Re: It's not Rust, it's FreeBSD (and LLVM)

From: Olivier Certner <olce_at_freebsd.org>
Date: Mon, 09 Sep 2024 13:13:03 UTC
Hello Poul-Henning,

> But a FreeBSD system recompiling itself from source is even rarer.
> (...)
> Beware of selection bias.
> "Somebody who compiles from src" is almost the literal definition of "committer".
> In terms of all the FreeBSD running hardware out there, not even
> one percent of one percent of the machines compile from src.

You're most probably right.  At the same time, I have a hard time seeing how the number of machines that compile 'src' is even remotely relevant to the debate.

As part of determining LLVM's position in FreeBSD (and, in fact, more than that), the crucial questions seem to me to be: For which uses do we need to compile from source, and how important (in the widest possible sense) are these uses to the project?

To state the obvious, committers and developers have to build from source, so it must be easy for them to do so, and must *remain* so.  But there is much more.

Building customized versions of FreeBSD is very useful to fulfill at least these needs:
- Usability/Performance: Stripped-down OSes for embedded platforms, and more recently VMs.
- Security: Reduce the attack surface by removing unused components.
- Appliances (for the various reasons above).
- Experiments: Experimental options, drivers, etc.

Being able to rebuild releases in a controlled environment (each with its specific compiler version) is useful for maintenance (bug fixing, debugging) and security (build reproducibility).

And compiling from source is likely also very valuable for attracting new developers.  From an educational standpoint, it may encourage more universities to include FreeBSD in their curricula, increasing students' exposure to FreeBSD and our chances of having them as developers later on.

These are already prominent reasons to make sure that building from source is not only feasible, but actually *easy* to do.  And I'm sure this list is not even complete.

> LLVM does not belong in src by any sane criteria, and any microscopic
> benefits of "tight integration" can be delivered with a "toolchain-llvm"
> (meta-)port.

The benefits of "tight integration" are by no means "microscopic" (an easy-to-set-up environment, reproducible builds, compiler bug workarounds, etc.).  I wonder what makes you think otherwise.

However, I agree that tight integration per se doesn't necessarily require having LLVM/clang's code living in 'src'.  It could be achieved through something like a port, although it is not yet clear to me whether a port exactly like the ones we have today would be feasible or desirable (more on this below).

The external toolchain infrastructure is already a step in this direction.  If we choose to build on it, I think what would be missing is:
- A way to record a dependency on a precise version of the compiler(s) (plus custom patches) needed to build the sources.
- The ability to easily (with a single command, or even automatically as part of the build) grab these compilers in binary form.
- Likewise, the ability to grab their source code (and apply the patches) and build them with some already installed compilers (the result may be used directly to build 'src' or as a bootstrap).
But see more below.
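To make these three points a bit more concrete, here is a minimal sketch of what such a pin could look like and how the build could decide where to get the compiler from.  Everything in it (ToolchainPin, Patch, fingerprint(), plan() and the three outcomes) is hypothetical and only illustrates the idea; it is not an existing FreeBSD tool or format.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    name: str
    sha256: str  # digest of the patch file, so the exact patch set is pinned


@dataclass(frozen=True)
class ToolchainPin:
    """Hypothetical record: 'src' needs exactly this compiler + these patches."""
    component: str        # e.g. "llvm" (illustrative naming)
    version: str          # upstream release
    upstream_commit: str  # upstream commit the release corresponds to
    patches: tuple[Patch, ...] = ()

    def fingerprint(self) -> str:
        """Stable identifier for 'this version plus exactly these patches'."""
        h = hashlib.sha256()
        # NUL separators keep distinct field combinations from colliding.
        for part in (self.component, self.version, self.upstream_commit):
            h.update(part.encode() + b"\0")
        for p in self.patches:
            h.update(p.name.encode() + b"\0" + p.sha256.encode() + b"\0")
        return h.hexdigest()


def plan(pin: ToolchainPin, installed_fp, binary_fps) -> str:
    """Hypothetical policy for obtaining the pinned toolchain.

    installed_fp: fingerprint of the locally installed compiler, or None.
    binary_fps:   fingerprints of prebuilt compilers available for download.
    """
    wanted = pin.fingerprint()
    if installed_fp == wanted:
        return "use-installed"      # already have the exact compiler
    if wanted in binary_fps:
        return "fetch-binary"       # grab it in binary form
    return "build-from-source"      # fetch code, apply patches, bootstrap
```

The point of hashing the patch list together with the version is that two compilers built from the same upstream release but with different local patches are then treated as different toolchains, which is what reproducible builds of 'src' would need.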

> We need to find a contemporary and useful answer to "What is FreeBSD?"

I think you've answered part of that satisfactorily in your initial mail already:

> Delivering a single consistent userland with the kernel has stood
> us well for three decades, and we should stick with that.

I'll add:
- A system that is easy to build and tweak in practice (for developers at the very least).
- A system that is self-sufficient, in the sense that, once installed on a machine, one can set it up to do whatever one would like without having to boot something else (e.g., further software installations must be possible over the network or from some USB key; so it must include some tools to support the network, etc.).
- A system with carefully maintained architecture, quality and ease of use.  I know you said "consistent", but I expressly want to state what I consider the most important aspects of that notion.

From these statements we can draw a lot of implications, assuming people agree with this list.  Below, I'll limit myself to those in connection with the need to easily build from source and externalizing components out of base (for all software, not LLVM specifically).

Having a port for the 'src'-required compiler(s) or other components of base certainly has advantages (the machinery to fetch code, apply patches and build already exists), but in this light also has drawbacks.

First, it becomes less straightforward to work with (inspect, modify, etc.) the code to be built.  If we follow the traditional port model, the pristine code would be fetched from upstream, and our modifications would be made available through patches in the ports tree.  Producing the final source then requires applying the patches (the equivalent of "make patch" in ports) for each "remote" component one wants to inspect.  If, instead, the project hosts already tuned source code in the form of tarballs, then there is the question of how to fetch them and unpack them into a tree whose layout, again, should preferably make sense to developers.

Second, both approaches easily lose the ability to inspect the history of each remote component if not done properly.  At worst, it is always possible to ship a '.git' directory, or equivalent, but recording somewhere the commit identifier and having the infrastructure to either clone the repository at the proper commit, or switch an existing checkout to it, would be much better.  However, these approaches miss one important point: We don't necessarily need upstream's full histories (although they would help); instead, we probably need to be able to quickly determine which code was shipped with which release, or is part of some -STABLE or -CURRENT, now or at some earlier point in time, and upstream repositories obviously can't answer these questions on their own.
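As an illustration of the kind of question upstream can't answer on its own, here is a minimal sketch of a per-branch log that records, each time an external component is bumped, which upstream commit it now points to, and can then answer "what was in branch B at time T".  The class, its methods and the branch/commit strings are hypothetical; a real implementation would more likely derive this from our own repository history than keep a separate log.

```python
from bisect import bisect_right, insort
from collections import defaultdict
from datetime import datetime


class ComponentLog:
    """Hypothetical record of which upstream commit of each external
    component was used on each branch (main, stable/N, releng/N.M...)."""

    def __init__(self):
        # (branch, component) -> list of (time, commit), kept sorted by time
        self._log = defaultdict(list)

    def record(self, branch: str, component: str, when: datetime, commit: str):
        """Note that from `when` on, `branch` uses `commit` of `component`."""
        insort(self._log[(branch, component)], (when, commit))

    def commit_at(self, branch: str, component: str, when: datetime):
        """Upstream commit in use on `branch` at `when`, or None if none yet."""
        entries = self._log.get((branch, component), [])
        times = [t for t, _ in entries]
        i = bisect_right(times, when)  # entries recorded at or before `when`
        return entries[i - 1][1] if i else None
```

With such a record, "which LLVM was in stable/14 last March?" becomes a lookup, and from the returned commit identifier one can then clone or switch a checkout to exactly that code.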

So I think that, if we actually switch massively to external components, be they LLVM or not, we are going to need a fair amount of tooling to keep the experience smooth for developers, security auditors, and anyone else who needs the source code and has to navigate through it.  It may be that this is not deemed necessary for externalizing LLVM alone (I'm neither an LLVM integrator nor an LLVM developer), but I doubt we can smoothly generalize the use of external components without it.

Thanks and regards.

-- 
Olivier Certner