Re: Question regarding crunchgen(1) binaries

From: Julian H. Stacey <jhs_at_berklix.com>
Date: Wed, 17 Apr 2024 11:53:25 UTC
Hi, Reference:
> From:		"Poul-Henning Kamp" <phk@phk.freebsd.dk>
> Date:		Mon, 15 Apr 2024 19:55:22 +0000

"Poul-Henning Kamp" wrote:
> --------
> Warner Losh writes:
>
> > Maybe start there to understand what "LTO" the security thing is doing and
> > why it's either wrong or violates an assumption in crunchgen that can be
> > fixed.
>
> Crunch binaries were invented 30 years ago, to make the FreeBSD
> installation program fit on a single floppy disk.
>
> Note that the goal was saving disk-space rather than RAM.
>
> The "architecture" of crunchgen is to take a lot of programs, rename
> their main() and link them all together with a new main() which
> dispatches to the right program's main() based on argv[0].
>
> Statistically you save half a disk-allocation unit for each program
> which was nothing to sneeze at, but the real disk-space dividend
> comes from linking the resulting combi-program static.
>
> Because it is linked static, only those .o files which are referenced
> get pulled in from the libraries: libm::j0.o only gets pulled in
> if you use Bessel functions, which, contrary to rumours, sysinstall
> did not.
>
> (The goal of shared libraries is saving RAM:  Everybody gets the
> complete library, but only one copy of its code ever gets loaded.)
>
> But the real trick is actually not crunchgen, which was originally just
> a shell script, but rather crunchide(1).
>
> Crunchide(1) does unnatural acts to an object file's symbol table,
> to get around the fact that all the programs have a function called
> "main" and that they litter the global symbol namespace with their
> private inter-file references.
>
> To make a crunched binary, the .o files for the individual programs
> are first "pre-linked" without libraries so that internal interfile
> references are resolved.
>
> Then crunchide changes all global symbols, except "main", to be local
> symbols, so that they become unavailable for symbol resolution in
> the final run of the linker.  The "main" symbol is also renamed
> to a per-program name, something like "cp_main" for cp(1) etc.
>
> And then all the prelinked .o files, one per program, get linked
> together with the "dispatch main" and this time with libraries.
>
> I see no reason why crunchgen cannot be done with Link Time
> Optimization, but somebody has to write the new crunchide(1), and
> I suspect it will have a tougher row to hoe, because pre-linking
> cannot be used to take care of the inter-program symbols.
>
> As I understand it LTO can also link with "normal libraries"
> so one option might be to only LTO the final linking step of
> the crunch process, treating all the programs as "normal libraries",
> but still getting LTO advantage internally in the libraries.
>
> Poul-Henning

Interesting. It would be nice if some of that were added to the
crunchide(1) man page.
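
For anyone following along, here is a rough sketch in C of the
"dispatch main" idea quoted above: look at the last path component of
argv[0] and call the matching renamed main(). This is only an
illustration, not the code crunchgen actually generates. The names
cp_main, ls_main, struct stub, base_name and crunched_dispatch are
hypothetical, and the stub bodies just return marker values so the
dispatch can be observed.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for each program's renamed main(). */
int cp_main(int argc, char **argv) { (void)argc; (void)argv; return 10; }
int ls_main(int argc, char **argv) { (void)argc; (void)argv; return 20; }

/* One table entry per program linked into the combined binary. */
struct stub {
    const char *name;
    int (*fn)(int, char **);
};

static const struct stub stubs[] = {
    { "cp", cp_main },
    { "ls", ls_main },
};

/* Strip any leading directory part, so "/bin/cp" matches "cp". */
const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash != NULL ? slash + 1 : path;
}

/* Dispatch on argv[0]; returns 127 if no program name matches. */
int crunched_dispatch(int argc, char **argv)
{
    const char *name;
    size_t i;

    if (argc < 1 || argv[0] == NULL)
        return 127;
    name = base_name(argv[0]);
    for (i = 0; i < sizeof(stubs) / sizeof(stubs[0]); i++) {
        if (strcmp(name, stubs[i].name) == 0)
            return stubs[i].fn(argc, argv);
    }
    fprintf(stderr, "%s: not compiled into this binary\n", name);
    return 127;
}
```

In a real crunched binary the new main() would simply return
crunched_dispatch(argc, argv), and each program name would be a hard
link to the one combined executable.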

Cheers,
-- 
Julian Stacey.       Gmail & Googlemail Fail http://berklix.org/jhs/mail/#bad 
Brits abroad reclaim http://StolenVotes.UK  http://www.gov.uk/register-to-vote
Arm Ukraine defence.   Contraception reduces global warming & resource wars.