We need to do something about build times

From: Robert Clausecker <fuz_at_freebsd.org>
Date: Tue, 24 Oct 2023 19:12:13 UTC
The build times have gone up to the point where they are unsustainable.
Frequent updates to key ports (like llvm*, rust, gcc*) make it so that
basically every time I prepare a new batch of commits, I have to rebuild
a variety of toolchain ports across 8 jails (amd64/i386/arm64/armv7 each
for FreeBSD 12.4 and 13.2).  This takes multiple days.  And I'm working
with hardware that's quite recent (for x86, an 8-thread Skylake box; for
arm, an 8-thread Windows Dev Kit 2023).

By the time the builds are done, some random update has usually caused
the ports to be out of date again, so if I were to rebase, I would have
to do all of this again.  And again.  And again.

Particularly bad offenders are gcc and rust.  Ccache is ineffective for
these as gcc has LTO turned on, which seems to more than triple the
regular build time to more than 24 hours even on a fast Skylake box.
That is with single-threaded builds, as I build multiple ports at once;
if I were to build multi-threaded, the same total number of CPU hours
would be spent, so that would not fix my problem.  Ccache is, of course,
also ineffective for rust.

There's another issue in that ccache doesn't scale to large cache sizes
(my experiments show that anything larger than 20 GB seems to cause
problems as ccache repeatedly tries to scan the whole thing for evictions),
and the sizes that work are just not enough to be effective.  What would
help is being able to have one cache for each combination of ports tree
and jail, but Poudriere has no support for that.

Another bad offender is texlive.  For some reason, texlive-texmf needs to
be rebuilt frequently, despite mostly comprising data that is just
unpacked and repacked.  This takes forever and pegs the disk at 100% for
more than an hour as the texlive source tarball is repeatedly extracted
and then compressed into packages.  I don't get why the texlive stuff is
not split in such a way that the stuff that is just repacked lives in its
own port with no dependencies so it only needs to be rebuilt on rare
texlive updates.

And it seems I'm slowly killing my build SSD like that.  After just about
9 months, it is already at 100 TB of writes just from port builds.
Building with workdirs in memory is no longer an option as that frequently
kills my build server by filling all its RAM with build files until no
processes can be started anymore.  Poudriere does not have an effective
mechanism to prevent this (tmpfs limits don't work as the ports in
question require very large workdirs, tend to take very long to build and
tend to be built all at the same time for multiple jails).

Using prebuilt packages is not an option as they lag behind by several
days or weeks and lead to an inconsistent testing environment.  Nor is it
a good solution to choose non-default build options for these ports, as
it is not clear whether that would affect the validity of the test builds.

How can we fix this problem and make ports development sustainable again?

Some ideas:

 - disable LTO and other options that increase build times by such a
   ridiculous degree by default.  This would really make a huge impact
   with very little work.  I don't think LTO on toolchain ports improves
   the build times of their consumers enough to make up for the extra
   time it takes to build the toolchains themselves.

 - for gcc, switch to single or no bootstrap by default.  We have known
   good toolchains we use to build gcc.  There's really no reason to
   build it multiple times just out of paranoia.  The maintainer is
   supposed to check that gcc is built correctly without bootstrapping
   so consumers don't need to build it multiple times.

 - untangle some of the dependencies so that fewer ports trigger
   rebuilds of critical ports.  For example, llvm docs could be moved to
   separate ports so that updates in the documentation toolchain do not
   trigger an LLVM rebuild.

 - make USES choose lighter dependencies by default.  E.g. USES=llvm
   could depend on the light flavour by default.  I'm sure only very few
   ports need all of LLVM, and the light flavour is faster to build.

 - rework Poudriere's rebuild detection to not rebuild every port for
   every random bullshit thing.  For example, I don't see why ports need
   to be rebuilt for transitive changes in build dependencies.  E.g. if
   port A build-depends on port B, which in turn build-depends on port C,
   and C is updated, then A has to be rebuilt even though its direct
   dependencies are unchanged.  This does not appear to be reasonable.

 - unbundle libraries more thoroughly.  We currently have dozens of
   copies of LLVM, skia, webkit, and others in tree, as ports just bundle
   them instead of even making an attempt at unbundling.  This means that
   every time they need to be patched, it's a game of whack-a-mole to find
   all copies.  Plus build times suffer a lot.  I know it's hard, but
   perhaps something can be done.  For example, I have given up on trying
   to make electron work on armv7, as with every major version update, my
   patches are randomly dropped and I have to do it all again.  Like all
   chromium ports, electron takes over two days to build on my arm box,
   and I simply don't have that much time.

 - stop bulk bumping RUN_DEPENDS consumers when dependencies are updated,
   or at least think carefully before doing so.  RUN_DEPENDS are only
   installed after the build and should not affect the build.  For
   example, sysutils/cdrtools uses the command line opus encoder and thus
   depends on audio/opus.  There is absolutely no reason to bump it when
   audio/opus is updated.  It just causes everybody to needlessly rebuild
   and reinstall ports.  Sure, there's the odd case where that needs to be
   done, but it seems like some maintainers just always do that, even
   when it's not needed.

 - maybe add a system where ports can declare the oldest version of
   themselves they are compatible with, in the sense that consumers only
   need to be rebuilt if they were built against a version older than
   that.  For example, if a shared library is updated with a bug fix
   that does not change the ABI, there's no need to rebuild all consumers.
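To make the rebuild-detection and RUN_DEPENDS points above concrete, here
is a minimal Python sketch contrasting two rebuild rules on a toy
dependency graph.  It is a hypothetical model, not Poudriere's actual
code: the Port class, the "build"/"lib"/"run" dependency kinds (standing
in for BUILD_DEPENDS, LIB_DEPENDS and RUN_DEPENDS) and the `changed` set
of ports whose version was bumped are all assumptions for illustration.
The first rule rebuilds a port if anything below it changed, however
indirectly; the second only looks at the port itself and its direct
build and library dependencies.

```python
from dataclasses import dataclass, field

# Hypothetical model for illustration only, not Poudriere's implementation.
# Dependency kinds: "build" ~ BUILD_DEPENDS, "lib" ~ LIB_DEPENDS, "run" ~ RUN_DEPENDS.
@dataclass
class Port:
    name: str
    deps: dict = field(default_factory=dict)  # kind -> set of port names


def transitive_deps(ports, name, seen=None):
    """Return every port reachable from `name` through any dependency kind."""
    if seen is None:
        seen = set()
    for dep_names in ports[name].deps.values():
        for dep in dep_names:
            if dep not in seen:
                seen.add(dep)
                transitive_deps(ports, dep, seen)
    return seen


def needs_rebuild_transitive(ports, name, changed):
    """Behaviour described above: any change anywhere below the port forces a rebuild."""
    return name in changed or bool(transitive_deps(ports, name) & changed)


def needs_rebuild_direct(ports, name, changed):
    """Proposed rule: only the port itself and its direct build/lib dependencies
    count; RUN_DEPENDS-style dependencies never trigger a rebuild."""
    deps = ports[name].deps
    direct = deps.get("build", set()) | deps.get("lib", set())
    return name in changed or bool(direct & changed)


# Toy tree: A build-depends on B, B build-depends on C,
# and sysutils/cdrtools run-depends on audio/opus.
ports = {
    "A": Port("A", {"build": {"B"}}),
    "B": Port("B", {"build": {"C"}}),
    "C": Port("C"),
    "sysutils/cdrtools": Port("sysutils/cdrtools", {"run": {"audio/opus"}}),
    "audio/opus": Port("audio/opus"),
}
changed = {"C", "audio/opus"}  # ports whose version was bumped

for name in ("A", "B", "sysutils/cdrtools"):
    print(name,
          needs_rebuild_transitive(ports, name, changed),
          needs_rebuild_direct(ports, name, changed))
```

With C and audio/opus bumped, the transitive rule flags A, B and
sysutils/cdrtools, while the direct rule flags only B.  Whether skipping
indirect changes is always safe is not settled by this sketch.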
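The "oldest compatible version" idea above can be sketched in a few lines
of Python.  This is an assumption-laden illustration: the compat_since
declaration, the Library class and the libfoo example are hypothetical,
nothing like them exists in the ports framework today, and the version
parser only understands plain dotted numeric versions.

```python
from dataclasses import dataclass


def parse_version(v):
    """Simplified: '1.4.3' -> (1, 4, 3). Assumes plain dotted numeric versions;
    real port versions (PORTREVISION suffixes, epochs) are not handled."""
    return tuple(int(part) for part in v.split("."))


@dataclass
class Library:
    name: str
    version: str       # version currently in the tree
    compat_since: str  # hypothetical declaration: oldest ABI-compatible version


def consumer_needs_rebuild(lib, built_against):
    """A consumer only needs a rebuild if it was built against a version
    older than the library's declared compatibility floor."""
    return parse_version(built_against) < parse_version(lib.compat_since)


# Hypothetical library: 1.4.3 is an ABI-preserving bug fix release,
# so it declares compatibility back to 1.4.0.
libfoo = Library("libfoo", version="1.4.3", compat_since="1.4.0")

for built_against in ("1.4.2", "1.3.9"):
    print(built_against, consumer_needs_rebuild(libfoo, built_against))
```

A consumer built against 1.4.2 is left alone, while one built against
1.3.9 is rebuilt.  Plain tuple comparison treats "1.4" as older than
"1.4.0", so a real implementation would need the ports framework's own
version ordering.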

With great frustration,
Robert Clausecker

-- 
()  ascii ribbon campaign - for an 8-bit clean world 
/\  - against html email  - against proprietary attachments