Re: We need to do something about build times

From: Sysadmin Lists <sysadmin.lists_at_mailfence.com>
Date: Tue, 24 Oct 2023 19:37:18 UTC
> ----------------------------------------
> From: Robert Clausecker <fuz@freebsd.org>
> Date: Oct 24, 2023, 12:12:13 PM
> To: <ports@freebsd.org>
> Subject: We need to do something about build times
> 
> 
> The build times have gone up to the point where they are unsustainable.
> Frequent updates to key ports (like llvm*, rust, gcc*) make it so that
> basically every time I prepare a new batch of commits, I have to rebuild
> a variety of toolchain ports across 8 jails (amd64/i386/arm64/armv7 each
> for FreeBSD 12.4 and 13.2).  This takes multiple days.  And I'm working
> with hardware that's quite recent (for x86, an 8-thread Skylake box; for
> arm, an 8-thread Windows Dev Kit 2023).
> 
> By the time the builds are done, some random update has usually caused
> the ports to be out of date again, so if I were to rebase, I would have
> to do all of this again.  And again.  And again.
> 
> Particularly bad offenders are gcc and rust.  Ccache is ineffective for
> these as gcc has LTO turned on, which seems to more than triple the
> regular build time to more than 24 hours even on a fast Skylake box.
> This is single-threaded as I build multiple ports at once; if I were to
> build multi-threaded, the same total number of CPU hours would be
> spent, so that would not fix my problem.  Ccache is also ineffective for
> rust of course.
> 
> There's another issue in that ccache doesn't scale to large cache sizes
> (my experiments show that anything larger than 20 GB seems to cause
> problems as ccache repeatedly tries to scan the whole thing for evictions),
> and the sizes that work are just not enough to be effective.  What would
> help is being able to have one cache for each combination of ports tree
> and jail, but Poudriere has no support for that.
> 
> Another bad offender is texlive.  For some reason, texlive-texmf needs to
> be rebuilt frequently, despite mostly comprising data that is just
> unpacked and repacked.  This takes forever and pegs the disk at 100% for
> more than an hour as the texlive source tarball is repeatedly extracted
> and then compressed into packages.  I don't get why the texlive stuff is
> not split in such a way that the stuff that is just repacked lives in its
> own port with no dependencies so it only needs to be rebuilt on rare
> texlive updates.
> 
> And it seems I'm slowly killing my build SSD like that.  After just about
> 9 months, it is already at 100 TB of writes just from port builds.
> Building with workdirs in memory is no longer an option as that frequently
> kills my build server by filling all its RAM with build files until no
> processes can be started anymore.  Poudriere does not have an effective
> mechanism to prevent this (tmpfs limits don't work as the ports in
> question require very large workdirs, tend to take very long to build and
> tend to be built all at the same time for multiple jails).
> 
> Using prebuilt packages is not an option as they lag behind by several
> days/weeks and lead to an inconsistent testing environment.  It is also
> not a good solution to choose non-default build options for these ports
> as it is not clear if that would affect the validity of the testbuilds.
> 
> How can we fix this problem and make ports development sustainable again?
> 
> Some ideas:
> 
>  - disable by default LTO and other options that increase build times
>    to such a ridiculous degree.  This would really make a huge impact with
>    very little work.  I don't think LTO on toolchain ports speeds up
>    consumer builds enough to justify the extra time it takes to build them.
> 
>  - for gcc, switch to single or no bootstrap by default.  We have known
>    good toolchains we use to build gcc.  There's really no reason to
>    build it multiple times just out of paranoia.  The maintainer is
>    supposed to check that gcc is built correctly without bootstrapping
>    so consumers don't need to build it multiple times.
> 
>  - untangle some of the dependencies so that fewer ports may trigger
>    rebuilds of critical ports.  For example, llvm docs could be moved to
>    separate ports so that updates in the documentation toolchain do not
>    trigger an LLVM rebuild.
> 
>  - adjust USES to choose lighter dependencies by default.  E.g. USES=llvm
>    could depend on the light flavour by default.  I'm sure only very few
>    ports need all of LLVM and the light flavour is faster to build.
> 
>  - rework Poudriere's rebuild detection to not rebuild every port for
>    every random bullshit thing.  For example, I don't see why ports need
>    to be rebuilt for transitive changes in build dependencies.  E.g. if
>    port A has build depends on port B which build depends on port C, and
>    C is updated, then A has to be rebuilt despite its direct dependencies
>    being unchanged.  This does not appear to be reasonable.
> 
>  - unbundle libraries more thoroughly.  We currently have dozens of
>    copies of LLVM, skia, webkit, and others in tree as ports just bundle
>    them instead of even making an attempt at unbundling.  This means that
>    every time they need to be patched, it's whack-a-mole to find all
>    copies.  Plus build times suffer a lot.  I know it's hard, but perhaps
>    something can be done.  For example, I have given up on trying to make
>    electron work on armv7 as with every major version update, my patches
>    are randomly being dropped and I have to do it all again.  Like all
>    Chromium-based ports, electron takes over two days to build on my arm box
>    and my time is insufficient for that.
> 
>  - stop bulk bumping RUN_DEPENDS consumers when dependencies are updated,
>    or at least think carefully before doing so.  RUN_DEPENDS are only
>    installed after the build and should not affect the build.  For
>    example, sysutils/cdrtools uses the command line opus encoder and thus
>    depends on audio/opus.  There is absolutely no reason to bump it when
>    audio/opus is updated.  It just causes everybody to needlessly rebuild
>    and reinstall ports.  Sure there's the odd case where that needs to be
>    done, but it seems like some maintainers just always do that, even
>    when it's not needed.
> 
>  - maybe add a system where ports can declare the oldest version of
>    themselves they are compatible with, in the sense that consumers only
>    need to be rebuilt if they were built against a version older than
>    that.  For example, if a shared library is updated with a bug fix
>    that does not change the ABI, there's no need to rebuild all consumers.
> 
> With great frustration,
> Robert Clausecker
> 


Well-done diagnostic reporting, and the suggestions look reasonable to me as
well.
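To make the rebuild-detection point concrete, here is a toy model contrasting
the two policies from the A -> B -> C example. It's only a sketch under my own
assumptions: the dict layout and the function names are made up for
illustration, and it is not how Poudriere actually decides what to rebuild.

```python
"""Toy model of two rebuild-detection policies (hypothetical, not Poudriere's code)."""
from typing import Dict, List, Set

# Hypothetical data layout: each port maps to the ports it build-depends on.
BuildDeps = Dict[str, List[str]]


def transitive_deps(port: str, deps: BuildDeps) -> Set[str]:
    """Return every port reachable from `port` through build dependencies."""
    seen: Set[str] = set()
    stack = list(deps.get(port, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:  # the seen set also guards against cycles
            seen.add(dep)
            stack.extend(deps.get(dep, []))
    return seen


def rebuild_transitive(port: str, changed: Set[str], deps: BuildDeps) -> bool:
    """Policy as described in the thread: a change anywhere below the port triggers a rebuild."""
    return port in changed or bool(transitive_deps(port, deps) & changed)


def rebuild_direct(port: str, changed: Set[str], deps: BuildDeps) -> bool:
    """Proposed policy: only the port itself or one of its direct build dependencies counts."""
    return port in changed or bool(set(deps.get(port, [])) & changed)


if __name__ == "__main__":
    # A build-depends on B, B build-depends on C, and only C was updated.
    example: BuildDeps = {"A": ["B"], "B": ["C"], "C": []}
    updated = {"C"}
    for name in ("A", "B", "C"):
        print(name, rebuild_transitive(name, updated, example), rebuild_direct(name, updated, example))
```

Running it prints "A True False", "B True True" and "C True True": under the
transitive policy an update to C rebuilds A, under the direct-only policy it
does not. The catch is that the direct-only policy would miss changes that
leak through B into A (for example via static libraries or installed
headers), so it would probably need some opt-out.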

I suspect we're all getting frustrated with ever-increasing build-times, and
now's a good time to address their underlying causes.
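The "oldest compatible version" suggestion is also cheap to prototype. A
minimal sketch, assuming a hypothetical per-port compat_since value (no such
knob exists in the ports framework today) and purely numeric dotted versions:

```python
"""Sketch of the "oldest compatible version" idea (compat_since is a hypothetical field)."""
from dataclasses import dataclass
from typing import Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Turn a purely numeric dotted version such as '1.4.2' into (1, 4, 2)."""
    return tuple(int(part) for part in v.split("."))


@dataclass(frozen=True)
class Library:
    version: str       # version currently in the tree
    compat_since: str  # oldest version whose consumers need no rebuild (hypothetical)

    def __post_init__(self) -> None:
        # A library cannot claim compatibility with a version newer than itself.
        if parse_version(self.compat_since) > parse_version(self.version):
            raise ValueError("compat_since must not exceed version")


def consumer_needs_rebuild(lib: Library, built_against: str) -> bool:
    """Rebuild a consumer only if it was built against a version older than compat_since."""
    return parse_version(built_against) < parse_version(lib.compat_since)


if __name__ == "__main__":
    # A bug-fix release 1.4.3 that keeps the ABI introduced in 1.4.0.
    lib = Library(version="1.4.3", compat_since="1.4.0")
    print(consumer_needs_rebuild(lib, "1.4.2"))  # built against 1.4.2 -> False
    print(consumer_needs_rebuild(lib, "1.3.9"))  # built against 1.3.9 -> True
```

Real ports versions carry suffixes such as _1 or letters, so an actual
implementation would have to use pkg's own comparison rules (as exposed by
"pkg version -t") instead of this naive parse_version().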



