Re: a1097094c4c5 - main - newvers: Set explicit git revision length

From: Ravi Pokala <rpokala_at_freebsd.org>
Date: Thu, 19 Dec 2024 17:37:13 UTC
It occurred to me to see what Linux distros do. I spot-checked several distros that we have in our lab (several versions of Debian, RHEL, Rocky, SLES, openSUSE, Ubuntu), and it looks like only SLES and openSUSE embed a Git hash in their 'uname -a' output. Both of those distros use just seven characters for the hash.

-Ravi (rpokala@)

-----Original Message-----
From: <owner-src-committers@freebsd.org> on behalf of John Baldwin <jhb@FreeBSD.org>
Date: Thursday, December 19, 2024 at 07:03
To: Gleb Smirnoff <glebius@freebsd.org>, Ed Maste <emaste@freebsd.org>
Cc: <src-committers@freebsd.org>, <dev-commits-src-all@freebsd.org>, <dev-commits-src-main@freebsd.org>
Subject: Re: git: a1097094c4c5 - main - newvers: Set explicit git revision length


On 12/18/24 12:12, Gleb Smirnoff wrote:
> On Wed, Dec 18, 2024 at 10:22:24AM -0500, Ed Maste wrote:
> E> That said, it doesn't matter what Git's algorithm chooses as the short
> E> hash length; specifying --short bypasses that algorithm. `git
> E> rev-parse --verify --short=12 HEAD` will give us a 12-character short
> E> hash as long as that hash is unique. The reproducibility concern is
> E> thus: what is the probability that the 12-character short hash is
> E> unique at the time and in a repo from which an image is built, but is
> E> not unique for the attempt to reproduce it, or vice-versa. This
> E> probability is rather small.
> E>
> E> If you look at arbitrary commits 6 or 7 characters are usually
> E> sufficient for a unique hash today. For instance, some latest -pX from
> E> recent releng/ branches:
> E>
> E> 13.3: 72aa3d
> E> 13.4: 3f40d5
> E> 14.0: f10e32
> E> 14.1: 74b6c98
> E> 14.2: c8918d6
> E>
> E> The status quo of --short=12 should be fine for quite some time.
> 
> AFAIU John's concern is that you can't guarantee a reproducible build from a
> "dirty" repository, i.e. a repository that has more branches than just the
> official ones. I just made a quick check on the Netflix repo, which has both
> the current FreeBSD history and the before-the-official-git history together,
> as well as split-out ports subdirectories and of course our own stuff. For
> short hashes there are roughly 2x more ambiguities than for a "clean" repo.
> Apparently the chance of collision on a long hash is also doubled.
> 
> We can of course say that we don't provide reproducible builds from a "dirty"
> repo. But that would be a real limitation. It would rule out a legitimate
> scenario:
> 
> git subtree add FreeBSD && cd FreeBSD && make a reproducible build

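The kind of check Gleb describes can be approximated with a short script. This is a minimal sketch, not the method he actually used: the function name and the sample IDs are made up for illustration, and it assumes the full object IDs have already been dumped to a list (for example from `git rev-list --all`, which covers commits only; Git checks abbreviations against every object, so a commit-only list undercounts).

```python
from collections import Counter


def count_ambiguous_prefixes(object_ids, length):
    """Count object IDs whose first `length` hex digits are shared
    with at least one other distinct ID in the input."""
    unique_ids = set(object_ids)  # ignore repeated lines in the input
    prefix_counts = Counter(oid[:length] for oid in unique_ids)
    return sum(n for n in prefix_counts.values() if n > 1)


# Made-up 40-digit IDs, for illustration only (not real FreeBSD hashes).
sample_ids = [
    "abc123" + "0" * 33 + "1",
    "abc123" + "0" * 33 + "2",
    "def456" + "0" * 33 + "3",
    "fed987" + "0" * 33 + "4",
]

if __name__ == "__main__":
    for length in (6, 12, 40):
        ambiguous = count_ambiguous_prefixes(sample_ids, length)
        print(f"{length:2d} hex digits: {ambiguous} ambiguous IDs")
```

Running the same count against a "clean" and a "dirty" object list at several prefix lengths gives a rough comparison of how much the extra history increases ambiguity.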

In particular, the dirty repository scenario I imagine is FreeBSD's official
repository at some point in the future. The question, though, is how far in
the future it would have to be to matter. If it would take 100+ years at our
current commit rate for this to matter, then this is probably moot. The other
point, I guess, is: how many other user git settings can affect the build?
Should we not require an empty global git config as a prereq for someone who
wants a reproducible build (and use the same setup for our official builds),
and say that if you adjust your user config in a way that impacts the build,
that's kind of your problem?
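
For a rough sense of scale on the "how far in the future" question, the usual uniform-hash estimate can be sketched as below. The assumptions are mine, not from the thread: object IDs behave like uniformly random hex strings, and the object counts in the loop are hypothetical placeholders rather than measurements of any FreeBSD repository. The first function covers the case that matters for the build (one specific hash, e.g. HEAD, sharing its prefix with some other object); the second is the birthday-style approximation for any pair colliding anywhere in the repository.

```python
import math


def prob_prefix_ambiguous(num_other_objects, hex_digits):
    """Probability that one given prefix of `hex_digits` hex digits also
    matches at least one of `num_other_objects` other uniformly random IDs.
    Equals 1 - (1 - 16**-k)**n, computed stably for tiny values."""
    space = 16 ** hex_digits
    return -math.expm1(num_other_objects * math.log1p(-1.0 / space))


def prob_any_collision(num_objects, hex_digits):
    """Birthday approximation: probability that any two of `num_objects`
    uniformly random IDs share their first `hex_digits` hex digits."""
    space = 16 ** hex_digits
    pairs = num_objects * (num_objects - 1) / 2.0
    return -math.expm1(-pairs / space)


if __name__ == "__main__":
    # Hypothetical object counts, chosen only to show the trend.
    for n in (10**6, 10**7, 10**8):
        print(
            f"n={n:>11,}  "
            f"HEAD ambiguous at 12: {prob_prefix_ambiguous(n, 12):.3e}  "
            f"any pair at 12: {prob_any_collision(n, 12):.3e}"
        )
```

The two numbers differ by many orders of magnitude, which is why it matters whether the concern is "some 12-character prefix in the repo is ambiguous" or "the one hash embedded in this build is ambiguous".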


-- 
John Baldwin