Re: git: a1097094c4c5 - main - newvers: Set explicit git revision length

From: Gleb Smirnoff <glebius_at_freebsd.org>
Date: Wed, 18 Dec 2024 17:12:36 UTC
On Wed, Dec 18, 2024 at 10:22:24AM -0500, Ed Maste wrote:
E> That said, it doesn't matter what Git's algorithm chooses as the short
E> hash length; specifying --short bypasses that algorithm. `git
E> rev-parse --verify --short=12 HEAD` will give us a 12-character short
E> hash as long as that hash is unique. The reproducibility concern is
E> thus: what is the probability that the 12-character short hash is
E> unique at the time and in a repo from which an image is built, but is
E> not unique for the attempt to reproduce it, or vice-versa. This
E> probability is rather small.
E> 
E> If you look at arbitrary commits, 6 or 7 characters are usually
E> sufficient for a unique hash today. For instance, some of the latest -pX
E> from recent releng/ branches:
E> 
E> 13.3: 72aa3d
E> 13.4: 3f40d5
E> 14.0: f10e32
E> 14.1: 74b6c98
E> 14.2: c8918d6
E> 
E> The status quo of --short=12 should be fine for quite some time.
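The two points quoted above can be illustrated with a rough Python sketch. `abbreviate()` mimics the behavior described for `git rev-parse --short=N`: start at N characters and lengthen the prefix until it is unambiguous. It is not Git's implementation and ignores details such as how Git treats ambiguity across object types. `prefix_collision_probability()` assumes object IDs behave like uniformly random hex strings. Both function names are hypothetical, and any object counts used with them are illustrative, not measurements of a real repository.

```python
import math

HEX_ALPHABET_SIZE = 16  # each abbreviated character is one hex digit


def abbreviate(target: str, all_ids: list[str], min_len: int = 12) -> str:
    """Return target's prefix of at least min_len characters, lengthened
    until no other ID in all_ids shares it (the --short=N idea)."""
    others = [oid for oid in all_ids if oid != target]
    for length in range(min_len, len(target) + 1):
        prefix = target[:length]
        if not any(oid.startswith(prefix) for oid in others):
            return prefix
    # Every proper prefix is shared; fall back to the full ID.
    return target


def prefix_collision_probability(prefix_len: int, num_objects: int) -> float:
    """Chance that at least one of the other num_objects - 1 objects shares
    a given object's prefix, assuming uniformly random hex object IDs."""
    p_single = HEX_ALPHABET_SIZE ** -prefix_len
    # 1 - (1 - p)^(n - 1), computed with log1p/expm1 so the tiny
    # per-object probability is not lost to floating-point rounding.
    return -math.expm1((num_objects - 1) * math.log1p(-p_single))
```

With a hypothetical 5 million objects, the chance that a given commit's 12-character prefix is ambiguous comes out at about 1.8e-8. Under this model it grows roughly linearly with the number of objects in the repository.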

AFAIU John's concern is that you can't guarantee a reproducible build from a
"dirty" repository, i.e. one that has more branches than just the official
ones.  I just made a quick check on the Netflix repo, which has both the current
FreeBSD history and the pre-official-git history together, as well as split
ports subdirectories and, of course, our own stuff.  For short hashes there
are roughly 2x more ambiguities than in a "clean" repo.  Apparently the chance
of a collision on a long hash is also doubled.
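A check like this can be repeated on any clone by counting how many object IDs share a prefix of a given length with at least one other object. The Python sketch below is a hypothetical helper, not the method used for the numbers above. The script name `count_ambiguous.py` is made up for illustration.

```python
import sys
from collections import Counter


def count_ambiguous(object_ids: list[str], prefix_len: int) -> int:
    """Number of objects whose prefix_len-character prefix is shared with
    at least one other object (i.e. that would need a longer abbreviation)."""
    prefix_counts = Counter(oid[:prefix_len] for oid in object_ids)
    return sum(count for count in prefix_counts.values() if count > 1)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: count_ambiguous.py <file with one object ID per line> [prefix_len]
    with open(sys.argv[1]) as f:
        ids = [line.strip() for line in f if line.strip()]
    prefix_len = int(sys.argv[2]) if len(sys.argv) > 2 else 12
    print(f"{count_ambiguous(ids, prefix_len)} of {len(ids)} objects "
          f"are ambiguous at {prefix_len} characters")
```

The object list can be produced with `git cat-file --batch-all-objects --batch-check='%(objectname)' > ids.txt` and then passed as `python3 count_ambiguous.py ids.txt 7`. Running it on both a clean clone and a combined repo gives a side-by-side comparison.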

We can of course say that we don't provide reproducible builds from a "dirty"
repo.  But that would be a real limitation.  It would rule out a legitimate
scenario:

git subtree add FreeBSD && cd FreeBSD && make a reproducible build

-- 
Gleb Smirnoff