Re: git: a1097094c4c5 - main - newvers: Set explicit git revision length

From: Ed Maste <emaste_at_freebsd.org>
Date: Wed, 18 Dec 2024 15:22:24 UTC
On Mon, 16 Dec 2024 at 18:16, John Baldwin <jhb@freebsd.org> wrote:
>
> Well, the default --short length is not based on when Git detects a conflict,
> it's a function of the total number of objects in a repository.  This means
> it may be different if you just fetch some other remote with many revisions
> in the same clone for example.  The thing I don't know is what formula git
> uses and how close we are/aren't to rolling over to 13 just with src.git
> alone.  It seems to me that the most fool-proof thing though if we really
> want reproducibility is to drop --short entirely.  Short of that, we would
> need to tell git to run in a mode where it ignores user configuration (though
> I don't see a way to do that).

I believe the algorithm can be found in repo_find_unique_abbrev_r:

        unsigned long count = repo_approximate_object_count(r);
        /*
         * Add one because the MSB only tells us the highest bit set,
         * not including the value of all the _other_ bits (so "15"
         * is only one off of 2^4, but the MSB is the 3rd bit.
         */
        len = msb(count) + 1;
        /*
         * We now know we have on the order of 2^len objects, which
         * expects a collision at 2^(len/2). But we also care about hex
         * chars, not bits, and there are 4 bits per hex. So all
         * together we need to divide by 2 and round up.
         */
        len = DIV_ROUND_UP(len, 2);
        /*
         * For very small repos, we stick with our regular fallback.
         */
        if (len < FALLBACK_DEFAULT_ABBREV)
            len = FALLBACK_DEFAULT_ABBREV;

Regardless of the algorithm, 12 is in fact the minimum length that avoids
short-hash conflicts in our tree today: both 296adaa5766 and 13c64df775c
are ambiguous 11-character short hashes.

Certainly putting the full hash into uname would guarantee
reproducibility, although IMO it makes uname unwieldy.

That said, it doesn't matter what length Git's algorithm would choose;
specifying an explicit length with --short=12 bypasses that algorithm.
`git rev-parse --verify --short=12 HEAD` gives us a 12-character short
hash as long as that prefix is unique, and a longer one otherwise. The
reproducibility concern is therefore: what is the probability that the
12-character short hash is unique in the repository (and at the time)
from which an image is built, but not unique in the one used to
reproduce it, or vice versa? This probability is rather small.
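
A rough way to quantify "rather small", assuming object IDs behave like uniformly random hex strings: let N be the number of objects and L the prefix length in hex digits. If Git's automatic default for a clone holding only src.git is currently 12 (the premise of the question about rolling over to 13), the quoted formula implies N < 2^24. These are back-of-the-envelope estimates, not measurements of the tree.

```latex
% N: objects in the repository, L: prefix length in hex digits (4 bits each).
% Probability that another object shares a given commit's L-digit prefix
% (union bound on the right):
P_L(N) = 1 - \left(1 - 16^{-L}\right)^{N-1} \le \frac{N-1}{16^{L}}

% A default of 12 implies N < 2^{24}, so for L = 12:
P_{12}(N) < \frac{2^{24}}{2^{48}} = 2^{-24} \approx 6 \times 10^{-8}

% Expected number of colliding pairs anywhere in the tree at L = 11:
E_{11}(N) \approx \frac{N(N-1)}{2 \cdot 16^{11}}, \qquad
E_{11}(2^{22}) \approx 0.5, \qquad E_{11}(2^{24}) \approx 8
```

So a few 11-character collisions somewhere in the tree are expected, consistent with the pair above, while the chance that the specific commit being built is affected at 12 characters is on the order of one in sixteen million.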

If you look at arbitrary commits, 6 or 7 characters are usually
sufficient for a unique hash today. For instance, the latest -pX
commits on some recent releng/ branches:

13.3: 72aa3d
13.4: 3f40d5
14.0: f10e32
14.1: 74b6c98
14.2: c8918d6

The status quo of --short=12 should be fine for quite some time.