Re: git: a1097094c4c5 - main - newvers: Set explicit git revision length
Date: Wed, 18 Dec 2024 15:22:24 UTC
On Mon, 16 Dec 2024 at 18:16, John Baldwin <jhb@freebsd.org> wrote:
>
> Well, the default --short length is not based on when Git detects a conflict,
> it's a function of the total number of objects in a repository. This means
> it may be different if you just fetch some other remote with many revisions
> in the same clone for example. The thing I don't know is what formula git
> uses and how close we are/aren't to rolling over to 13 just with src.git
> alone. It seems to me that the most fool-proof thing though if we really
> want reproducibility is to drop --short entirely. Short of that, if
> you can tell git to run in a mode where it ignores user configuration (though
> I don't see a way to do that).

I believe the algorithm can be found in repo_find_unique_abbrev_r:

    unsigned long count = repo_approximate_object_count(r);
    /*
     * Add one because the MSB only tells us the highest bit set,
     * not including the value of all the _other_ bits (so "15"
     * is only one off of 2^4, but the MSB is the 3rd bit.
     */
    len = msb(count) + 1;
    /*
     * We now know we have on the order of 2^len objects, which
     * expects a collision at 2^(len/2). But we also care about hex
     * chars, not bits, and there are 4 bits per hex. So all
     * together we need to divide by 2 and round up.
     */
    len = DIV_ROUND_UP(len, 2);
    /*
     * For very small repos, we stick with our regular fallback.
     */
    if (len < FALLBACK_DEFAULT_ABBREV)
        len = FALLBACK_DEFAULT_ABBREV;

Regardless of the algorithm, 12 is in fact the minimum to avoid short
conflicts in our tree now. Both 296adaa5766 and 13c64df775c are conflicting
11-character short hashes.

Certainly putting the full hash into uname would guarantee reproducibility,
although IMO it makes uname unwieldy.

That said, it doesn't matter what Git's algorithm chooses as the short hash
length; passing an explicit length to --short bypasses that algorithm.
`git rev-parse --verify --short=12 HEAD` will give us a 12-character short
hash as long as that hash is unique.

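To make the arithmetic in the repo_find_unique_abbrev_r excerpt concrete, here is a minimal standalone sketch of the same length calculation. It is not Git's code: default_abbrev_len is a name I made up, msb() and DIV_ROUND_UP are my own stand-ins for Git's helpers, and the fallback value of 7 is an assumption about FALLBACK_DEFAULT_ABBREV.

```c
/* Assumed value of Git's minimum abbreviation length. */
#define FALLBACK_DEFAULT_ABBREV 7

/* Integer division rounding up; stand-in for Git's macro of the same name. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Zero-based index of the highest set bit (0 for inputs 0 and 1).
 * Stand-in for Git's msb() helper.
 */
static unsigned msb(unsigned long val)
{
    unsigned r = 0;

    while (val >>= 1)
        r++;
    return r;
}

/*
 * Hypothetical helper: the abbreviation length the quoted excerpt
 * would pick for a repository holding roughly object_count objects.
 */
int default_abbrev_len(unsigned long object_count)
{
    /* Number of bits needed to count the objects. */
    int len = (int)msb(object_count) + 1;

    /* Halve (birthday bound) and round up, as in the excerpt. */
    len = DIV_ROUND_UP(len, 2);

    /* Small repositories keep the fallback length. */
    if (len < FALLBACK_DEFAULT_ABBREV)
        len = FALLBACK_DEFAULT_ABBREV;
    return len;
}
```

Under these assumptions the result is 12 for object counts from 2^22 up to 2^24 - 1, and it becomes 13 once the approximate count reaches 2^24 (about 16.8 million) objects.
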
The reproducibility concern is thus: what is the probability that the
12-character short hash is unique at the time and in the repo from which an
image is built, but is not unique for the attempt to reproduce it, or
vice versa? This probability is rather small. If you look at arbitrary
commits, 6 or 7 characters are usually sufficient for a unique hash today.
For instance, the latest -pX on some recent releng/ branches:

    13.3: 72aa3d
    13.4: 3f40d5
    14.0: f10e32
    14.1: 74b6c98
    14.2: c8918d6

The status quo of --short=12 should be fine for quite some time.
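
As a rough way to put numbers on "rather small", the sketch below estimates collision odds for an N-character hex prefix. It assumes object IDs behave like uniformly random values, and all three function names are hypothetical helpers of mine, not anything from Git or the FreeBSD tree.

```c
/* Number of distinct prefixes of the given length: 16^hex_chars. */
double prefix_space(int hex_chars)
{
    double space = 1.0;

    for (int i = 0; i < hex_chars; i++)
        space *= 16.0;
    return space;
}

/*
 * Approximate chance that one particular prefix is shared by at
 * least one of n_objects other objects. Uses the small-probability
 * approximation n / 16^k, capped at 1.
 */
double collision_chance(double n_objects, int hex_chars)
{
    double p = n_objects / prefix_space(hex_chars);

    return p > 1.0 ? 1.0 : p;
}

/*
 * Expected number of object pairs sharing a prefix anywhere in the
 * repository: (n choose 2) / 16^k.
 */
double expected_colliding_pairs(double n_objects, int hex_chars)
{
    return n_objects * (n_objects - 1.0) / 2.0 / prefix_space(hex_chars);
}
```

For a hypothetical repository of 5 million objects (an illustrative figure, not a measurement of src.git), collision_chance(5e6, 12) comes out at roughly 2 in 100 million for any one commit, while expected_colliding_pairs(5e6, 11) is a little under one, which would be in line with a conflicting pair of 11-character hashes existing somewhere in the tree.
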