Is 10 hex digit git hash abbreviation enough?


How many possible hash values does one need to avoid clashes among N items? If you recall the birthday paradox, the answer is much larger than N: on the order of N^2.

Let's reverse the question: with 16^10 possible hash values, which corresponds to 10 hex digits of an abbreviated git revision hash, at how many revisions does the probability of a hash collision reach 50%? A direct calculation shows that with 1234603 revisions, the probability that two of them share the same 10-digit hash is 50%.
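The figure above can be checked with the standard birthday approximation, p ≈ 1 - exp(-n(n-1)/(2V)) for n items drawn from V equally likely values. The following is a minimal sketch, not the asker's own calculation; the function names are made up for illustration, and the approximation assumes hashes are uniformly distributed and n is much smaller than V.

```python
import math


def collision_probability(n_items, n_values):
    """Approximate probability that at least two of n_items share a value."""
    # Birthday approximation: 1 - exp(-n(n-1) / (2V)).
    # expm1 keeps precision when the exponent is close to zero.
    return -math.expm1(-n_items * (n_items - 1) / (2 * n_values))


def items_for_probability(p, n_values):
    """Approximate number of items at which the collision probability reaches p."""
    # Solve 1 - exp(-n^2 / (2V)) = p for n, treating n(n-1) as n^2.
    return math.sqrt(2 * n_values * math.log(1 / (1 - p)))


if __name__ == "__main__":
    values = 16 ** 10  # 10 hex digits
    print(items_for_probability(0.5, values))         # roughly 1.23 million
    print(collision_probability(1_234_603, values))   # close to 0.5
```

Since the threshold grows only with the square root of the number of possible values, each extra hex digit multiplies it by 4.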

Now, a million or so revisions is not unheard of in large active repositories. Has anybody here experienced a git hash clash in their work? Theoretically speaking, it ought to have happened.

There are 2 best solutions below

bk2204 On BEST ANSWER

Git automatically scales the length of abbreviated hashes as the number of objects increases such that this is usually not an issue. In addition, if an abbreviated hash would be ambiguous at the normal length, Git will automatically produce a longer, unambiguous value. Some commands let you control the length of abbreviations with an option named --abbrev if you want a specific value, and the core.abbrev option can override the default.

However, these names are necessarily only unique at the moment they're created, so if you're producing tools that need to work with revisions, they should always operate on the full object IDs. Note also that there is work underway to switch to using SHA-256, so you should not assume anything about the length of a particular full object ID when writing tools.
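As a rough illustration of a tool that operates on full object IDs, the sketch below expands whatever revision string it is given by calling git rev-parse --verify, and checks only that the result is hexadecimal rather than assuming 40 or 64 characters. The helper names and the injectable run parameter are assumptions made for this example, not part of Git.

```python
import subprocess

HEX_DIGITS = set("0123456789abcdef")


def rev_parse_command(rev):
    """Build the argv that asks Git for the full object ID of rev."""
    # --verify makes Git fail unless rev names exactly one object,
    # so an ambiguous abbreviation is reported as an error.
    return ["git", "rev-parse", "--verify", rev]


def full_object_id(rev, repo_dir=".", run=subprocess.run):
    """Expand an abbreviated hash (or any revision) to its full object ID.

    run is a hypothetical hook so the function can be exercised without
    a real repository; by default it is subprocess.run.
    """
    result = run(
        rev_parse_command(rev),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    oid = result.stdout.strip()
    # Deliberately no length check: SHA-1 IDs have 40 hex digits,
    # SHA-256 IDs have 64.
    if not oid or not set(oid) <= HEX_DIGITS:
        raise ValueError(f"unexpected rev-parse output: {oid!r}")
    return oid
```

A tool would call full_object_id once when it first sees an abbreviation and store only the returned value from then on.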

VonC On

As explained in "How much of a Git SHA is generally considered necessary to uniquely identify a change in a given codebase?", you can get the minimum required length with git rev-parse --short

 git rev-parse --short=4 HEAD

But if you want to be sure and work only with full-length hashes:

With Git 2.31 (Q1 2021), the configuration variable 'core.abbrev' can be set to 'no' to force no abbreviation regardless of the hash algorithm.

And that will be important when Git switches from SHA-1 to SHA-256.

See commit a9ecaa0 (01 Sep 2020) by Eric Wong (ele828).
(Merged by Junio C Hamano -- gitster -- in commit 6dbbae1, 15 Jan 2021)

core.abbrev=no: disables abbreviations

Signed-off-by: Eric Wong

This allows users to write hash-agnostic scripts and configs by disabling abbreviations.

Using "-c core.abbrev=40" will be insufficient with SHA-256, and "-c core.abbrev=64" won't work with SHA-1 repos today.

[jc: tweaked implementation, added doc and a test]

git config now includes in its man page:

If set to "no", no abbreviation is made and the object names are shown in their full length.