Hashing with sha1[:10] or MD5 for caching: is MD5 better?


I am trying to find a good hash function that is fast and produces short output.

There is a related discussion: "Hash function that produces short hashes?"

The recommendation there is to use:

>>> import hashlib
>>> hash = hashlib.sha1("my message".encode("UTF-8")).hexdigest()
>>> hash
'104ab42f1193c336aa2cf08a2c946d5c6fd0fcdb'
>>> hash[:10]
'104ab42f11'

There is a comparison table at https://www.tutorialspoint.com/difference-between-md5-and-sha1 that shows that MD5 is faster than SHA1.

My questions are:

  • For caching objects (not security purposes) it seems better to use MD5 than SHA1. Am I missing something?

  • Is there a better hash that is fast and short?

Answer by Maarten Bodewes (best answer)

For caching objects (not security purposes) it seems better to use MD5 than SHA1. Am I missing something?

First of all, beware that it is easy to create MD5 collisions, so people could use this as an attack vector. So you'd have to be sure that no security issues are created by this.

MD5 can certainly be faster than SHA-1, but do not forget that this depends on the implementation. I've seen pretty badly performing MD5 implementations, and current CPUs have SHA-1 acceleration built in (Intel SHA Extensions).
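Because the result depends on the implementation and the hardware, it is worth measuring on the target machine rather than relying on a published table. The following is a minimal sketch using only the standard library; the function name time_hashes, the payload size, and the repeat count are illustrative choices, not part of the question.

```python
import hashlib
import timeit


def time_hashes(payload: bytes, repeats: int = 2000) -> dict:
    """Return the seconds needed to hash `payload` `repeats` times, per algorithm."""
    results = {}
    for name in ("md5", "sha1"):
        hasher = getattr(hashlib, name)  # hashlib.md5 / hashlib.sha1
        # timeit runs the callable `repeats` times and returns total seconds.
        results[name] = timeit.timeit(
            lambda: hasher(payload).hexdigest(), number=repeats
        )
    return results


if __name__ == "__main__":
    # 64 KiB payload is an arbitrary example size; use data typical of your cache.
    for name, seconds in time_hashes(b"x" * 65536).items():
        print(f"{name}: {seconds:.4f} s")
```

Which of the two comes out ahead can differ between machines and Python builds, so treat the numbers as local measurements only.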

Is there a better hash that is fast and short?

Yes, there are non-secure hashes like xxHash (such as in the xxhash library) that should significantly outperform any cryptographic hash out there. That's not surprising as they do not need full collision resistance.
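As a rough illustration, a cache-key helper built on xxHash might look like the sketch below. It assumes the third-party xxhash package is installed (pip install xxhash); the helper name cache_key and the fallback to a truncated SHA-1 when the package is missing are assumptions made for this example.

```python
import hashlib

try:
    import xxhash  # third-party package: pip install xxhash
except ImportError:
    xxhash = None


def cache_key(data: bytes) -> str:
    """Return a 16-hex-character (64-bit) cache key. Not for security use."""
    if xxhash is not None:
        # xxh64 produces a 64-bit digest, i.e. 16 hex characters.
        return xxhash.xxh64(data).hexdigest()
    # Fallback when xxhash is unavailable: SHA-1 truncated to 64 bits.
    return hashlib.sha1(data).hexdigest()[:16]


if __name__ == "__main__":
    print(cache_key("my message".encode("UTF-8")))
```

A 64-bit key is short, but accidental collisions become likely once the number of cached objects approaches a few billion, so the key length should be chosen with the expected cache size in mind.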

Answer by olegarch

This is my own hash function, which I use for search purposes:

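#include <stdint.h>

/* 256 distinct random values; fill once (e.g. from /dev/urandom) before hashing.
   Named rand as in the original; it clashes with rand() if <stdlib.h> is included. */
uint32_t rand[0x100];

/* Nonlinear step: table lookup indexed by the low 8 bits of (c XOR h). */
#define NLF(h, c) (rand[(uint8_t)((c) ^ (h))])

uint32_t oleg_h(const char *key) {
  uint32_t h = 0x1F351F35;
  char c;
  while ((c = *key++) != '\0')
    h = ((h >> 11) | (h << (32 - 11))) + NLF(h, c); /* rotate right by 11, then add */
  h ^= h >> 16;
  return h ^ (h >> 8);
}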

The function computes a 32-bit hash of a text string. To mitigate a possible adversarial attack, it uses the array rand[], initialized with random integer values. This array must remain unchanged for the lifetime of the caching system. In my code, I initialize this array from /dev/urandom.
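For readers working in Python, as in the question, the same scheme can be sketched roughly as follows. This is an illustrative port, not the author's code: the helper make_table and the use of secrets.SystemRandom as the source of random values are assumptions, and 32-bit wraparound is emulated with explicit masks.

```python
import secrets

MASK32 = 0xFFFFFFFF


def make_table() -> list:
    """Return 256 distinct random 32-bit values drawn from the OS random source."""
    return secrets.SystemRandom().sample(range(2**32), 256)


def oleg_h(key: bytes, table: list) -> int:
    """Python port of the C function above; returns a 32-bit integer."""
    h = 0x1F351F35
    for c in key:
        if c == 0:  # the C version stops at the terminating NUL byte
            break
        rotated = ((h >> 11) | (h << 21)) & MASK32  # rotate right by 11 bits
        h = (rotated + table[(c ^ h) & 0xFF]) & MASK32
    h ^= h >> 16
    return h ^ (h >> 8)


if __name__ == "__main__":
    table = make_table()  # create once and keep for the cache's lifetime
    print(f"{oleg_h(b'my message', table):08x}")
```

Because the table is random, hash values differ between runs unless the same table is reused, which is why it has to stay fixed while the cache is alive.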

If you would like, feel free to take and use my hash search subsystem from the program emcssh. It uses a double-hashing approach.