Will converting hash to integer using modulus result in an even (uniform) distribution of integers?

62 Views Asked by At

Say I have this code to convert a hash string into an int:

const crypto = require('crypto');

function uuidToInteger(uuid) {
    // - Create a hash object using SHA-1 algorithm
    const hash = crypto.createHash('sha1');

    // - Update the hash object with the UUID
    hash.update(uuid, 'utf-8');

    // - Get the digest (hash value) as a hexadecimal string
    const hashValueHex = hash.digest('hex');

    // - Convert the hexadecimal string to an integer
    const hashValueInt = parseInt(hashValueHex, 16);

    const mappedValue = 1 + (hashValueInt % 20); 

    return mappedValue;
}

Example usage

const uuid = '550e8400-e29b-41d4-a716-446655440000';
const result = uuidToInteger(uuid);
console.log(result);`

will (in theory) this result of even/uniform distribution of integers 1-20? as n (as the number hashes converted gets big)? If yes why, if no, why?

Ultimately I am trying to use the hash to route requests via a load balancer, having 20 child services that the LB can route to using an int (in memory mapping as opposite to some more complex mapping that would require redis etc)

1

There are 1 best solutions below

0
Simon Goater On

Suppose you have a random number generator function RAND() which returns a whole number in the interval [0,RAND_MAX] but you want to use it to generate a random whole number in the interval [0,MAX]. Then one simple way to do this is with the modulus operator

return RAND() % (MAX+1);

Assuming the the probability of every result in [0, RAND_MAX] occurring from a call to RAND() is equal, then the probability of every result in [0,MAX] appearing is equal if MAX+1 divides RAND_MAX+1, and not equal otherwise.

Take the example MAX = 5, RAND_MAX = 16. Then are 3 occurrences of 00 to 04 and only 2 of 05. Therefore, 05 has only 2/3rds of the probability of occurring compared to the rest.

RAND()           = 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16
RAND() % (MAX+1) = 00 01 02 03 04 05 00 01 02 03 04 05 00 01 02 03 04  

More generally,

Let S = (RAND_MAX+1) % (MAX+1)
Let Q = floor((RAND_MAX+1) / (MAX+1))

then (RAND_MAX+1) = Q(MAX+1) + S.

Probability of [0, S-1] occurring = (Q+1)/(RAND_MAX+1)
Probability of [S, MAX] occurring = Q/(RAND_MAX+1)

Check it all adds up...

((MAX-S+1)Q + S(Q+1))/(RAND_MAX+1) = (Q(MAX+1) + S)/(RAND_MAX+1) = 1

This shows that if S > 0, the resulting probabilities are not all equal. However, if Q is large, then the difference in probabilities may be acceptably small.

Provided Q > 0, one way to produce random results in the range [0, MAX] all with equal probability from RAND() using the modulus operator would be to reject the highest S values returned from RAND(). For example, assuming S and Q are already known,

if (S == 0) return RAND() % (MAX+1);
ret = RAND_MAX;
while (ret >= Q(MAX+1)) ret = RAND();
return ret % (MAX+1);

Since less than half of the results from RAND() are rejected, the probability of it finding a result it can use is more than a half on each loop of the iteration, so there is no realistic possibility that it can loop indefinitely. In practice, it will not have to loop much if at all to find a usable result.

In the case of the OP's question, Q = floor(2^160 / 20) is huge, so the probability differences described above are only of cryptographic significance.