Can Rendezvous hashing add a node efficiently?

340 Views Asked by At

The wikipedia article for Rendezvous hashing (https://en.wikipedia.org/wiki/Rendezvous_hashing) doesn't explain what happens when you add a node to the hash table. The way I understand it, if you add a node to a hash table implemented via Rendezvous hashing, there may be objects in other nodes that should actually map to this new node since it's hash values for those objects are higher than the ones for the nodes those objects are currently in. In order to fix this problem, you would need to scan the entire hash table, recompute the hash values, and move objects if needed. This is extremely costly performance wise.

The only way I see rendezvous hashing making any sense is if the hashtable acts as a cache and is backed by a database. Then if a node doesn't have an object, it can be fetched from the database. Also, if a node has an object but the key for that object no longer maps to that node, the node's cache algorithm will evict it (through LRU/LFU).

Am I understanding this correctly? Is there a way to fix this problem?

2

There are 2 best solutions below

0
AndrewR On BEST ANSWER

Great question! The wikipedia article actually touches that topic "If an object already in the system at ... it will be fetched afresh and cached".

Basically, the proposed area for this algorithm is cases where you can cache a value, but it is ok to recache it later. While it does require extra processing, the implementation itself is dead simple. A real world example for this approach would be memcached - it uses this exact approach and does not care if you add/remove nodes - no rehashing is happening for existing keys.

Another interesting note is about relation of Rendezvous and Consistent Hashing - Consistent hashing aims to move only some keys, the ones which map into the new partition. In Rendezvous hashing case, the same number of keys will be moved; even better that every existing node on average will give away same percentage of keys - but this comes at a cost of all keys have to be reprocessed.

0
Diego Sogari On

In order to fix this problem, you would need to scan the entire hash table, recompute the hash values, and move objects if needed. This is extremely costly performance wise.

Not necessarily. You can accomplish this in a lazy way: by maintaining a traditional hash table in addition to the rendezvous, you know if a key has been inserted in the system.

In that case, if an object is determined to be on the new node but isn’t there, then it must be in one of the other nodes, and you can look for it in decreasing order of hash scores. Once located, the object can be moved to the new node.