How to combine sharding and consistent hashing within a distributed system?

386 Views Asked by lzx071021 At 08 March 2023 at 16:05

Sharding, or say partitioning, is a technique widely used in distributed systems which logically splits data into partitions. Each node is assigned a set of partitions and hence the read/write throughput could be increased with parallelization.

Consistent hashing is a technique widely used in load balancing and routing service. To find the node responsible for serving a given key, the traditional way is to compute the hash of the key and then modulo the hash value with the number of nodes N. However, if the number of nodes changes, the keys have to rehashed and to be transferred between nodes. To minimize the amount of data being transferred after rehashing, the consistent hashing is used to replace the modulo N operations so that only a fraction of data needs to be transferred.

In order to do sharding, we need two mappings:

key_to_shard: the key range is segmented into a sequence of sub-ranges and each sub-range is assigned to a shard. The assignment of key ranges to shards definitely does not involve a modulo N operation.

shard_to_node: for a given shard, it's assigned to a node. For static sharding, i.e. the number of shards never changes, key_to_shard is trivial. For dynamic sharding, there're shard splitting which splits a shard into two shards with adjacent key ranges, and shard coalescing which merges two shards with adjacent key ranges into a single shard. Neither the static sharding nor the dynamic sharding would incur a modulo N operation.

In many literatures or textbooks, there says that sharding is generally combined with consistent hashing.

So, if there's no modulo N operation, where shall the consistent hashing be used?

Is my understanding wrong?

Original Q&A

There are 1 best solutions below

AndrewR On 08 March 2023 at 19:35

In practice, usually there are many more shards than nodes. Technically N of shard will never change, as it would mean rehashing for every key.

For example, let's have 1000 shards and 10 nodes. With this approach we still can balance hot ranges, by assigning them to different nodes. In addition, we can add and remove nodes by reassigning shards.

How to combine sharding and consistent hashing within a distributed system?

There are 1 best solutions below

Related Questions in DATABASE

Related Questions in SHARDING

Related Questions in DISTRIBUTED-SYSTEM

Related Questions in DATA-PARTITIONING

Related Questions in CONSISTENT-HASHING

Trending Questions

Popular # Hahtags

Popular Questions