How to use consistent hashing across publishers, queues, and consumers


I don't have a lot of experience with distributed systems and want to check some assumptions about how to shard data to distribute workload.

I have two systems. One listens to messages in chat rooms on twitch.tv and then publishes those messages to a queue for a second system to consume and process. I believe I understand how to use consistent hashing when distributing work to the publishers. In the following diagram, instance 1 listens to the chat rooms for broadcasters A, B, and C; instance 2 listens to D, E, and F; etc. So far, so good.
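To make the publisher side concrete, here is a minimal sketch of a consistent-hash ring that assigns broadcasters to publisher instances. Everything in it is an assumption for illustration: `HashRing`, `stable_hash`, the `publisher-N` names, and the virtual-node count are hypothetical, and the actual assignment it produces will not match the A/B/C, D/E/F grouping described above.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Hash a string to an int that is stable across processes (unlike built-in hash())."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self._vnodes = vnodes
        self._points = []  # sorted list of (hash, node) pairs
        self._hashes = []  # the hashes alone, kept in the same order for bisect
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node is placed on the ring at `vnodes` pseudo-random positions.
        for i in range(self._vnodes):
            self._points.append((stable_hash(f"{node}#{i}"), node))
        self._points.sort()
        self._hashes = [h for h, _ in self._points]

    def remove(self, node):
        self._points = [p for p in self._points if p[1] != node]
        self._hashes = [h for h, _ in self._points]

    def owner(self, key):
        """Return the node at the first ring point clockwise from the key's hash."""
        if not self._points:
            raise ValueError("ring is empty")
        idx = bisect.bisect_right(self._hashes, stable_hash(key)) % len(self._hashes)
        return self._points[idx][1]


broadcasters = list("ABCDEFGHI")
ring = HashRing(["publisher-1", "publisher-2", "publisher-3"])
before = {b: ring.owner(b) for b in broadcasters}

# Adding an instance should only move broadcasters onto the new instance.
ring.add("publisher-4")
after = {b: ring.owner(b) for b in broadcasters}
moved = [b for b in broadcasters if before[b] != after[b]]
print("moved to the new instance:", moved)
```

The property that matters here is that adding `publisher-4` only ever moves broadcasters onto the new instance; the existing instances never trade chat rooms with each other.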

Because the message queues have their own hash ring, messages are split across the queues in a way that is unrelated to how broadcasters were split across the publisher services. Some publishers might only need to publish to one or two queues, while others might publish to all of them.

Lastly, the consumers are hashed independently as well, so the assignment of broadcasters to consumers is unrelated to both the publisher assignment and the queue assignment. Some consumers only listen to a few queues, and some listen to all of them.
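The fan-out described in the last two paragraphs can be sketched as follows. This is a rough illustration under stated assumptions, not a description of any real broker: the three rings, the node names, and the helpers `build_ring` and `owner` are all hypothetical, and a consumer is assumed to read every queue that carries at least one of its broadcasters.

```python
import bisect
import hashlib
from collections import defaultdict


def stable_hash(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def build_ring(nodes, vnodes=50):
    """Return a sorted list of (hash, node) points; a compact stand-in for a hash ring."""
    return sorted((stable_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))


def owner(ring, key):
    """First ring point clockwise from the key's hash (hash list rebuilt per call for brevity)."""
    hashes = [h for h, _ in ring]
    return ring[bisect.bisect_right(hashes, stable_hash(key)) % len(ring)][1]


broadcasters = list("ABCDEFGHI")
publisher_ring = build_ring(["publisher-1", "publisher-2", "publisher-3"])
queue_ring = build_ring(["queue-1", "queue-2", "queue-3"])
consumer_ring = build_ring(["consumer-1", "consumer-2", "consumer-3"])

queues_per_publisher = defaultdict(set)  # publisher -> queues it must write to
queues_per_consumer = defaultdict(set)   # consumer -> queues it must read from

for b in broadcasters:
    q = owner(queue_ring, b)
    queues_per_publisher[owner(publisher_ring, b)].add(q)
    queues_per_consumer[owner(consumer_ring, b)].add(q)

# A consumer has to ignore every broadcaster that shares one of its queues
# but is owned by a different consumer.
ignored = defaultdict(set)
for consumer, queues in queues_per_consumer.items():
    for b in broadcasters:
        if owner(queue_ring, b) in queues and owner(consumer_ring, b) != consumer:
            ignored[consumer].add(b)

for consumer in sorted(queues_per_consumer):
    print(consumer, "reads", sorted(queues_per_consumer[consumer]),
          "and ignores", sorted(ignored[consumer]))
```

Because the three rings are independent, nothing ties a broadcaster's queue to its consumer, which is where the ignored messages come from.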

[Diagram 1: publishers, queues, and consumers each assigned by a separate hash ring]

The above is my understanding of how most similar systems work. Why don't we keep things a bit more segmented, as I depict below?

[Diagram 2: publishers, queues, and consumers segmented so that no consumer has to ignore messages]

This would reduce the number of messages that consumers need to ignore. For example, in the first diagram, consumer #1 ignores all G messages from queue 1, and consumer #2 ignores all A and D messages on queue 1, but in the second diagram, neither consumer needs to ignore any messages. This proposal rests on a couple of assumptions that I would appreciate comments on.

  1. It's relatively expensive for a consumer to ignore a message. How accurate is this assumption? If a consumer listens to a queue where it ignores 80% of the messages, is that less of a problem than I think it is?
  2. The bookkeeping is feasible and not a total nightmare.
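One possible reading of the second diagram, sketched under assumptions: broadcasters are still hashed to queues, but each consumer owns whole queues rather than individual broadcasters. The helpers, node names, and the `queue_to_consumer` table are hypothetical; the point is only to show what the extra bookkeeping might amount to.

```python
import bisect
import hashlib
from collections import defaultdict


def stable_hash(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def build_ring(nodes, vnodes=50):
    return sorted((stable_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))


def owner(ring, key):
    hashes = [h for h, _ in ring]
    return ring[bisect.bisect_right(hashes, stable_hash(key)) % len(ring)][1]


broadcasters = list("ABCDEFGHI")
queues = ["queue-1", "queue-2", "queue-3"]
queue_ring = build_ring(queues)
consumer_ring = build_ring(["consumer-1", "consumer-2"])

# The bookkeeping: one small table, recomputed whenever queues or consumers change.
# Each queue is read by exactly one consumer.
queue_to_consumer = {q: owner(consumer_ring, q) for q in queues}


def route(broadcaster):
    """Return the (queue, consumer) pair that handles this broadcaster's messages."""
    q = owner(queue_ring, broadcaster)
    return q, queue_to_consumer[q]


seen = defaultdict(list)  # consumer -> (queue, broadcaster) pairs it receives
for b in broadcasters:
    q, c = route(b)
    seen[c].append((q, b))

for c in sorted(seen):
    print(c, "handles", seen[c])
```

In this arrangement every message on a queue belongs to the single consumer that reads it, so nothing is ignored. The trade-off is that load is balanced at queue granularity, so with few queues a consumer could end up with an uneven share or no queues at all.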

If I'm wrong that the first diagram is a reasonably accurate depiction of common designs in the industry, please let me know.

As I type this out, I'm wondering whether the kind of optimization in diagram #2 is undesirable in a distributed system, since distributed systems are already complex enough without a bunch of added bookkeeping. Is the thinking that a less optimized but simpler approach is worth it, since throwing more machines at the problem is relatively cheap as long as the system is reasonably performant?
