This question is about microservices and how to deal with syncing data (particularly large amounts of data) as the database schema of your microservice changes. It is very similar to this SO question but focuses more on the data-syncing aspect.
Say, for example, you have a User-API and a Chat-API connected via a message broker. Your Chat-API has to be aware of some of the user-related data (say, username and profile image) but by no means all of it. So your Chat-API (since it has its own schema) listens for user-creation and user-deletion messages from the User-API and updates its own user table with a subset of the data.
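To make the setup concrete, here is a minimal sketch of the Chat-API side of that sync; the event shape, field names, and `ChatUserRepository` are hypothetical illustrations, not tied to any particular broker:

```java
// Hypothetical payload the User-API publishes on user creation.
record UserCreatedEvent(String userId, String username, String profileImageUrl) {}

// Hypothetical local store backed by the Chat-API's own user table.
interface ChatUserRepository {
    void save(String userId, String username, String profileImageUrl);
    void delete(String userId);
}

// The Chat-API's handler persists only the subset of user data
// that the chat domain actually needs.
class UserEventHandler {
    private final ChatUserRepository repository;

    UserEventHandler(ChatUserRepository repository) {
        this.repository = repository;
    }

    void onUserCreated(UserCreatedEvent event) {
        repository.save(event.userId(), event.username(), event.profileImageUrl());
    }

    void onUserDeleted(String userId) {
        repository.delete(userId);
    }
}
```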
Now a new feature arrives that suddenly requires the Chat-API to also know whether a user has purchased a subscription of some sort. The User-API already provides the field, but the Chat-API previously ignored it. Now imagine you have 100,000+ users, for none of whom the Chat-API database has that information.
The obvious solution is to somehow re-send all those messages from the User-API to the Chat-API so it can now also grab the value of each user's subscription and store it. But depending on the size of the database, that can be a ludicrous undertaking.
Is that the only option here? Or what am I missing?
You can solve this problem by switching from a message queue to a message log.
RabbitMQ is an example of a message queue. A publisher emits a message, the message is copied into the queue of every subscriber, and once a subscriber consumes (and acknowledges) the message, it is removed from that subscriber's queue.
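For illustration, here is a minimal RabbitMQ consumer in Java (the queue name `user-events` and the broker host are assumptions); the point is that acknowledging a delivery removes it from the queue for good:

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;
import java.nio.charset.StandardCharsets;

public class QueueConsumer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumed broker address

        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();
        channel.queueDeclare("user-events", true, false, false, null);

        DeliverCallback deliverCallback = (consumerTag, delivery) -> {
            String body = new String(delivery.getBody(), StandardCharsets.UTF_8);
            System.out.println("Received: " + body);
            // Acknowledging removes the message from this queue permanently;
            // there is no built-in way to replay it later.
            channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
        };
        // autoAck=false: messages are removed only once explicitly acknowledged.
        channel.basicConsume("user-events", false, deliverCallback, consumerTag -> {});
    }
}
```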
Kafka is an example of a message log. A publisher emits a message, and the message is appended to a topic. All subscribers read from this topic, and Kafka keeps track of each subscriber's offset in it. When a subscriber consumes a message, its offset advances; the message itself stays in the topic. You can configure Kafka to never delete messages (for example, by setting the topic's retention.ms to -1).
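As a sketch (the topic name, group id, and broker address are assumptions), here is what a Kafka consumer reading such a topic looks like; note that consuming only advances the group's offset:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class LogConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "chat-api");                // assumed consumer group
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("auto.offset.reset", "earliest"); // new groups start at the beginning

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("user-events")); // assumed topic name
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Reading a record advances this group's offset;
                    // the record itself stays in the topic.
                    System.out.printf("key=%s value=%s offset=%d%n",
                            record.key(), record.value(), record.offset());
                }
            }
        }
    }
}
```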
Your problem is that you use a message queue, so when you want a subscriber to consume all the messages again, you have to emit them all again. If you used a message log, you could simply reset the subscriber's offset, and the subscriber would consume all the messages again.
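Here is a sketch of that replay, under the same assumed names as above: seeking back to the earliest offset makes the consumer re-read everything still retained in the topic, this time also picking up the subscription field.

```java
import java.time.Duration;
import java.util.Collection;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ReplayFromBeginning {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "chat-api");                // assumed consumer group
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("user-events"), new ConsumerRebalanceListener() {
                @Override
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) {}

                @Override
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    // Rewind to the earliest retained offset as soon as
                    // partitions are assigned, so everything is replayed.
                    consumer.seekToBeginning(partitions);
                }
            });
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Re-process every retained user event here.
                    System.out.printf("replaying offset=%d value=%s%n",
                            record.offset(), record.value());
                }
            }
        }
    }
}
```

Operationally, the same reset can also be done from outside the application with the kafka-consumer-groups.sh tool using --reset-offsets --to-earliest.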
In order to limit the number of messages in the topic, I strongly suggest that you configure Kafka to use log compaction, which means Kafka will keep only the latest message for each key. This will also speed things up when a subscriber starts over from the beginning, because it will have fewer messages to consume.
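A sketch of enabling compaction at topic-creation time with Kafka's AdminClient (the topic name, partition count, and replication factor are assumptions); with cleanup.policy=compact, Kafka retains only the latest record per key:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("user-events", 3, (short) 1) // assumed sizing
                    .configs(Map.of("cleanup.policy", "compact"));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```

Note that this assumes each user event is published with the user's id as the message key; compaction collapses older records that share the same key.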
Do not be afraid of doing this on a topic with 100K+ messages. I have done it on a regular basis on topics with millions of messages.