Consistently replicate MySQL state using Debezium


I have multiple DB tables and have set up a Debezium connector on top of them. I want to create a denormalized view of the original tables while keeping the entities consistent (for example, pushing the data to Elasticsearch for search).

What do I mean by consistency?

A few rows of one table, or of multiple tables, form a logical entity. I want to make sure these entities' validation assumptions are never broken. It is OK if I have to duplicate the entity logic in multiple services. By consistency I don't mean that the order of all writes has to be preserved when inserting into another table (linearizability or causality across tables). For example:

  1. There might be DB constraints spanning multiple tables in the original database, such as foreign keys and unique keys.
  2. Logical constraints, e.g. if the product count is > 1, there will be at least 1 row in the product table.

Why?

If the denormalised view is not consistent, it becomes very difficult to code against it or to make any assumptions about the data.

If we write to MySQL in a single transaction, whoever reads it is guaranteed a consistent view. But from what I have read, Debezium writes each table's changes to a different topic. The processing speed of those topics can differ, so if we process updates from the topics independently we can break the assumptions about the entity.

I have listed a few approaches I could take:

  1. On each event, query the original data source

    In this approach we read from the Kafka topic only to get the key that changed, and then query MySQL to get a consistent view of the whole entity (see the first sketch after this list).

    Pros: consistent view of the data store

    Cons: the load on the MySQL DB increases

  2. Use Debezium transaction metadata

    Debezium can write transaction boundary events to a separate topic. We can use this topic to do a stream join with the other Kafka topics. But this limits throughput, since we cannot process further events from a topic until we have received all the events of a transaction (see the second sketch after this list).

    Pros: no additional requests to the source DB

    Cons: reduced throughput, and a complicated Kafka Streams join because we don't know which entities are affected by a given transaction.

  3. Write the aggregate to a separate table (outbox pattern)

    We can write the aggregated entity to a separate table in the same transaction and listen only to that table's changes (see the third sketch after this list).

    Pros: no complicated joins, no additional requests to the source DB

    Cons: more work on the write path.
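
Here is a minimal sketch of approach 1, assuming a hypothetical orders/order_items schema and leaving out the actual Debezium key parsing and Elasticsearch indexing (topic, table and column names are my own, not something Debezium prescribes):

```java
import java.sql.*;
import java.time.Duration;
import java.util.*;
import org.apache.kafka.clients.consumer.*;

public class DirtyKeyRebuilder {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-view-builder");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Connection db = DriverManager.getConnection(
                 "jdbc:mysql://localhost:3306/shop", "reader", "secret")) {

            // The CDC topics are used only as a change notification ("dirty key").
            consumer.subscribe(List.of("dbserver1.shop.orders", "dbserver1.shop.order_items"));

            while (true) {
                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(500))) {
                    String orderId = extractOrderId(rec);  // parse the Debezium key, details omitted
                    // Re-read the whole aggregate in one statement so the view is
                    // at least as fresh as the event that triggered the refresh.
                    try (PreparedStatement ps = db.prepareStatement(
                            "SELECT o.*, i.* FROM orders o " +
                            "LEFT JOIN order_items i ON i.order_id = o.id WHERE o.id = ?")) {
                        ps.setString(1, orderId);
                        try (ResultSet rs = ps.executeQuery()) {
                            indexIntoElasticsearch(orderId, rs);  // hypothetical sink call
                        }
                    }
                }
            }
        }
    }

    static String extractOrderId(ConsumerRecord<String, String> rec) { return rec.key(); }
    static void indexIntoElasticsearch(String id, ResultSet rs) { /* build and push the document */ }
}
```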
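
For approach 2, my understanding is that with transaction metadata enabled Debezium emits BEGIN/END markers (including event counts) on a transaction topic and stamps each change event with the transaction id, so a consumer can buffer events per transaction and apply them only once the transaction is complete. A rough sketch of that buffering, with the exact field names to be verified against the Debezium docs:

```java
import java.util.*;

// Buffers change events by Debezium transaction id and releases them only once
// the END marker has been seen and the buffered count matches the expected count.
class TransactionBuffer {
    record ChangeEvent(String table, String key, String payload) {}

    private final Map<String, List<ChangeEvent>> open = new HashMap<>();   // txId -> buffered events
    private final Map<String, Long> expected = new HashMap<>();            // txId -> expected event count

    // Called for every record read from the table topics.
    void onChangeEvent(String txId, ChangeEvent event) {
        open.computeIfAbsent(txId, k -> new ArrayList<>()).add(event);
        maybeFlush(txId);
    }

    // Called for every record read from the transaction metadata topic.
    void onTransactionMarker(String status, String txId, long eventCount) {
        if ("END".equals(status)) {
            expected.put(txId, eventCount);
            maybeFlush(txId);
        }
    }

    private void maybeFlush(String txId) {
        Long count = expected.get(txId);
        List<ChangeEvent> events = open.getOrDefault(txId, List.of());
        if (count != null && events.size() == count) {
            applyAtomically(events);   // rebuild the affected denormalised documents in one go
            open.remove(txId);
            expected.remove(txId);
        }
    }

    private void applyAtomically(List<ChangeEvent> events) { /* write to Elasticsearch */ }
}
```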
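
And a sketch of the write path for approach 3 (outbox): the denormalised aggregate is serialised by the writer and inserted into an outbox table in the same MySQL transaction as the normalised writes, so it is consistent by construction; Debezium then captures only the outbox table (optionally routed with its outbox event router SMT). Table and column names are again my own assumptions:

```java
import java.sql.*;

public class OrderWriter {
    public void updateOrder(Connection db, String orderId, String newStatus,
                            String aggregateJson) throws SQLException {
        db.setAutoCommit(false);
        try {
            // Normal write to the normalised table(s).
            try (PreparedStatement upd = db.prepareStatement(
                    "UPDATE orders SET status = ? WHERE id = ?")) {
                upd.setString(1, newStatus);
                upd.setString(2, orderId);
                upd.executeUpdate();
            }
            // Same transaction: store the full aggregate for Debezium to pick up.
            try (PreparedStatement out = db.prepareStatement(
                    "INSERT INTO order_outbox (aggregate_id, aggregate_type, payload) " +
                    "VALUES (?, 'order', ?)")) {
                out.setString(1, orderId);
                out.setString(2, aggregateJson);
                out.executeUpdate();
            }
            db.commit();
        } catch (SQLException e) {
            db.rollback();
            throw e;
        }
    }
}
```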

Are there other approaches I am missing? Can a few of these approaches be combined into an overall better one (for example, making sure all related records go to the same Kafka partition so that we don't have to perform complicated joins)? If Debezium reads the binlog similarly to what a MySQL replica does, why can it not provide a linearizable view of what has changed? How does a MySQL replica take care of foreign key constraints while replicating?

I am new to Debezium and don't know all the features. Hence this question.
