We have a multi-tenant app. Multiple Organizations, shared database.
I am building an API where customers will send us their Organization data. At regular intervals, they will send either a full load (their entire data) or only what has changed on their side. The objective is to keep their Organization's structure in sync with what we have on our side.
We don't want to process this data as soon as it arrives. We want to store it first; a processor app will then normalize it, remove duplicates, insert into the database what is valid, and discard and log what is invalid. Thus, we need to:
- As soon as we start receiving a new bulk of data, create some kind of storage for it. For example, it could be a new PostgreSQL table.
- Receive the data and insert every record into this new table.
- Wait for a signal that they have finished sending all the data; only then can our processor start processing it.
- We don't want to keep this data afterwards, so at that point we could drop the table.
What was initially proposed was to dynamically create a new table in a separate database for each bulk of data. For example, if this Organization syncs data daily at 12am, we would have a table like:
organization_sync_<organization_identifier>_2024-02-14-00-00
We would process this table and drop it.
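
For illustration, this is roughly what the per-bulk staging table idea could look like with psycopg2. The single JSONB `payload` column, the helper names, and the `handle_row` callback are assumptions made for the sketch, not something that exists yet:

```python
# Sketch only: per-bulk staging table in PostgreSQL, assuming psycopg2 and a
# raw JSONB payload column. `conn` is an open psycopg2 connection.
import json
from datetime import datetime, timezone

from psycopg2 import sql

def staging_table_name(org_id: str) -> str:
    # Mirrors the naming scheme above; sql.Identifier quotes it safely.
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d-%H-%M")
    return f"organization_sync_{org_id}_{stamp}"

def ingest_bulk(conn, org_id: str, rows: list[dict]) -> str:
    """Create a fresh staging table for this bulk and load the raw payloads."""
    table = staging_table_name(org_id)
    with conn.cursor() as cur:
        cur.execute(sql.SQL(
            "CREATE TABLE {} (id bigserial PRIMARY KEY, payload jsonb NOT NULL)"
        ).format(sql.Identifier(table)))
        for row in rows:
            cur.execute(
                sql.SQL("INSERT INTO {} (payload) VALUES (%s)").format(sql.Identifier(table)),
                (json.dumps(row),),
            )
    conn.commit()
    return table

def process_and_drop(conn, table: str, handle_row) -> None:
    """Feed every staged payload to the processor, then drop the whole bulk at once."""
    with conn.cursor() as cur:
        cur.execute(sql.SQL("SELECT payload FROM {} ORDER BY id").format(sql.Identifier(table)))
        for (payload,) in cur:
            handle_row(payload)  # normalization / dedup / validation lives here
        cur.execute(sql.SQL("DROP TABLE {}").format(sql.Identifier(table)))
    conn.commit()
```

Dropping the table removes the whole bulk in one cheap operation, but it means DDL on every sync, which is part of what I'm unsure about.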
Another option would be to store this same data in Redis. Same principle: process then drop, with the added benefit of a time to live (TTL).
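
The Redis variant could look roughly like this with redis-py, keeping each bulk in its own list key so an abandoned bulk simply expires and a processed one is removed with a single delete. The key scheme and the one-day TTL are assumptions for the sketch:

```python
# Sketch only: one Redis list per bulk, assuming redis-py. Key naming and
# the one-day TTL are illustrative choices, not existing conventions.
import json

import redis

ONE_DAY = 24 * 60 * 60

def bulk_key(org_id: str, started_at: str) -> str:
    return f"organization_sync:{org_id}:{started_at}"

def ingest_rows(r: redis.Redis, key: str, rows: list[dict]) -> None:
    """Append raw payloads to the bulk's list and refresh its TTL."""
    pipe = r.pipeline()
    for row in rows:
        pipe.rpush(key, json.dumps(row))
    pipe.expire(key, ONE_DAY)  # abandoned bulks clean themselves up
    pipe.execute()

def process_and_drop(r: redis.Redis, key: str, handle_row) -> None:
    """Stream staged payloads to the processor, then delete the whole bulk."""
    for raw in r.lrange(key, 0, -1):
        handle_row(json.loads(raw))
    r.delete(key)
```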
We already have Kafka in our stack, but I couldn't find a way to process one Organization's entire bulk before moving on to others. A quick search on dynamically creating topics suggested it is not a good idea.
Is there a better architecture for this problem? We need a way to store this data temporarily and then remove the whole bulk in a simple, efficient way.