Event sourcing and domain events with multiple ids


Event sourcing often implies one row per event, each tied to a single aggregate id:

event_id | event_type   | entity_type | entity_id | event_data
---------|--------------|-------------|-----------|-----------
102      | OrderCreated | Order       | 101       | {...}
103      | OrderUpdated | Order       | 101       | {...}

But what if you have a batch action that results in a large number of events, such as:

  • "Mark 10 000 e-mails as read" => 10 000 EmailReadEvent
  • "Update the status of 10 000 devices" => 10 000 DeviceStatusUpdatedEvent

For such scenarios (with replicas in other microservices):

  • Should we store those 10 000 events in the event store?
  • Should we publish 10 000 events to the broker and have each subscriber consume them one by one?

This seems wasteful and very inefficient. Unfortunately, I cannot find any resources on the web about this specific topic. My idea for now is to design my domain events so that each one can carry a list of ids in the case of a batch action: EmailReadEvent would have a list of email ids, and DeviceStatusUpdatedEvent would have a list of device ids. I would do the same in the event store, storing a list of ids in the entity_id column.
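To make the idea concrete, here is a minimal sketch (all names are hypothetical, not from any framework) of a batch domain event carrying a list of ids, serialized into the entity_id column described above:

```python
import json
from dataclasses import dataclass
from typing import List

# Hypothetical batch event: one event covers many emails instead of one per email.
@dataclass
class EmailsReadEvent:
    email_ids: List[int]

def to_store_row(event_id: int, event: EmailsReadEvent) -> dict:
    # The entity_id column holds the whole id list serialized as JSON,
    # instead of a single id per row.
    return {
        "event_id": event_id,
        "event_type": type(event).__name__,
        "entity_type": "Email",
        "entity_id": json.dumps(event.email_ids),
        "event_data": json.dumps({"count": len(event.email_ids)}),
    }

row = to_store_row(104, EmailsReadEvent(email_ids=list(range(10_000))))
```

One row and one published message then stand in for 10 000 per-entity events, at the cost of a non-scalar entity_id column.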

What do you think about this approach ? Is there any better way to do this ?


There are 3 best solutions below

Levi Ramsey On

The limitation of one aggregate ID per event doesn't really come from event sourcing: it comes from the fact that these are events for an aggregate, which implies that every event concerning a given email/device/etc. was emitted with the knowledge of all previous events concerning that email/device/etc.

You can event source without having aggregates (just like you can have aggregates without event sourcing).

It's also possible to have events that end up associated with multiple devices/emails and have the consistency benefits of an aggregate: all of those emails/devices can "just be" entities in the same aggregate. In a lot of contexts, this is actually viable (e.g. an email inbox or all the emails received in a day in a given inbox could well be a cromulent aggregate; devices that needed a status update at the same time might well belong to the same customer or be deployed to the same site).
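As a rough illustration of that last point (the names and structure here are assumptions, not a prescribed design), an Inbox aggregate could own its emails as entities, so marking everything read is one command producing one event:

```python
from dataclasses import dataclass, field

@dataclass
class InboxAllRead:
    inbox_id: str
    email_ids: list  # all entities affected, recorded in a single event

@dataclass
class Inbox:
    inbox_id: str
    unread: set = field(default_factory=set)

    def mark_all_read(self) -> InboxAllRead:
        # One command on the aggregate yields one event in its stream,
        # with the consistency guarantees of the aggregate boundary.
        event = InboxAllRead(self.inbox_id, sorted(self.unread))
        self.apply(event)
        return event

    def apply(self, event) -> None:
        if isinstance(event, InboxAllRead):
            self.unread.difference_update(event.email_ids)

inbox = Inbox("inbox-1", unread={3, 1, 2})
event = inbox.mark_all_read()
```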

As a couple of side notes...

replicas in other microservices

This is, to my mind, a potentially strong sign that something has gone wrong, depending on what you mean by "replica" and "other microservice". I would be wary of the possible implication that you're combining ideas that want to be in different bounded contexts, which would mean that you could be making a lot of things more complex than they need to be. Alternatively, this could be a consequence of technical constraints forcing more microservices than would be needed to manage complexity.

consume one by one each event on the subscriber side

There's not necessarily a reason any particular subscriber has to consume events one-by-one (some technical choices could force this on you: you can likely revisit those choices).

Ramin On

You are not required to store every update in the database; it depends heavily on your situation and the problem you are facing. If it is really necessary, then do it, but if you think it is just wasting your resources, it is better to skip storing some of it, or to log the data somewhere more affordable. It is a trade-off in your solution, and there is no fixed answer.

R.Abbasi On

You have to think about the meaning of events in your system. What do they mean? Why do you use event sourcing? Why do you need to use batch operations? Usually, using bulk operations is a sign of bad design.

Let's assume bulk operations are required. If you use event sourcing and CQRS, you shouldn't worry about your write model. For bulk-creating events, you can group the bulk operation by an aggregate (all the changing entities belong to one aggregate root, for example all order items of an order) or by a saga (using a correlation ID to group the AR instances, or another aggregate that controls the flow, for example a DevicesChangeRequestAggregate expressing the bulk flow) to ensure that all the events are persisted transactionally.
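A minimal sketch of the correlation-ID variant (the event shape is an assumption, not taken from any particular framework): every event produced by one bulk command shares a correlation id, which the infrastructure can use to group and persist them as a unit:

```python
import uuid

def bulk_status_events(device_ids, new_status):
    # One correlation id ties all events of the bulk operation together,
    # so they can be persisted (and later traced) as a single unit of work.
    correlation_id = str(uuid.uuid4())
    return [
        {
            "event_type": "DeviceStatusUpdated",
            "entity_id": device_id,
            "correlation_id": correlation_id,
            "data": {"status": new_status},
        }
        for device_id in device_ids
    ]

events = bulk_status_events([1, 2, 3], "OFFLINE")
```

The per-entity events remain single-id (so existing consumers keep working), while the correlation id carries the batch semantics.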

The concern would be the read model. The read model should process each event and update its projection. If you design your events as bulk events, the added complexity of the event hierarchy (such as versioning both bulk and single-operation events) is not worth the performance gain. You can let the read model be inconsistent for a while, since CQRS already implies eventually consistent models. There is no harm in that.
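On the read side, even a bulk event can be folded into the projection with one statement rather than 10 000 single updates; the table and event names below are assumptions for the sketch:

```python
import sqlite3

# In-memory projection table standing in for the read model.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE email_read_model (email_id INTEGER PRIMARY KEY, read INTEGER)"
)
conn.executemany(
    "INSERT INTO email_read_model VALUES (?, 0)",
    [(i,) for i in range(5)],
)

def handle(event: dict) -> None:
    # A single UPDATE applies the whole batch to the projection.
    if event["event_type"] == "EmailsRead":
        placeholders = ",".join("?" * len(event["email_ids"]))
        conn.execute(
            f"UPDATE email_read_model SET read = 1 "
            f"WHERE email_id IN ({placeholders})",
            event["email_ids"],
        )

handle({"event_type": "EmailsRead", "email_ids": [0, 1, 2]})
```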

Note: Simplicity is much better than performance. Do not optimize prematurely. Don't use a complex design unless your system suffers from poor performance and there are actual complaints about it. Even if the UI forces bulk operations, process them sequentially until performance becomes a problem.