I've recently read up on common Big Data architectures (Lambda and Kappa) and I'm trying to put them into practice in the context of an IoT application.
Right now, events are produced, ingested into a database, queried, and exposed through a REST API (backend) to a React frontend. However, this architecture is not event-driven: the frontend isn't notified or updated when new events arrive. Instead, I poll with frequent HTTP requests to "simulate" a real-time application.
Now at first glance, the Kappa Architecture seems like the perfect fit for my needs, but I'm having trouble finding a technology that lets me write dynamic aggregation queries and serve them to a frontend.
As I understand it, frameworks like Apache Flink (or Spark Structured Streaming) are a great way to write such queries and apply them to the data stream, but the queries are static: once a job has been submitted, it can't be changed.
I'd like to find a way to filter, group, and aggregate events from a stream and deliver the results to a frontend using WebSockets or SSE. For now, the aggregates don't need to be persisted, as they are strictly for visualization (this will probably change in the future).
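To make "dynamic query" concrete, here is a minimal sketch of the behaviour I'm after, written in plain Python without any streaming framework. The `Query` class, the `run_query` function, and the event fields (`ts`, `device`, `temp`) are hypothetical names made up for illustration, and the sketch assumes events arrive ordered by timestamp.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Query:
    """Hypothetical query definition that a client could supply at runtime."""
    where: Callable[[dict], bool]        # filter predicate
    group_by: str                        # event field to group on
    value: str                           # event field to aggregate
    window_seconds: int                  # tumbling window size
    aggregate: Callable[[list], float]   # reduces one group's values


def _flush(window_start: int, buckets: dict, query: Query) -> Iterator[dict]:
    # Emit one result per group for the window that just closed.
    for key, values in sorted(buckets.items()):
        yield {
            "window_start": window_start,
            "group": key,
            "value": query.aggregate(values),
        }


def run_query(events: Iterable[dict], query: Query) -> Iterator[dict]:
    """Yield results each time a tumbling window closes.

    Assumption: events are ordered by their 'ts' field (in seconds).
    """
    current_window = None
    buckets = defaultdict(list)
    for event in events:
        if not query.where(event):
            continue
        # Align the timestamp to the start of its tumbling window.
        window = event["ts"] // query.window_seconds * query.window_seconds
        if current_window is not None and window != current_window:
            yield from _flush(current_window, buckets, query)
            buckets = defaultdict(list)
        current_window = window
        buckets[event[query.group_by]].append(event[query.value])
    # Flush the last (possibly incomplete) window when the input ends.
    if current_window is not None:
        yield from _flush(current_window, buckets, query)
```

The point is that `Query` is just data passed in at runtime, so each frontend client could submit its own filter, grouping, and window size instead of relying on a job that was compiled and deployed ahead of time.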
I have added a Kafka broker to my application; all events are ingested into a topic and are ready for consumption.
- Before I introduced Kafka, I tried to apply aggregation pipelines to my MongoDB Change Feed, but that isn't fully supported and therefore doesn't fit my needs.
- I tried Apache Druid, but it seems to support only a request/response pattern and can't stream query results for consumption.
- I've looked into Apache Flink, but it seems you can only define static queries that are then submitted to the Flink cluster. Interactive/ad-hoc queries don't appear to be possible, which is a shame, as it looked very promising otherwise.
- I think I've found an approach that might work using Kafka + Kafka Streams, but I'm not really satisfied with it, which is why I'm writing this post.
My problem boils down to two questions:
- How can I properly create interactive queries (filter, group (windowing), aggregate) and receive a continuous stream of results?
- How can I serve this result stream to a frontend for visualization and thereby create a truly event-driven API?
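For the second question, the serving side I have in mind would look roughly like the sketch below: each aggregate result is turned into a Server-Sent Events frame (`id:`, `event:`, and `data:` fields, terminated by a blank line). The function name `to_sse` and the event name `aggregate` are hypothetical, and the HTTP layer that would actually write these frames to a `text/event-stream` response is left out.

```python
import json
from typing import Iterable, Iterator


def to_sse(results: Iterable[dict], event_name: str = "aggregate") -> Iterator[str]:
    """Format each result as one Server-Sent Events frame.

    Assumption: 'results' is any iterable of JSON-serializable dicts,
    for example the output of a continuously running aggregation.
    """
    for i, result in enumerate(results):
        # A frame is a set of 'field: value' lines followed by a blank line.
        yield f"id: {i}\nevent: {event_name}\ndata: {json.dumps(result)}\n\n"
```

On the React side, a browser `EventSource` could then subscribe with `addEventListener("aggregate", ...)` and update the visualization on every frame, which would replace my current polling.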
I'd like to only rely on open-source/free software (Apache etc.).