How to keep the previous foreachBatch result in memory in Spark Structured Streaming?


I am receiving events continuously from Event Hubs and performing a group-by aggregation over a short sliding time window (3 seconds wide, sliding every 2 seconds in my code). However, records that belong to the same window land in different microbatches. How do I handle this scenario in Spark Structured Streaming?

Input :

[image: sample input events]

Here is the output I need:

[image: expected output]

Here is my code:

    from pyspark.sql.functions import window, max as agg_max

    def aggregate(df, agg_by_window):
        # Group by id, plus a 3-second window sliding every 2 seconds
        group_by_attributes = ["id"] + (
            [window("timestamp", "3 seconds", "2 seconds")] if agg_by_window else []
        )
        return (
            df.groupBy(*group_by_attributes)
            .agg(
                agg_max("col1").alias("col1"),
                agg_max("col2").alias("col2"),
            )
        )

In my case agg_by_window is true.

How can I keep the previous batch's data in memory here, and then use that data in the current batch? Or do I need to remove foreachBatch altogether?
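From what I have read, the usual way to get windows that span microbatches is to run the windowed aggregation with a watermark on the streaming DataFrame itself, so Spark's own state store carries partial window results across batches, rather than aggregating inside foreachBatch. Is something like the following sketch the right direction? (The rate source, the 10-second watermark, and the derived id/col1/col2 columns are just placeholders standing in for my Event Hubs stream.)

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, window, max as agg_max

    spark = SparkSession.builder.appName("windowed-agg-sketch").getOrCreate()

    # Stand-in for the Event Hubs stream: the built-in rate source emits
    # (timestamp, value) rows; id/col1/col2 are derived just for illustration.
    events_df = (
        spark.readStream.format("rate").option("rowsPerSecond", 5).load()
        .withColumn("id", col("value") % 3)
        .withColumn("col1", col("value"))
        .withColumn("col2", col("value") * 2)
    )

    # The watermark tells Spark how long to keep a window's state open for
    # late events; that state is carried across microbatches automatically.
    windowed = (
        events_df
        .withWatermark("timestamp", "10 seconds")
        .groupBy("id", window("timestamp", "3 seconds", "2 seconds"))
        .agg(
            agg_max("col1").alias("col1"),
            agg_max("col2").alias("col2"),
        )
    )

    query = (
        windowed.writeStream
        .outputMode("append")  # each window is emitted once the watermark passes it
        .format("console")
        .option("truncate", "false")
        .start()
    )
    query.awaitTermination()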
