I am receiving events continuously from Event Hubs and aggregating them with a group-by over a 2-second window. However, records that belong to the same 2-second window sometimes land in different micro-batches. How should I handle this scenario in Spark Structured Streaming?
Input:
This is the output I need:
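For context, I read the stream from Event Hubs roughly like this (connection details omitted; this assumes the Azure Event Hubs Spark connector, and event_hub_conf is just a placeholder for my real settings):

stream_df = (
    spark.readStream
         .format("eventhubs")            # Azure Event Hubs Spark connector
         .options(**event_hub_conf)      # placeholder for the real connection config
         .load()
)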
Here is my aggregation code:
from pyspark.sql.functions import window, max as agg_max

def aggregate(df, agg_by_window=True):   # wrapper name is just how I refer to it here
    # group by id, plus a 3-second window sliding every 2 seconds when enabled
    group_by_attributes = ["id"] + (
        [window("timestamp", "3 seconds", "2 seconds")] if agg_by_window else []
    )
    return (
        df.groupBy(*group_by_attributes)
          .agg(
              agg_max("col1").alias("col1"),
              agg_max("col2").alias("col2"),
          )
    )
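
The aggregation currently runs inside foreachBatch, roughly like this (process_batch and the sink are illustrative, not my exact code):

def process_batch(batch_df, batch_id):
    # each micro-batch is aggregated on its own here, so a window that spans
    # two micro-batches comes out as two separate partial results
    result = aggregate(batch_df, agg_by_window=True)
    result.write.mode("append").parquet("/tmp/agg_output")   # illustrative sink

query = (
    stream_df.writeStream
             .foreachBatch(process_batch)
             .start()
)
query.awaitTermination()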
In my case, agg_by_window is True.
With foreachBatch, how can I keep the previous batch's data in memory and use it together with the current batch, or do I need to remove foreachBatch altogether?
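
For reference, my understanding is that if I drop foreachBatch and let Structured Streaming run the aggregation directly on the stream, the engine keeps the window state across micro-batches itself, and a watermark controls how long a window stays open for late records. An untested sketch of what I mean (the 10-second watermark value is just a guess):

from pyspark.sql.functions import window, max as agg_max

windowed = (
    stream_df
    .withWatermark("timestamp", "10 seconds")             # late-data tolerance; value is a guess
    .groupBy("id", window("timestamp", "3 seconds", "2 seconds"))
    .agg(
        agg_max("col1").alias("col1"),
        agg_max("col2").alias("col2"),
    )
)

query = (
    windowed.writeStream
            .outputMode("update")     # or "append" to emit only windows closed by the watermark
            .format("console")
            .start()
)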

