How can I make an Apache Beam (Dataflow) window trigger at most once?

I need to guarantee that our window is triggered at most once, because we are not allowed to have multiple entries with the same timestamp in the DB. How can I set up the windowing so that all data arriving after the trigger has fired is discarded?

Poala Astrid On

Triggers determine when aggregated results are emitted as unbounded data arrives. They let you fine-tune the windowing strategy for your PCollection.

If you use the AfterWatermark.pastEndOfWindow() trigger and set the allowed lateness to zero with withAllowedLateness(Duration.ZERO), the window fires only when the watermark passes the end of the window. Any data that arrives after that point is late and is dropped, so each window triggers at most once.
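A minimal sketch of that configuration with the Beam Java SDK could look like the following. The one-minute fixed window, the String element type, and the names SingleFiringWindow and windowOnce are assumptions made for illustration only; substitute your own window function and element type. It needs the Beam Java SDK and Joda-Time on the classpath.

```java
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

public class SingleFiringWindow {

  // Hypothetical helper: re-windows the input so each window fires one pane.
  static PCollection<String> windowOnce(PCollection<String> events) {
    return events.apply(
        "WindowOnce",
        // Assumed window size of one minute; use whatever fits your data.
        Window.<String>into(FixedWindows.of(Duration.standardMinutes(1)))
            // Fire when the watermark passes the end of the window.
            .triggering(AfterWatermark.pastEndOfWindow())
            // Zero allowed lateness: elements arriving after that are dropped.
            .withAllowedLateness(Duration.ZERO)
            // Beam requires an accumulation mode whenever a trigger is set.
            .discardingFiredPanes());
  }
}
```

An aggregation applied after this transform (for example a Combine or GroupByKey) should then emit a single result per key and window, which is what you would write to the DB.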

Alternatively, you can keep the default windowing configuration together with the default trigger. The default trigger for a PCollection is based on event time: it emits the window's results when Beam's watermark passes the end of the window, and then fires again each time late data arrives. The default windowing configuration, however, has an allowed lateness of zero, so late data is discarded and the default trigger in effect fires exactly once.