Flink cleanup after job cancellation

I have an integration test suite that I am running against a Flink application.

This suite of tests validates my application for different combinations of windowing / event triggers / processing triggers / lateness, etc.

I run these tests against a remote Flink server that runs in a Kubernetes environment, listens for messages on a Kafka topic, and feeds the output to another Kafka topic. As part of each test, I submit a new job, send some messages to the input Kafka topic, assert the result on the output Kafka topic, and cancel the job.

This process repeats for every test case, and each Flink job gets its own input and output Kafka topics. Also, I am not restarting the application after each test case.
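
For reference, the per-test lifecycle looks roughly like the sketch below. The topic names, the JobManager address, and the `submitJob` / `waitForTerminalState` helpers are placeholders for my actual harness; cancellation goes through Flink's REST endpoint (`PATCH /jobs/:jobid?mode=cancel`).

```java
// A minimal sketch of the per-test lifecycle, assuming Java 11+ and Flink's REST API.
// Topic names, the JobManager address, and the helper methods are placeholders.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.UUID;

class WindowJobTestBase {

    static final String FLINK_REST = "http://flink-jobmanager:8081"; // placeholder address
    final HttpClient http = HttpClient.newHttpClient();

    String inputTopic;
    String outputTopic;
    String jobId;

    void setUp() {
        // Unique topics per test case so no messages leak between cases.
        String suffix = UUID.randomUUID().toString();
        inputTopic = "it-input-" + suffix;
        outputTopic = "it-output-" + suffix;
        jobId = submitJob(inputTopic, outputTopic); // harness-specific submission
    }

    void tearDown() throws Exception {
        // Cancel the job via the REST API: PATCH /jobs/:jobid?mode=cancel
        HttpRequest cancel = HttpRequest.newBuilder()
                .uri(URI.create(FLINK_REST + "/jobs/" + jobId + "?mode=cancel"))
                .method("PATCH", HttpRequest.BodyPublishers.noBody())
                .build();
        http.send(cancel, HttpResponse.BodyHandlers.discarding());
        // Block until the job reaches a terminal state (poll GET /jobs/:jobid),
        // so a still-draining job cannot interfere with the next test case.
        waitForTerminalState(jobId);
    }

    String submitJob(String in, String out) { /* harness-specific */ return null; }
    void waitForTerminalState(String id)    { /* poll job status   */ }
}
```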

The behavior I have noticed is that each test case succeeds when run individually, but when I run them as a suite, some test cases break intermittently.

To ensure proper ordering of messages:

  1. I have set the parallelism of the Flink job to 1.
  2. Ensured synchronous sends to the Kafka topic.
  3. The Kafka producer sends messages to partition 0 only (see the producer sketch after this list).
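
A stripped-down version of the producer side, covering points 2 and 3 (the broker address and topic name are placeholders):

```java
// A minimal sketch of the synchronous, single-partition producer, using the
// standard Apache Kafka Java client. Broker address and topic are placeholders.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

class OrderedTestProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092"); // placeholder broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("acks", "all"); // wait for full acknowledgement of each record

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 1; i <= 5; i++) {
                // Fixing the partition to 0 keeps all messages in one ordered log.
                ProducerRecord<String, String> record =
                        new ProducerRecord<>("it-input-topic", 0, null, String.valueOf(i));
                // Blocking on the returned future makes the send synchronous:
                // record i is acknowledged before record i + 1 is sent.
                producer.send(record).get();
            }
        }
    }
}
```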

Could residue from a previous test be breaking the next one, or could some trigger from the previous test still be firing and interfering with the next test? If so, what should be done as part of cleanup?

Edit: Just an FYI, the Flink window aggregator is pretty simple; it just sums the integers provided as input.
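
The job is roughly equivalent to the sketch below (the window size and the in-memory source/sink stand in for the real Kafka connectors):

```java
// A rough equivalent of the job, assuming the DataStream API. The window size
// and the in-memory source/sink are placeholders for the real Kafka wiring.
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class WindowSumJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1); // point 1 above: a single ordered stream of events

        env.fromElements(1, 2, 3, 4, 5)          // stands in for the Kafka source
           .windowAll(TumblingProcessingTimeWindows.of(Time.seconds(5)))
           .reduce((a, b) -> a + b)              // the "sum of integers" aggregator
           .print();                             // stands in for the Kafka sink

        env.execute("window-sum-it");
    }
}
```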
