How to submit Flink batch job requests per customer on Amazon EKS using S3 buckets for source and sink?


I am new to Kubernetes and Flink and want to use them for batch processing. I'd like to set up a Flink job on EKS: I have about 2.5 TB of data that needs aggregation every 30 minutes (overall, I intend to process about 120 TB of data per day from several IoT devices). This data can be partitioned by customer (~5,000 customers).

How can I submit a batch job request to the Flink cluster per customer, where the source is an S3 bucket already partitioned by customer and the sink is another S3 bucket that holds the aggregated customer data?

Can I use the RestClusterClient for this purpose? Or should I build a Flink client as a separate pod in the Flink cluster that submits the jobs based on some trigger (EventBridge/SQS)?
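For context, here is the kind of thing I have in mind, sketched against Flink's REST API (`POST /jars/:jarid/run`, which runs a previously uploaded jar). The entry class, bucket names, argument names, and parallelism below are placeholders, not a working setup; a trigger consumer (e.g. reading from SQS) would call `submit` once per customer:

```python
import json
from urllib.request import Request, urlopen


def build_run_request(base_url: str, jar_id: str, customer_id: str):
    """Build the URL and JSON body for running an already-uploaded job jar
    with customer-specific S3 source/sink paths."""
    url = f"{base_url}/jars/{jar_id}/run"
    body = {
        # Hypothetical entry class of the per-customer batch job.
        "entryClass": "com.example.CustomerAggregationJob",
        # Per-customer prefixes in the partitioned source/sink buckets
        # (bucket and argument names are made up for illustration).
        "programArgsList": [
            "--input", f"s3://source-bucket/customer={customer_id}/",
            "--output", f"s3://sink-bucket/customer={customer_id}/",
        ],
        "parallelism": 4,
    }
    return url, body


def submit(base_url: str, jar_id: str, customer_id: str) -> str:
    """POST the run request to the Flink REST endpoint; returns the job id."""
    url, body = build_run_request(base_url, jar_id, customer_id)
    req = Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.loads(resp.read())["jobid"]
```

The jar itself would be uploaded once via `POST /jars/upload`, so each trigger only has to pass the customer id.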
