Beam errors out when using PortableRunner (Flink Runner) – Cannot run program "docker"

29 Views Asked by At

I am trying to deploy a Beam job (Python Beam) that runs on a PortableRunner (Flink Runner) in my Kubernetes cluster. I have not experienced issues prior with Beam using the Flink Runner. However, today I tried to set up Beam to be a consumer from Apache Kafka using ReadFromKafka from apache_beam.io.kafka.

My Flink Cluster is managed by the Apache Flink Kubernetes Operator.

My Beam jobs are managed by a Beam Flink Job Manager, which posts Beam jobs to the Flink master. The Job Manager uses the image apache/beam_flink1.16_job_server:2.54.0.

My Flink Task Managers each contain a sidecar for a Beam worker pool, which is spun up using the image apache/beam_python3.11_sdk:2.54.0 and the arg --worker_pool.

When I start my beam job, I get the following error on the job manager logs:

Caused by: java.io.IOException: Cannot run program "docker": error=2, No such file or directory

These are my Beam pipeline options:

--job_name=beam_example_pipeline
--runner=PortableRunner
--job_endpoint=beam-flink-job-server:8099
--artifact_endpoint=beam-flink-job-server:8098
--environment_type=EXTERNAL
--environment_config=localhost:50000
--parallelism=1
--streaming

Some resources I've found suggest that the Kafka transform has its own environment type which is set to (and overrides any environment you set?) --environment_type=DOCKER, which is what causes the issues. However, I could be wrong, so please say so if I am.

All of this taking place on a Kubernetes cluster, where, to my knowledge, Docker in Docker is not recommended. I do not want to use a PROCESS environment_type, I require EXTERNAL.

Additionally, in my search for potential fixes, apache_beam.io.kafka states that Flink users can use the Beam Flink Job Server expansion service. However, even though the expansion service does exist, beam jobs with Kafka transforms refuse to connect to the expansion service:

grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses; last error: UNKNOWN: ipv4:<my-job-server-ip>:8097: Failed to connect to remote host: Connection refused"
debug_error_string = "UNKNOWN:Error received from peer  {grpc_message:"failed to connect to all addresses; last error: UNKNOWN: ipv4:<my-job-server-ip>:8097: Failed to connect to remote host: Connection refused", grpc_status:14, created_time:"2024-03-20T06:47:52.593232663+00:00"}"

How can I resolve this issue? Is this a bug with Beam?

0

There are 0 best solutions below