I am deploying a stateful Flink application using the YAML manifest below.
```yaml
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: simple-flink
spec:
  image: flink-1.17-python-iceberg:1.17
  flinkVersion: v1_16
  ingress:
    template: "{{name}}.{{namespace}}.flink.k8s.io"
    className: "nginx"
    annotations:
      nginx.ingress.kubernetes.io/proxy-body-size: 50m
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "1"
    state.savepoints.dir: file:///flink-data/savepoints
    state.checkpoints.dir: file:///flink-data/checkpoints
    high-availability.type: kubernetes
    high-availability.storageDir: file:///flink-data/ha
    rest.client-max-content-length: "1004857600"
  serviceAccount: flink
  jobManager:
    replicas: 1
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    replicas: 1
    resource:
      memory: "2048m"
      cpu: 1
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          volumeMounts:
            - mountPath: /flink-data
              name: flink-volume
          env:
            - name: HADOOP_CONF_DIR
              value: "/opt/hadoop-2.8.5/etc/hadoop:/opt/hadoop-2.8.5/share/hadoop/common/lib/*:/opt/hadoop-2.8.5/share/hadoop/common/*:/opt/hadoop-2.8.5/share/hadoop/hdfs:/opt/hadoop-2.8.5/share/hadoop/hdfs/lib/*:/opt/hadoop-2.8.5/share/hadoop/hdfs/*:/opt/hadoop-2.8.5/share/hadoop/yarn/lib/*:/opt/hadoop-2.8.5/share/hadoop/yarn/*:/opt/hadoop-2.8.5/share/hadoop/mapreduce/lib/*:/opt/hadoop-2.8.5/share/hadoop/mapreduce/*:/opt/hadoop-2.8.5/contrib/capacity-scheduler/*.jar"
      volumes:
        - name: flink-volume
          hostPath:
            path: /tmp
            type: Directory
```
The Flink jobs run perfectly. For auto-scaling I created an HPA with the following manifest:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: simple-flink
  namespace: default
spec:
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageValue: 10Mi
  scaleTargetRef:
    apiVersion: flink.apache.org/v1beta1
    kind: FlinkDeployment
    name: simple-flink
```
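As an aside, and independent of the selector error below: in the `autoscaling/v2` API a target of `type: Utilization` is paired with `averageUtilization` (a percentage of the pod's resource request), while `averageValue` (an absolute quantity) pairs with `type: AverageValue`. A sketch of the two valid metric forms (the numbers are placeholders, not recommendations):

```yaml
# Percentage-based target: scale when average memory usage exceeds
# the given percentage of the pods' memory request.
metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
# Alternatively, an absolute-quantity target:
#       target:
#         type: AverageValue
#         averageValue: 512Mi
```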
When describing the autoscaler, I get the error below:
```
Events:
  Type     Reason                        Age                    From                       Message
  ----     ------                        ----                   ----                       -------
  Warning  FailedComputeMetricsReplicas  6m23s (x12 over 9m8s)  horizontal-pod-autoscaler  selector is required
  Warning  SelectorRequired              4m8s (x21 over 9m8s)   horizontal-pod-autoscaler  selector is required
```
And when running `kubectl describe hpa simple-flink` I get the following status info:
```yaml
status:
  conditions:
    - lastTransitionTime: "2023-12-19T13:42:00Z"
      message: the HPA controller was able to get the target's current scale
      reason: SucceededGetScale
      status: "True"
      type: AbleToScale
    - lastTransitionTime: "2023-12-19T13:42:00Z"
      message: the HPA target's scale is missing a selector
      reason: InvalidSelector
      status: "False"
      type: ScalingActive
```
I've already tried the fix suggested in this other thread: https://stackoverflow.com/questions/73075996/flink-kubernetes-deployment-the-hpa-controller-was-unable-to-get-the-targets, which is to update the CRD to the latest version:
```shell
git clone https://github.com/apache/flink-kubernetes-operator
cd flink-kubernetes-operator
kubectl replace -f helm/flink-kubernetes-operator/crds/flinkdeployments.flink.apache.org-v1.yml
```
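For what it's worth, the `selector is required` condition usually means the scale subresource of the target CRD does not expose a label selector to the HPA controller. A diagnostic sketch (it assumes `kubectl` access to the cluster, the `default` namespace, and the resource names from my manifests above; the `|| true` only keeps the script going if a step fails):

```shell
# Does the installed CRD declare a labelSelectorPath on its scale subresource?
# If this prints nothing, the HPA has no selector to work with.
kubectl get crd flinkdeployments.flink.apache.org \
  -o jsonpath='{.spec.versions[0].subresources.scale.labelSelectorPath}' || true

# What does the HPA actually see? The "selector" field in the response
# should be non-empty.
kubectl get --raw \
  "/apis/flink.apache.org/v1beta1/namespaces/default/flinkdeployments/simple-flink/scale" || true
```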
After that I recreated the deployment and the HPA, but I still get the same error.
Thanks a lot for any suggestions on how to fix this problem.