I have a question regarding the HPA in Kubernetes. Below is my HPA; it is set to always keep a minimum of 3 replicas and to scale up when CPU utilization goes over 50%.
kubectl autoscale deployment superset -n superset --cpu-percent=50 --min=3 --max=20
Though I have three pods up and running constantly, the HPA brings up a new pod whenever I send any requests (which I believe could still be handled by the three running pods). My question is: why does the HPA have to bring up new pods when the existing pods can handle the load?
Is something wrong with my HPA configuration?
Your configuration looks correct, and generally I'd say you're right that a single request should not cause a scale-up. There are a couple of possible reasons I can think of:
1. Your CPU request on the deployment is set far too low. The HPA's scaling algorithm is based on CPU used / CPU request, not the CPU limit, so if you set the CPU request to something really tiny (e.g. 1m), then any activity at all could cause a scale-up.
2. The request in question is causing a large CPU spike, or possibly crashing the pod. I think the behavior in the latter case would be different from what you're describing, but it's worth monitoring the resources and logs of the pod receiving the request to make sure nothing looks amiss.
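To illustrate why a tiny CPU request can trigger a scale-up, here is a rough sketch of the replica calculation described in the Kubernetes HPA documentation: desired replicas = ceil(current replicas * current utilization / target utilization), where utilization is measured against the CPU request. The function name, the usage figures, and the 500m request are made up for illustration, and the sketch ignores pod readiness, missing metrics, and stabilization behavior, so treat it as an approximation rather than what the controller literally runs.

```python
import math


def desired_replicas(current_replicas, usage_millicores, request_millicores,
                     target_percent, min_replicas, max_replicas, tolerance=0.1):
    """Simplified HPA estimate; the name and signature are hypothetical."""
    # Utilization is total CPU used as a percentage of total CPU *requested*.
    total_request = request_millicores * len(usage_millicores)
    utilization = 100.0 * sum(usage_millicores) / total_request
    ratio = utilization / target_percent
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: no scaling (0.1 is the documented default).
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the --min / --max bounds given to kubectl autoscale.
    return max(min_replicas, min(max_replicas, desired))


# Assumed numbers: a 1m request, with each of the 3 pods using only 2m of CPU.
print(desired_replicas(3, [2, 2, 2], 1, 50, 3, 20))          # -> 12
# Assumed numbers: a 500m request, with each pod using 100m of CPU.
print(desired_replicas(3, [100, 100, 100], 500, 50, 3, 20))  # -> 3
```

With the 1m request, 2m of usage per pod already counts as 200% utilization, so the estimate jumps to 12 replicas. With the 500m request, much heavier usage still sits at 20% and the deployment stays at the minimum of 3.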
Overall, in this situation, I'd try to get hold of whatever observability tools you have and gauge whether the metrics reported by the pods match up with a need for scaling, or whether any other weirdness is going on (container restarts, etc.).