I am writing some unit tests for a Python program that requires a port-forwarding command to run in the background in order to query a service.
The program is a Kubernetes operator written in Python using Kopf, and I'm testing it with pytest. The operator regularly queries the ingress controller's metrics server. When run within the cluster, it has access to the service. However, the unit tests are run from outside the cluster, so they require the following command to run in the background:
kubectl port-forward svc/metrics 10254:80 -n ingress-nginx
I browsed a bit of documentation, implemented a solution using subprocess.Popen, and wrote a nice decorator:
import subprocess

CMD = ['kubectl', 'port-forward', 'svc/metrics', '10254:80', '-n', 'ingress-nginx']

def expose_metrics_server(func):
    def wrapper():
        # Start the port-forward, run the test, then stop the port-forward
        with subprocess.Popen(CMD, shell=False) as proc:
            func()
            proc.terminate()
    return wrapper
So far so good, right? However, my unit tests that check error handling when the metrics server is not up were failing.
I investigated and noticed the following behavior:
- before proc.terminate()
(venv) user@debian:~/$ ps aux | grep kubectl
user 182770 0.0 0.0 6932 3348 ? Ss 14:49 0:00 /bin/bash /usr/local/bin/kubectl port-forward svc/metrics 10254:80 -n ingress-nginx
user 182774 0.5 0.3 1809452 58680 ? Sl 14:49 0:00 minikube kubectl -- port-forward svc/metrics 10254:80 -n ingress-nginx
user 182816 1.1 0.2 762652 42432 ? Sl 14:49 0:00 /home/user/.minikube/cache/linux/amd64/v1.27.4/kubectl port-forward svc/metrics 10254:80 --cluster=minikube -n ingress-nginx
user 183312 0.0 0.0 6332 2044 pts/4 S+ 14:49 0:00 grep kubectl
- after proc.terminate()
(venv) user@debian:~/$ ps aux | grep kubectl
user 182770 0.0 0.0 0 0 ? Zs 14:49 0:00 [kubectl] <defunct>
user 182774 0.1 0.3 1809452 58680 ? Sl 14:49 0:00 minikube kubectl -- port-forward svc/metrics 10254:80 -n ingress-nginx
user 182816 0.2 0.2 762652 42432 ? Sl 14:49 0:00 /home/user/.minikube/cache/linux/amd64/v1.27.4/kubectl port-forward svc/metrics 10254:80 --cluster=minikube -n ingress-nginx
user 183732 0.0 0.0 6332 2108 pts/4 S+ 14:50 0:00 grep kubectl
- after test exited
(venv) user@debian:~/$ ps aux | grep kubectl
user 182774 0.1 0.3 1809452 58680 ? Sl 14:49 0:00 minikube kubectl -- port-forward svc/metrics 10254:80 -n ingress-nginx
user 182816 0.2 0.2 762652 42432 ? Sl 14:49 0:00 /home/user/.minikube/cache/linux/amd64/v1.27.4/kubectl port-forward svc/metrics 10254:80 --cluster=minikube -n ingress-nginx
user 183793 0.0 0.0 6332 2152 pts/4 S+ 14:50 0:00 grep kubectl
So basically, the top-level process is terminated (it shows up as <defunct> until the test exits) but its children are not. According to my research, terminating them as well is far less trivial than it sounds.
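The same thing can be checked from the test side, without ps, by probing the forwarded port. Below is a minimal sketch, assuming the forward listens on 127.0.0.1:10254 as in the command above; port_is_open is a hypothetical helper name, not something from kubectl or Kopf.

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 10254,
                 timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing is listening, or on timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If this still returns True after proc.terminate(), some descendant of the original process is presumably still holding the port.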
So I tried various solutions, such as passing preexec_fn=os.setsid or start_new_session=True to Popen, or killing the process through various combinations of the tools provided by the os and signal modules, e.g. the naive os.kill(proc.pid, signal.SIGKILL) or the more refined os.killpg(os.getpgid(proc.pid), signal.SIGTERM). I also tried running the command through os.fork or threading.Thread.
Most of those attempts did not change anything, some hung indefinitely during deletion, and I even managed to kill my X server once. I had initially set shell=True; setting it to False did not help with termination, but it did remove the top-level /bin/sh -c call.
Some recommend using psutil, but I'd like to stick with Python's standard library if possible.
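For reference, here is a minimal sketch of what the process-group approach usually looks like with the standard library only. It is an illustration, not a confirmed fix: terminate_group and background are hypothetical names, and the sketch assumes every descendant stays in the process group created by start_new_session=True, which may not hold for a wrapper that moves its children into another group.

```python
import contextlib
import os
import signal
import subprocess


def terminate_group(proc: subprocess.Popen, timeout: float = 5.0) -> None:
    """SIGTERM the whole process group of proc, then SIGKILL on timeout.

    Assumes proc was started with start_new_session=True, so it is a
    group leader and its process group ID equals proc.pid.
    """
    try:
        os.killpg(proc.pid, signal.SIGTERM)
    except ProcessLookupError:
        return  # nothing left in the group
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The leader ignored SIGTERM: escalate for the whole group.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()


@contextlib.contextmanager
def background(cmd, timeout: float = 5.0):
    """Run cmd in its own session and tear its group down on exit."""
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        yield proc
    finally:
        # Runs even if the body raises, so the command is not left behind.
        terminate_group(proc, timeout)
```

Because the command runs in its own session, the signal goes to that group only and never to the group of the test runner itself. It could be used as "with background(CMD): func()" inside the decorator.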
"Working solution"
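In the end I obtained the desired behavior by starting the command in its own session and calling an external pkill on that session ID, since there is no os.killsession method:
import os
from subprocess import Popen, call
with Popen(CMD, shell=False, start_new_session=True) as proc:
    func()
    # Kill every process in the session created for the port-forward
    call(['pkill', '-9', '-s', str(os.getsid(proc.pid))])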
However, calling pkill -9 like that feels rather extreme, and I have the intuition that there should be a cleaner way to do this.
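To make the idea concrete, a standard-library stand-in for pkill -s could look roughly like the sketch below. It is Linux-only because it scans /proc, the names pids_in_session and kill_session are hypothetical, and it sends SIGTERM by default rather than SIGKILL. It also refuses to act on the caller's own session, which is one way an attempt like this can take down far more than intended.

```python
import os
import signal


def pids_in_session(sid: int) -> list:
    """Return the PIDs whose session ID is sid (Linux only: scans /proc)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        pid = int(entry)
        try:
            if os.getsid(pid) == sid:
                pids.append(pid)
        except (ProcessLookupError, PermissionError):
            continue  # process vanished or is not ours to inspect
    return pids


def kill_session(sid: int, sig: int = signal.SIGTERM) -> list:
    """Send sig to every process in session sid; return the PIDs signalled."""
    if sid == os.getsid(0):
        raise ValueError("refusing to signal the caller's own session")
    signalled = []
    for pid in pids_in_session(sid):
        try:
            os.kill(pid, sig)
            signalled.append(pid)
        except (ProcessLookupError, PermissionError):
            pass  # already gone, or not ours to signal
    return signalled
```

With start_new_session=True the session ID equals proc.pid, so kill_session(proc.pid) would take the place of the pkill call, followed by proc.wait() to reap the top-level process.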
So I'm making this post to ask whether there are recommended patterns for running a shell command in the background alongside a Python script and then properly terminating it along with all its children.
Thank you in advance for any input or insight.