I am trying to build a Process subclass to utilize multiple GPUs in my desktop.
import multiprocessing as mp
from typing import Any, Callable


class GPUProcess(mp.Process):
    used_ids: list[int] = []
    next_id: int = 0

    def __init__(self, *, target: Callable[[Any], Any], kwargs: Any):
        gpu_id = GPUProcess.next_id
        if gpu_id in GPUProcess.used_ids:
            raise RuntimeError(
                f"Attempt to reserve reserved processor {gpu_id} {self.used_ids=}"
            )
        GPUProcess.next_id += 1
        GPUProcess.used_ids.append(gpu_id)
        self._gpu_id = gpu_id

        # Define target process func with constant gpu_id
        def _target(**_target_kwargs):
            target(
                **_target_kwargs,
                gpu_id=self.gpu_id,
            )

        super(GPUProcess, self).__init__(target=_target, kwargs=kwargs)

    @property
    def gpu_id(self):
        return self._gpu_id

    def __del__(self):
        GPUProcess.used_ids.remove(self.gpu_id)

    def __repr__(self) -> str:
        return f"<{type(self)} gpu_id={self.gpu_id} hash={hash(self)}>"


# Test creation
def test_process_creation():
    # Expect two gpus
    def dummy_func(*args):
        return args

    processes = []
    for _ in range(2):
        p = GPUProcess(
            target=dummy_func,
            kwargs=dict(a=("a", "b", "c")),
        )
        processes.append(p)
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    del processes
    assert GPUProcess.used_ids == [], f"{GPUProcess.used_ids=}!=[]"


if __name__ == "__main__":
    test_process_creation()
__del__ is not called for the second process.
AssertionError: GPUProcess.used_ids=[1]!=[]
Why is the second __del__ not called?
Later, I'd like to use this class with mp.Pool to run a large set of payloads, with one GPUProcess per GPU and a function that uses the gpu_id keyword to decide which device to use. Is this even a sensible approach in Python?
The short answer is that __del__ is not being called because the variable p from the previous for loop is still referencing the second process object.

object.__del__ is not guaranteed to be called when del object is called, as per the official documentation. object.__del__ is called when the reference count of the object reaches 0. The del keyword only reduces the reference count of the object by 1.

So you could resolve this by setting p = None or calling del p before the assertion to remove that reference, or by running the assertion after exiting the test_process_creation() function, as returning releases all the references held at that stack level.

I found this video from the mCoding channel to be very informative about __del__ and how NOT to use it.

As an aside, I will note that while testing your code with those changes I found a couple of issues you will need to resolve:
- ValueError: list.remove(x): x not in list. Your __del__() method should either use a try/except, or check whether self.gpu_id is in the list before removing it.
- In test_process_creation() you defined the dummy function as dummy_func(*args), but the test processes you created specifically have keyword arguments defined with kwargs=dict(a=("a", "b", "c")). This will cause the exception TypeError: test_process_creation.<locals>.dummy_func() got an unexpected keyword argument 'a'. The simplest solution would be to change the definition to dummy_func(**kwargs).
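
To make the reference-count explanation concrete, here is a minimal sketch that reproduces the effect without multiprocessing. Tracked and leftover_ids are hypothetical names used only for illustration (they are not part of your code), and the behavior shown assumes CPython, where an object is finalized as soon as its reference count drops to 0. The __del__ here also uses the try/except guard suggested above.

```python
class Tracked:
    """Hypothetical stand-in for GPUProcess: registers an id on creation."""

    used_ids: list[int] = []

    def __init__(self, ident: int) -> None:
        self.ident = ident
        Tracked.used_ids.append(ident)

    def __del__(self) -> None:
        # Guarded removal: no ValueError if the id was already removed.
        try:
            Tracked.used_ids.remove(self.ident)
        except ValueError:
            pass


def leftover_ids(drop_loop_variable: bool) -> list[int]:
    """Return a snapshot of the ids still registered after `del items`."""
    items = []
    for i in range(2):
        items.append(Tracked(i))
    for t in items:
        pass  # after the loop, `t` still refers to the last element
    del items  # frees the first object; the second is kept alive by `t`
    if drop_loop_variable:
        del t  # drops the last remaining reference
    return list(Tracked.used_ids)


if __name__ == "__main__":
    print(leftover_ids(drop_loop_variable=False))  # [1] on CPython
    print(leftover_ids(drop_loop_variable=True))   # []
```

Note that the first call still cleans up once the function returns, because the local name t goes away with the stack frame. That is the same reason the assertion passes if you move it outside test_process_creation().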