system: MacOSX 14.4.1
python: 3.11.8
mpi4py: 3.1.5
OpenMPI: 5.0.2 installed with homebrew
I have the following Python script, which deadlocks when I run it with 3 MPI ranks. I want to create one subcommunicator for ranks 0 and 1, and another for ranks 1 and 2. This means that rank 0 only needs to know about the first subcommunicator, rank 2 only needs to know about the second, but rank 1 needs to know about both of them.
However, although rank 1 successfully creates the first communicator, it deadlocks when creating the second. Why is that?
from mpi4py import MPI
from typing import List
world_comm = MPI.COMM_WORLD
world_size = world_comm.Get_size()
world_rank = world_comm.Get_rank()
assert world_size == 3
communicator_ranks = [
    [(0, 1)],
    [(0, 1), (1, 2)],
    [(1, 2)]
]
communicators: List[MPI.Comm] = []
for ranks in communicator_ranks[world_rank]:
    group = world_comm.group.Incl(ranks)
    print(f"rank: {world_rank}, forming communictor for ranks: {ranks}")
    comm = world_comm.Create(group)
    print(f"rank: {world_rank}, forming communictor for ranks: {ranks} -- DONE")
    communicators.append(comm)
world_comm.barrier()
print("all done")
When I run mpirun -n 3 python deadlock.py, I get the following output:
> mpirun -n 3 python deadlock.py
rank: 2, forming communictor for ranks: (1, 2)
rank: 1, forming communictor for ranks: (0, 1)
rank: 0, forming communictor for ranks: (0, 1)
rank: 0, forming communictor for ranks: (0, 1) -- DONE
rank: 2, forming communictor for ranks: (1, 2) -- DONE
rank: 1, forming communictor for ranks: (0, 1) -- DONE
rank: 1, forming communictor for ranks: (1, 2)
Note that the last entry has no DONE counterpart; the program just sits there waiting forever.
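Your program is incorrect w.r.t. the MPI standard. MPI_Comm_create() is collective over the parent communicator, so every rank of world_comm has to call it the same number of times. Here ranks 0 and 2 call it once while rank 1 calls it twice, so rank 1's second call is never matched and it waits forever.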
From MPI 4.2 chapter 7.4 page 325:
You might try the mpi4py binding for MPI_Comm_create_from_group() instead, or use MPI_GROUP_EMPTY.
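For the MPI_GROUP_EMPTY route, here is a minimal sketch of how the loop could be restructured so that every rank calls Create() once per subcommunicator. The names ALL_GROUPS, plan_for_rank and build_communicators are made up for illustration, and the sketch assumes that Incl() with an empty rank list yields MPI_GROUP_EMPTY (which is what MPI_Group_incl with n = 0 is specified to return), so that non-members get back MPI.COMM_NULL from Create().

```python
from typing import Dict, List, Sequence, Tuple

# Every subcommunicator in the job, in one order that all ranks agree on.
# (Hypothetical name; taken from the (0, 1) and (1, 2) groups in the question.)
ALL_GROUPS: List[Tuple[int, ...]] = [(0, 1), (1, 2)]


def plan_for_rank(all_groups: Sequence[Tuple[int, ...]], rank: int) -> List[List[int]]:
    """Rank list that `rank` passes to Incl() at each collective step.

    Members pass the full group; non-members pass an empty list, which
    is assumed to become MPI_GROUP_EMPTY.
    """
    return [list(g) if rank in g else [] for g in all_groups]


def build_communicators(world_comm, all_groups=ALL_GROUPS) -> Dict[Tuple[int, ...], object]:
    """Create all subcommunicators; must be called by every rank of world_comm."""
    rank = world_comm.Get_rank()
    world_group = world_comm.group
    comms = {}
    for ranks, mine in zip(all_groups, plan_for_rank(all_groups, rank)):
        group = world_group.Incl(mine)
        # Collective over world_comm: each rank makes exactly one call per entry.
        comm = world_comm.Create(group)
        if mine:  # keep only the communicators this rank belongs to
            comms[tuple(ranks)] = comm
    return comms
```

To try it, run under mpirun -n 3 with "from mpi4py import MPI" followed by "comms = build_communicators(MPI.COMM_WORLD)". Rank 1 should end up with two entries and ranks 0 and 2 with one each, and the barrier afterwards is reached by everyone because the number of Create() calls is now the same on all ranks.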