I was looking at the basic implementation of DDP:

    class ToyModel(nn.Module):
        def __init__(self):
            super(ToyModel, self).__init__()
            self.net1 = nn.Linear(10, 10)
            self.relu = nn.ReLU()
            self.net2 = nn.Linear(10, 5)

        def forward(self, x):
            return self.net2(self.relu(self.net1(x)))


    def demo_basic(rank, world_size):
        print(f"Running basic DDP example on rank {rank}.")
        setup(rank, world_size)

        # create model and move it to GPU with id rank
        model = ToyModel().to(rank)
        ddp_model = DDP(model, device_ids=[rank])

        loss_fn = nn.MSELoss()
        optimizer = optim.SGD(ddp_model.parameters(), lr=0.001)

        optimizer.zero_grad()
        outputs = ddp_model(torch.randn(20, 10))
        labels = torch.randn(20, 5).to(rank)
        loss_fn(outputs, labels).backward()
        optimizer.step()

        cleanup()


    def run_demo(demo_fn, world_size):
        mp.spawn(demo_fn,
                 args=(world_size,),
                 nprocs=world_size,
                 join=True)

Just wondering how PyTorch knows which GPU to put the model on based only on the rank? Usually we pass a torch.device() object to a model. How does PyTorch interpret it when the to() function is given an integer?
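By default, if an integer i is provided as an argument to torch.Tensor.to, it is interpreted as the i-th CUDA device. The same holds for nn.Module.to, which is what model.to(rank) calls in the example above. This means that .to(0) is the same as .to('cuda:0'), .to(torch.device('cuda')), or even .cuda(); the last two use the current CUDA device, which defaults to the first device, i.e. cuda:0.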
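
A quick way to check this yourself is sketched below. This is not the original test from the answer. The helper names device_of and compare_int_and_string are made up for illustration, and the check is skipped (returns None) when no CUDA device with the requested index is visible, so it can also be run on a CPU-only machine.

```python
import torch
import torch.nn as nn


def device_of(module):
    """Return the device that holds the module's first parameter."""
    return next(module.parameters()).device


def compare_int_and_string(index=0):
    """Move one module with an integer and one with a 'cuda:<index>' string.

    Returns the two resulting devices as a tuple, or None when the
    requested CUDA device is not available on this machine.
    """
    if not torch.cuda.is_available() or index >= torch.cuda.device_count():
        return None

    by_int = nn.Linear(2, 2).to(index)             # integer device ordinal
    by_str = nn.Linear(2, 2).to(f"cuda:{index}")   # explicit device string
    return device_of(by_int), device_of(by_str)


if __name__ == "__main__":
    result = compare_int_and_string(0)
    if result is None:
        print("No CUDA device visible, nothing to compare.")
    else:
        int_device, str_device = result
        print(int_device, str_device, int_device == str_device)
```

On a machine with at least one CUDA GPU, both modules should end up on the same device. In the DDP example, each spawned process receives its own rank as the first argument, so model.to(rank) places that process's replica on the GPU with that index.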