Moving tensors between GPUs in Lightning


How can I move tensors from one GPU to another inside the training_step of a pl.LightningModule?

In a plain torch-based pipeline I use the following function to exchange tensors between neighbouring ranks during a multi-GPU fitting procedure:

def neighbour_exchange_bidir(left_rank, right_rank, tensor_to_left, tensor_to_right, group=None):
    """Simultaneously send a tensor to each neighbour and receive one from each."""
    # Receive buffers; shapes mirror the outgoing tensors
    tensor_from_left = torch.zeros_like(tensor_to_right)
    tensor_from_right = torch.zeros_like(tensor_to_left)
    # Non-blocking point-to-point send/recv ops for both directions
    send_op_left = torch.distributed.P2POp(
        torch.distributed.isend,
        tensor_to_left,
        left_rank,
        group=group,
    )
    send_op_right = torch.distributed.P2POp(
        torch.distributed.isend,
        tensor_to_right,
        right_rank,
        group=group,
    )
    recv_op_left = torch.distributed.P2POp(
        torch.distributed.irecv,
        tensor_from_left,
        left_rank,
        group=group,
    )
    recv_op_right = torch.distributed.P2POp(
        torch.distributed.irecv,
        tensor_from_right,
        right_rank,
        group=group,
    )
    # Launch all four ops as one batch and block until they complete
    reqs = torch.distributed.batch_isend_irecv([send_op_right, send_op_left, recv_op_right, recv_op_left])
    for req in reqs:
        req.wait()
    return tensor_from_right, tensor_from_left
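For context, in the plain torch pipeline the left_rank/right_rank arguments are typically derived from the process's own position in the group via torch.distributed.get_rank() and torch.distributed.get_world_size(). A minimal sketch, assuming a ring topology (the helper name ring_neighbours is mine):

```python
def ring_neighbours(rank, world_size):
    """Return (left_rank, right_rank) for a ring of `world_size` processes."""
    # Python's % wraps negative values, so rank 0 gets world_size - 1 on its left
    left_rank = (rank - 1) % world_size
    right_rank = (rank + 1) % world_size
    return left_rank, right_rank

# In a running process group these would come from:
# rank = torch.distributed.get_rank()
# world_size = torch.distributed.get_world_size()
# left_rank, right_rank = ring_neighbours(rank, world_size)
```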

However, it's not clear how to use it in Lightning: I need the device ids (ranks) to address the send/recv ops, and I found no examples of using torch.distributed.P2POp with Lightning.

1 Answer

Answer (The Hidden Reverse):

It looks like self.trainer.local_rank, self.trainer.global_rank, and self.trainer.world_size may help; I'll answer my own question as soon as I've tried it.
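A sketch of how that might look inside training_step, assuming a DDP-style strategy where self.trainer.global_rank and self.trainer.world_size describe the process's position in the default process group (the ring wrap-around and the helper name neighbour_ranks are my assumptions, not something the answerer has verified):

```python
from types import SimpleNamespace

def neighbour_ranks(trainer):
    """Map Lightning trainer attributes to ring neighbours for the exchange."""
    rank, world = trainer.global_rank, trainer.world_size
    return (rank - 1) % world, (rank + 1) % world

# Inside a pl.LightningModule (sketch):
#
# def training_step(self, batch, batch_idx):
#     left_rank, right_rank = neighbour_ranks(self.trainer)
#     tensor_from_right, tensor_from_left = neighbour_exchange_bidir(
#         left_rank, right_rank, tensor_to_left, tensor_to_right
#     )
#     ...

# Stand-in trainer object for illustration; real code uses self.trainer
fake_trainer = SimpleNamespace(global_rank=2, world_size=4)
```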