I trained the model and saved the pre-trained parameters on a single GPU. When I try to load those parameters on a single machine with multiple GPUs, I get errors such as Missing keys and Unexpected keys.
backbone_cfg = dict(
    embed_dim=embed_dim,
    depths=depths,
    num_heads=num_heads,
    window_size=window_size,
    ape=False,
    drop_path_rate=0.3,
    patch_norm=True,
    use_checkpoint=False,
    frozen_stages=frozen_stages,
)
self.backbone = SwinTransformer(**backbone_cfg)
def init_weights_multiGPUs(self, pretrained=None):
    if pretrained is not None:
        # Load the checkpoint on rank 0 only, then make all ranks wait.
        if dist.get_rank() == 0:
            self.backbone.load_state_dict(torch.load(pretrained))
        dist.barrier()
Here `pretrained` is the path to the pre-trained parameters file.
I have also tried the method below, but the problem is still the same:
def init_weights_multiGPUs(self, pretrained=None):
    print(f'== Load encoder backbone on multiGPUs from: {pretrained}')
    # Unwrap DDP so the state dict keys are not prefixed with 'module.'.
    if isinstance(self.backbone, torch.nn.parallel.DistributedDataParallel):
        self.backbone = self.backbone.module
    self.backbone.load_state_dict(
        torch.load(pretrained, map_location='cuda:{}'.format(torch.cuda.current_device())))
A Missing keys error means your model architecture is looking for weights that don't exist in the file you are loading; Unexpected keys means the file contains keys the model isn't looking for. Most likely your model architecture changed at some point after the weights were saved, so some of the state dict keys no longer match (e.g. they were renamed).
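To pin down which keys actually mismatch, it helps to print the difference between the checkpoint's keys and the model's keys before loading. Below is a minimal diagnostic sketch, assuming `pretrained` is the checkpoint path and `self.backbone` is the SwinTransformer from your snippet; the `'state_dict'`/`'model'` wrapper keys and the `'module.'` prefix handling are common checkpoint conventions, not something guaranteed for your particular file:

import torch

ckpt = torch.load(pretrained, map_location='cpu')

# Some checkpoints nest the actual weights under a wrapper key (assumption: adjust to your file).
for wrapper in ('state_dict', 'model'):
    if isinstance(ckpt, dict) and wrapper in ckpt:
        ckpt = ckpt[wrapper]
        break

# Strip a 'module.' prefix left over from DataParallel/DDP saving, if present.
ckpt = {k[len('module.'):] if k.startswith('module.') else k: v for k, v in ckpt.items()}

# Compare key sets to see exactly what is missing/unexpected.
model_keys = set(self.backbone.state_dict().keys())
ckpt_keys = set(ckpt.keys())
print('missing from checkpoint:', sorted(model_keys - ckpt_keys))
print('unexpected in checkpoint:', sorted(ckpt_keys - model_keys))

# strict=False loads the overlapping keys and reports the rest instead of raising.
result = self.backbone.load_state_dict(ckpt, strict=False)
print('missing:', result.missing_keys)
print('unexpected:', result.unexpected_keys)

If the printed differences show a consistent rename (for example an extra prefix on every key), remapping the keys in the dictionary before calling load_state_dict is usually enough; strict=False only hides the mismatch and leaves the unmatched layers randomly initialized.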