I trained the model and saved the pre-trained parameters on a single GPU. When I try to load those parameters on a single machine with multiple GPUs, I get errors such as Missing keys and Unexpected keys.
backbone_cfg = dict(
    embed_dim=embed_dim,
    depths=depths,
    num_heads=num_heads,
    window_size=window_size,
    ape=False,
    drop_path_rate=0.3,
    patch_norm=True,
    use_checkpoint=False,
    frozen_stages=frozen_stages,
)
self.backbone = SwinTransformer(**backbone_cfg)
def init_weights_multiGPUs(self, pretrained=None):
    if pretrained is not None:
        # Load the checkpoint on rank 0 only, then make all ranks wait.
        if dist.get_rank() == 0:
            self.backbone.load_state_dict(torch.load(pretrained))
        dist.barrier()
Here `pretrained` is the path to the pre-trained parameters file.
I have also tried the method below, but the problem is still the same:
def init_weights_multiGPUs(self, pretrained=None):
    print(f'== Load encoder backbone on multiGPUs from: {pretrained}')
    # Unwrap DDP so the state dict keys are not prefixed with 'module.'.
    if isinstance(self.backbone, torch.nn.parallel.DistributedDataParallel):
        self.backbone = self.backbone.module
    self.backbone.load_state_dict(
        torch.load(pretrained, map_location='cuda:{}'.format(torch.cuda.current_device())))
A Missing keys error means your model architecture is looking for weights that don't exist in the file you are loading; Unexpected keys means the file contains keys the model isn't looking for. Most likely your model architecture changed at some point after the weights were saved, so some of the state dict keys no longer match (e.g. they were renamed).
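To pin down which keys actually mismatch, it helps to print the difference between the checkpoint's keys and the model's keys before loading. Below is a minimal diagnostic sketch, assuming `pretrained` is the checkpoint path and `self.backbone` is the SwinTransformer from your snippet; the `'state_dict'`/`'model'` wrapper keys and the `'module.'` prefix handling are common checkpoint conventions, not something guaranteed for your particular file:

import torch

ckpt = torch.load(pretrained, map_location='cpu')

# Some checkpoints nest the actual weights under a wrapper key (assumption: adjust to your file).
for wrapper in ('state_dict', 'model'):
    if isinstance(ckpt, dict) and wrapper in ckpt:
        ckpt = ckpt[wrapper]
        break

# Strip a 'module.' prefix left over from DataParallel/DDP saving, if present.
ckpt = {k[len('module.'):] if k.startswith('module.') else k: v for k, v in ckpt.items()}

# Compare key sets to see exactly what is missing/unexpected.
model_keys = set(self.backbone.state_dict().keys())
ckpt_keys = set(ckpt.keys())
print('missing from checkpoint:', sorted(model_keys - ckpt_keys))
print('unexpected in checkpoint:', sorted(ckpt_keys - model_keys))

# strict=False loads the overlapping keys and reports the rest instead of raising.
result = self.backbone.load_state_dict(ckpt, strict=False)
print('missing:', result.missing_keys)
print('unexpected:', result.unexpected_keys)

If the printed differences show a consistent rename (for example an extra prefix on every key), remapping the keys in the dictionary before calling load_state_dict is usually enough; strict=False only hides the mismatch and leaves the unmatched layers randomly initialized.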