Setting seed while training many PyTorch models in a loop?


I have code as follows:

import random
import numpy as np
import torch
from torch import optim

seed = 0
random.seed(seed)
torch.manual_seed(seed)
np.random.seed(seed)
torch.cuda.manual_seed_all(seed)
torch.cuda.manual_seed(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False


def train_model(dataloaders, model, optimizer, num_epochs=100):
    ...

def test_model(model, test_loader, device):
    ...

for i in range(2):
    model = Transformer(d_model=d_model,
                        n_head=n_head,
                        max_len=max_len,
                        seq_len=sequence_len,
                        ffn_hidden=ffn_hidden,
                        n_layers=n_layer,
                        drop_prob=drop_prob,
                        details=False,
                        device=device).to(device=device)
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    train_model(...)
    torch.save(model.state_dict(), ...)  # save path elided
    del model
    torch.cuda.empty_cache()
    # model freed from GPU memory, now rebuild and load it for testing
    model = Transformer(...)
    model.load_state_dict(...)
    test_model(...)

For some reason, the training losses and final accuracies of the two models aren't the same, even though the seed is set once at the start. The test results differ as well. To get identical results across iterations, I have to move the seed setting into the for loop, before both training and testing, as follows:

def train_model(dataloaders, model, optimizer, num_epochs=100):
    ...

def test_model(model, test_loader, device):
    ...

for i in range(2):
    seed = 0
    random.seed(seed)
    torch.manual_seed(seed)
    np.random.seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    model = Transformer(d_model=d_model,
                        n_head=n_head,
                        max_len=max_len,
                        seq_len=sequence_len,
                        ffn_hidden=ffn_hidden,
                        n_layers=n_layer,
                        drop_prob=drop_prob,
                        details=False,
                        device=device).to(device=device)
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    train_model(...)
    torch.save(model.state_dict(), ...)  # save path elided
    del model
    torch.cuda.empty_cache()
    # model freed from GPU memory, now reseed, rebuild and load it for testing
    seed = 0
    random.seed(seed)
    torch.manual_seed(seed)
    np.random.seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    model = Transformer(...)
    model.load_state_dict(...)
    test_model(...)

With this, the two iterations produce identical training losses, accuracies, and test results. Why is this? Does resetting the seed return the random number generator to its initial state? Does training one model after setting the seed somehow change the conditions for the next one? I'm asking because I run experiments such as for num_epochs in [30, 40, 60, 80, 100], and keeping a fixed seed really helps with reproducible results.
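For reference, the repeated seed-setting lines can be collected into a single helper and called at the start of each stage; this is just a minimal sketch of my current setup (the set_seed name is my own shorthand, the calls inside it are the ones already shown above):

import random
import numpy as np
import torch

def set_seed(seed: int = 0) -> None:
    # reset every RNG the snippets above touch, so each loop
    # iteration starts from the same generator state
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # also covers the current CUDA device
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

for i in range(2):
    set_seed(0)   # before building and training the model
    # ... build model, train, save ...
    set_seed(0)   # before rebuilding and loading it for testing
    # ... rebuild model, load state dict, test ...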

1 Answer

Answer by Yakov Dan:

Well, assuming drop_prob is non-zero, your transformer model has dropout layers. During training, dropout randomly zeroes a fraction of the activations in the attention and feed-forward blocks, drawing its masks from the global random number generator. By the time the second model trains, that generator has advanced, so the two models see different dropout masks and produce different results. Resetting the seed before each model ensures that both models get the same masks.

There may be other factors that depend on the state of the random number generator, but this one pops out immediately.
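A quick way to see the dropout effect is to run the same dropout layer twice, with and without re-seeding; a minimal sketch:

import torch
import torch.nn as nn

# dropout draws its masks from the global RNG, so the generator state
# at the time of the forward pass decides which activations get zeroed
x = torch.ones(1, 8)
drop = nn.Dropout(p=0.5)
drop.train()

torch.manual_seed(0)
a = drop(x)

torch.manual_seed(0)
b = drop(x)   # same mask as a: the generator was reset

c = drop(x)   # different mask: the generator has moved on

print(torch.equal(a, b))  # True
print(torch.equal(a, c))  # almost certainly False

Weight initialization draws from the same global generator, so unless the seed is reset before the model is constructed, the second model also starts from different initial weights; that is one of those other factors.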