ViLT model raising "RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)" after updating transformers.
I have already placed the model on the GPU before running the code below.
The exact same code runs fine in my other environments, but in my current environment I did a fresh install of the Hugging Face transformers library, and since then I have been hitting errors in code that previously worked.
I checked the Stack Overflow solutions for this error on other models, but none of them helped, so I am asking a new question. Any insights would be appreciated.
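For reference, the model is created and moved to the GPU roughly like this (the checkpoint name here is a stand-in for the one I actually fine-tune; device, train_dataloader, and the accuracy helpers are defined earlier in my notebook):

import torch
from transformers import ViltProcessor, ViltForQuestionAnswering

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Stand-in checkpoint; I load my own fine-tuning checkpoint the same way
processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model.to(device)  # the whole model reports cuda:0 after this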
Here is the finetuning code I have:
import torch
from tqdm import tqdm

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
torch.set_grad_enabled(True)  # make sure autograd is on globally
model.train()

epochList, accList = [], []
for epoch in tqdm(range(20)):
    print(f"Epoch: {epoch}")
    for idx, batch in enumerate(train_dataloader):
        # Move every tensor in the batch to the same device as the model
        batch = {k: v.to(device) for k, v in batch.items()}
        optimizer.zero_grad()
        outputs = model(**batch)  # <-- the RuntimeError is raised here
        loss = outputs.loss
        print(idx, "-> Loss:", loss.item())
        loss.backward()
        optimizer.step()
        # Periodic evaluation (helper functions are defined elsewhere in the notebook)
        if (idx != 0) and (idx % 200 == 0):
            model.eval()
            acc_score_test = calculateAccuracyTest()
            acc_score_val = calculateAccuracyVal()
            print(f"\nValidation Accuracy: {acc_score_val}, Test Accuracy: {acc_score_test}\n")
            epochList.append((epoch * tot_number_of_steps) + idx)
            accList.append((acc_score_test, acc_score_val))
            model.train()
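Since the failure happens inside ViltEmbeddings.visual_embed (see the trace below), I would expect even a bare forward pass on GPU to hit the same code path. A minimal sketch along those lines, using the public VQA checkpoint and the standard COCO demo image rather than anything from my own data:

import requests
import torch
from PIL import Image
from transformers import ViltProcessor, ViltForQuestionAnswering

device = torch.device("cuda")
processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa").to(device)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
encoding = processor(image, "How many cats are there?", return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model(**encoding)  # raises the same RuntimeError in my env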
The stack trace is huge; the bottom of the trace is:
~/miniconda3/envs/yolo/lib/python3.11/site-packages/transformers/models/vilt/modeling_vilt.py:219, in ViltEmbeddings.forward(self, input_ids, attention_mask, token_type_ids, pixel_values, pixel_mask, inputs_embeds, image_embeds, image_token_type_idx)
217 # PART 2: patch embeddings (with interpolated position encodings)
218 if image_embeds is None:
--> 219 image_embeds, image_masks, patch_index = self.visual_embed(
220 pixel_values, pixel_mask, max_image_length=self.config.max_image_length
221 )
222 else:
223 image_masks = pixel_mask.flatten(1)
~/miniconda3/envs/yolo/lib/python3.11/site-packages/transformers/models/vilt/modeling_vilt.py:186, in ViltEmbeddings.visual_embed(self, pixel_values, pixel_mask, max_image_length)
184 x = x[select[:, 0], select[:, 1]].view(batch_size, -1, num_channels)
185 x_mask = x_mask[select[:, 0], select[:, 1]].view(batch_size, -1)
--> 186 patch_index = patch_index[select[:, 0], select[:, 1]].view(batch_size, -1, 2)
187 pos_embed = pos_embed[select[:, 0], select[:, 1]].view(batch_size, -1, num_channels)
189 cls_tokens = self.cls_token.expand(batch_size, -1, -1)
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
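For what it's worth, everything I pass in is already on the GPU. Right before model(**batch), both the parameters and the batch report cuda:0 for me, so the CPU tensor in the message appears to be the patch_index created inside visual_embed (the indexing at line 186 above):

# Quick sanity check just before the forward pass
print(next(model.parameters()).device)           # -> cuda:0
print({k: v.device for k, v in batch.items()})   # -> all cuda:0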