Is there a way for inference with torchtext using from_pretrained() transformers method?

BertTokenizer.save_pretrained("OUTPUT_DIR")

saves vocab.txt, special_tokens_map.json and tokenizer_config.json to my output directory. The trained model is stored there as pytorch_model.bin, and its config.json is also present.

How is it possible to use these for inference, preferably with torchtext?

In order to perform inference, you have to load the tokenizer and model again as follows (here I assume that the model you trained was BertForSequenceClassification):

from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("path_to_directory")
model = BertForSequenceClassification.from_pretrained("path_to_directory")

With "path_to_directory" being a string, for example "./model" (in case your directory is called "model", and you're currently in the parent directory of it). The tokenizer and model automatically infer which files they need from the directory. The tokenizer will use the vocab.txt file, the model will use the config.json file to set its hyperparameters as well as the pytorch_model.bin file to load the pretrained weights. You just have to make sure all these files are located in that directory.

As for how to provide a new sentence to a model like BERT: I'm not familiar with TorchText, but you can perform inference directly with the tokenizer and model as follows:

sentence = "This is a new sentence"
inputs = tokenizer(sentence, padding='max_length', truncation=True, return_tensors="pt")
outputs = model(**inputs)

The tokenizer transforms the sentence into a format that BERT understands (i.e. input ids, token type ids and an attention mask as PyTorch tensors), including padding and truncation. The outputs variable contains the raw logits; depending on your transformers version it is either a plain tuple (logits at index 0) or a model output object with a .logits attribute.
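
If you want an actual class prediction rather than raw logits, a minimal sketch (assuming a classification head; the mapping from index to label depends on your own training setup):

import torch

with torch.no_grad():  # no gradients needed for inference
    outputs = model(**inputs)

logits = outputs[0]  # also accessible as outputs.logits on recent transformers versions
probabilities = torch.softmax(logits, dim=-1)
predicted_class_id = int(probabilities.argmax(dim=-1))
print(predicted_class_id, probabilities)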