Calling `save_pretrained("OUTPUT_DIR")` on my `BertTokenizer` saves vocab.txt, special_tokens_map.json and tokenizer_config.json to my output directory. The trained model is stored there as pytorch_model.bin, and its config.json is there as well.
How is it possible to use these for inference, preferably with torchtext?
In order to perform inference, you have to load the tokenizer and model again as follows (here I assume that the model you trained was `BertForSequenceClassification`):
"./model"
(in case your directory is called "model", and you're currently in the parent directory of it). The tokenizer and model automatically infer which files they need from the directory. The tokenizer will use the vocab.txt file, the model will use the config.json file to set its hyperparameters as well as the pytorch_model.bin file to load the pretrained weights. You just have to make sure all these files are located in that directory.Do you know how to provide a new sentence to a model like BERT? I'm not familiar with TorchText, but you can perform inference as follows:
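A minimal sketch, assuming a recent version of `transformers` where the tokenizer can be called directly (the example sentence is made up):

```python
import torch

sentence = "This movie was great!"  # hypothetical example input

# Tokenize: produces input ids, token type ids and attention mask as PyTorch
# tensors, with padding and truncation applied
inputs = tokenizer(sentence, padding=True, truncation=True, return_tensors="pt")

# Forward pass without gradient tracking, since we're only doing inference
model.eval()
with torch.no_grad():
    outputs = model(**inputs)
```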
The tokenizer will transform the sentence into a format that BERT understands (i.e. input ids, token type ids and so on, as PyTorch tensors), including padding and truncation. The `outputs` variable is a Python tuple containing the raw logits.
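If you want an actual class prediction rather than raw logits, you can take the argmax over the label dimension; indexing with `outputs[0]` works whether `outputs` is a plain tuple or a model-output object:

```python
predicted_class = torch.argmax(outputs[0], dim=-1).item()
```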