I preprocessed my data with a Hugging Face tokenizer and encoder, and now I want to use Fairseq's transformer model for the translation task, but I don't have a dict.txt. What should I do?
Can I train with only the input and output data, or how do I make a dict.txt?
The `dict.txt` file is included with the pre-trained model. For transformer models, see Pre-trained models. Downloading and extracting `transformer_lm.wmt19.en` gives a file structure that includes `dict.txt`.

Also from the docs, the model uses Byte Pair Encoding (BPE). So if you want to train a new model, you might need to pre-process the text first.
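If you train a new model on your own tokenized data, the dictionary is normally produced by `fairseq-preprocess` from the training text. To see what ends up in the file, here is a minimal sketch that builds one by hand. It assumes the plain-text format of one `token count` pair per line, that your Hugging Face tokens have already been written out as whitespace-separated strings, and that the special symbols (`<s>`, `<pad>`, `</s>`, `<unk>`) are added by fairseq on load rather than listed in the file. `build_dict` and `write_dict` are hypothetical helper names, not part of fairseq.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def build_dict(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Count whitespace-separated tokens and return them most frequent first."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    # Sort by descending count, then by token so the order is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def write_dict(entries: List[Tuple[str, int]], path: str) -> None:
    """Write one "token count" pair per line (assumed dict.txt layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for token, count in entries:
            f.write(f"{token} {count}\n")


# Example usage (file names are placeholders):
# with open("train.tok.en", encoding="utf-8") as f:
#     write_dict(build_dict(f), "dict.txt")
```

The dictionary must be built from the same tokenization you feed to the model, so if the Hugging Face tokenizer emits subword pieces, count those pieces and not the raw words.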