Fairseq without dictionary


I tokenized and encoded my data with a Hugging Face tokenizer, and now I want to use Fairseq's transformer model for a translation task, but I don't have a dict.txt. What should I do?

Can I train by providing only the input and output data, or, if not, how do I create dict.txt?


There is 1 best solution below

Wakeme UpNow On

The dict.txt file is included with the pre-trained model. For transformer models, see the "Pre-trained models" list in the docs.

Downloading and extracting transformer_lm.wmt19.en gives the following file structure:

wmt19.en
|- bpecodes
|- dict.txt
|- model.pt

Also, according to the docs, the model uses Byte Pair Encoding (BPE). So if you want to train a new model, you might need to pre-process the text first.
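
If you are training from scratch rather than reusing a pre-trained model, fairseq-preprocess normally builds the dictionary itself from tokenized training text. Should you want to write one by hand instead, here is a minimal sketch, assuming your Hugging Face tokenizer output has already been saved as space-separated subword tokens, one sentence per line. A Fairseq dictionary file is plain text with one `token count` pair per line; the special symbols (`<s>`, `<pad>`, `</s>`, `<unk>`) are added by Fairseq when it loads the file, so they are left out. The helper name `build_fairseq_dict` and the `min_count` parameter are made up for this example and are not part of Fairseq.

```python
from collections import Counter
from pathlib import Path


def build_fairseq_dict(tokenized_lines, out_path, min_count=1):
    """Write a fairseq-style dictionary: one '<token> <count>' pair per line.

    `tokenized_lines` is an iterable of strings whose tokens are already
    separated by whitespace (for example, the subword pieces produced by
    your tokenizer). This helper is hypothetical, not a Fairseq API.
    """
    counts = Counter()
    for line in tokenized_lines:
        counts.update(line.split())

    # Most frequent tokens first; ties broken alphabetically for a stable order.
    entries = sorted(
        ((tok, n) for tok, n in counts.items() if n >= min_count),
        key=lambda item: (-item[1], item[0]),
    )

    # Special symbols are not written; Fairseq adds them when loading the file.
    text = "".join(f"{tok} {n}\n" for tok, n in entries)
    Path(out_path).write_text(text, encoding="utf-8")
    return entries


if __name__ == "__main__":
    sample = ["he@@ llo world", "he@@ llo"]
    for tok, n in build_fairseq_dict(sample, "dict.txt"):
        print(tok, n)
```

For translation you would typically need one such file per language (or a single shared one), and the text must be tokenized in exactly the same way at training and inference time, otherwise the dictionary entries will not match the input.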