Fairseq without dictionary

Asked by rgy k on 19 February 2023 at 02:31 (272 views)

I used a Hugging Face tokenizer and encoder to preprocess the data, and now I want to use Fairseq's transformer model for a translation task, but I don't have a dict.txt. What should I do?

Can I just give it the input and output data to fit, or how do I create a dict.txt?

Answer by Wakeme UpNow on 19 February 2023 at 03:41

The dict.txt file is bundled with the pre-trained model. For transformer models, see Pre-trained models in the Fairseq documentation.

Downloading and extracting transformer_lm.wmt19.en gives the following file structure:

wmt19.en
|- bpecodes
|- dict.txt
|- model.pt
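For a model trained from scratch there is no bundled dict.txt, but the format is simple: one `<token> <count>` pair per line. The sketch below builds such a file from text that is already split into space-separated subword tokens (for example, the Hugging Face tokenizer output from the question, joined with spaces). It is a minimal sketch under stated assumptions: the helper names are hypothetical, and fairseq's `Dictionary` is assumed to add `<s>`, `<pad>`, `</s>` and `<unk>` on its own, so those tokens are skipped.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Assumption: fairseq's Dictionary adds these special symbols itself,
# so they are left out of dict.txt to avoid duplicate entries.
FAIRSEQ_SPECIALS = {"<s>", "<pad>", "</s>", "<unk>"}


def build_dict_entries(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Count whitespace-separated tokens across all lines (hypothetical helper)."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(tok for tok in line.split() if tok not in FAIRSEQ_SPECIALS)
    # Most frequent first; ties broken alphabetically so the file is stable.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def format_dict(entries: List[Tuple[str, int]]) -> str:
    """Render entries as '<token> <count>' lines, the dict.txt layout."""
    return "".join(f"{token} {count}\n" for token, count in entries)


def write_dict(path: str, lines: Iterable[str]) -> None:
    """Build and write a dict.txt from pre-tokenized text (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(format_dict(build_dict_entries(lines)))


if __name__ == "__main__":
    # Toy input: subword tokens already joined with spaces.
    corpus = ["▁hello ▁world </s>", "▁hello ▁there"]
    print(format_dict(build_dict_entries(corpus)), end="")
```

On the toy corpus this prints `▁hello 2`, `▁there 1` and `▁world 1` on separate lines. A file written this way could then be passed to `fairseq-preprocess` via `--srcdict` / `--tgtdict`; if no dictionary is given, `fairseq-preprocess` builds `dict.<lang>.txt` from the training data itself.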

Also, according to the docs, the model uses Byte Pair Encoding (BPE), so if you want to train a new model, you may need to pre-process the text with BPE first.
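To show what that BPE step does, here is a toy sketch of how BPE learns merge rules: start from characters plus an end-of-word marker, then repeatedly merge the most frequent adjacent pair. It is a simplified illustration of the general technique, not the tool that produced the bundled `bpecodes` file. The function names, the `</w>` marker and the alphabetical tie-break are assumptions made for this example. Note that most Hugging Face tokenizers already perform a subword split like this, so text tokenized with one may only need the dictionary step.

```python
from collections import Counter
from typing import Dict, List, Tuple

Symbols = Tuple[str, ...]


def get_pair_counts(vocab: Dict[Symbols, int]) -> Counter:
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs: Counter = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair: Tuple[str, str], vocab: Dict[Symbols, int]) -> Dict[Symbols, int]:
    """Replace every occurrence of `pair` with its concatenation."""
    merged: Dict[Symbols, int] = {}
    for symbols, freq in vocab.items():
        out: List[str] = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(words: List[str], num_merges: int) -> List[Tuple[str, str]]:
    """Learn up to `num_merges` merge rules from a list of words (toy version)."""
    # Each word starts as its characters plus an end-of-word marker.
    vocab: Dict[Symbols, int] = dict(Counter(tuple(w) + ("</w>",) for w in words))
    merges: List[Tuple[str, str]] = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties go to the alphabetically smallest pair.
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges


if __name__ == "__main__":
    print(learn_bpe(["low", "low", "low", "lower"], 3))
```

On this toy input the learned merges are `('l', 'o')`, then `('lo', 'w')`, then `('low', '</w>')`: frequent character sequences become single subword units, and the resulting units are what a dict.txt would list.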