I'm trying to load a local tokenizer using:
from transformers import RobertaTokenizerFast
tokenizer = RobertaTokenizerFast.from_pretrained(r'file path\tokenizer')
However, this gives me the following error:
OSError: Can't load tokenizer for 'file path\tokenizer'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'file path\tokenizer' is the correct path to a directory containing all relevant files for a RobertaTokenizerFast tokenizer.
The directory contains both a merges.txt and a vocab.json file for the tokenizer, so I am not sure how to resolve the issue.
Appreciate any help!
You need to point to the directory that contains those files, not to one of the files itself. So get the path to that directory and define the tokenizer as you've done above, substituting that path for r'file path\tokenizer'. RobertaTokenizerFast expects to find vocab.json, merges.txt, and tokenizer.json in that directory, so make sure you have downloaded everything it requires. Note that you may also point to these files individually by passing the arguments vocab_file, merges_file, and tokenizer_file. See the docs for further information.
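As a minimal sketch of both approaches (the paths below are placeholders, substitute wherever your files actually live):

from transformers import RobertaTokenizerFast

# Option 1: pass the directory that holds vocab.json, merges.txt,
# and (ideally) tokenizer.json.
tokenizer = RobertaTokenizerFast.from_pretrained(r"path\to\tokenizer")

# Option 2: pass each file explicitly to the constructor.
tokenizer = RobertaTokenizerFast(
    vocab_file=r"path\to\tokenizer\vocab.json",
    merges_file=r"path\to\tokenizer\merges.txt",
    tokenizer_file=r"path\to\tokenizer\tokenizer.json",  # optional when vocab_file and merges_file are given
)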