Loading local tokenizer

I'm trying to load a local tokenizer using:

from transformers import RobertaTokenizerFast

tokenizer = RobertaTokenizerFast.from_pretrained(r'file path\tokenizer')

However, this gives me the following error:

OSError: Can't load tokenizer for 'file path\tokenizer'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'file path\tokenizer' is the correct path to a directory containing all relevant files for a RobertaTokenizerFast tokenizer.

The directory contains both a merges.txt and a vocab.json file for the tokenizer, so I am not sure how to resolve the issue.

Appreciate any help!

1 Answer

Answered by doine:

You need to point from_pretrained at the directory that contains those files, not at any individual file. Get the path to that directory and define the tokenizer as you did above, replacing r'file path\tokenizer' with that directory path:

tokenizer = RobertaTokenizerFast.from_pretrained('path_to_directory')
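If the error persists, it's worth confirming what the directory actually contains. Here is a minimal sanity check, where path_to_directory is a placeholder for your real path:

import os

tokenizer_dir = 'path_to_directory'  # placeholder; use your actual path

# Files a RobertaTokenizerFast can load from a local directory
expected = {'vocab.json', 'merges.txt', 'tokenizer.json'}
present = set(os.listdir(tokenizer_dir))
print('Missing:', sorted(expected - present) or 'none')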

RobertaTokenizerFast expects to find vocab.json, merges.txt, and tokenizer.json in that directory, so make sure you have downloaded everything it requires. Note that you can also point to these files individually by passing the vocab_file, merges_file, and tokenizer_file arguments. See the docs for further information.
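For example, if your directory only has vocab.json and merges.txt (as in the question), the following sketch builds the fast tokenizer directly from those two files and then calls save_pretrained, which also writes tokenizer.json so that from_pretrained works on the directory afterwards. The paths reuse the question's placeholder and should be adjusted to your setup:

from transformers import RobertaTokenizerFast

# Paths are placeholders taken from the question; adjust as needed.
tokenizer = RobertaTokenizerFast(
    vocab_file=r'file path\tokenizer\vocab.json',
    merges_file=r'file path\tokenizer\merges.txt',
)

# save_pretrained writes tokenizer.json (plus tokenizer_config.json and
# special_tokens_map.json), so from_pretrained can load the directory later.
tokenizer.save_pretrained(r'file path\tokenizer')

tokenizer = RobertaTokenizerFast.from_pretrained(r'file path\tokenizer')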