Loading local tokenizer

I'm trying to load a local tokenizer using:

from transformers import RobertaTokenizerFast

tokenizer = RobertaTokenizerFast.from_pretrained(r'file path\tokenizer')

However, this gives me the following error:

OSError: Can't load tokenizer for 'file path\tokenizer'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'file path\tokenizer' is the correct path to a directory containing all relevant files for a RobertaTokenizerFast tokenizer.

The directory contains both a merges.txt and a vocab.json file for the tokenizer, so I am not sure how to resolve the issue.

Appreciate any help!

1 Answer

Answered by doine:

You need to point from_pretrained at the directory that contains those files, not at any individual file. Get the path to that directory and define the tokenizer as you did above, replacing r'file path\tokenizer' with that directory path:

tokenizer = RobertaTokenizerFast.from_pretrained('path_to_directory')
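If the error persists, it's worth confirming what the directory actually contains. Here is a minimal sanity check, where path_to_directory is a placeholder for your real path:

import os

tokenizer_dir = 'path_to_directory'  # placeholder; use your actual path

# Files a RobertaTokenizerFast can load from a local directory
expected = {'vocab.json', 'merges.txt', 'tokenizer.json'}
present = set(os.listdir(tokenizer_dir))
print('Missing:', sorted(expected - present) or 'none')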

RobertaTokenizerFast expects to find vocab.json, merges.txt, and tokenizer.json in that directory, so make sure you have downloaded everything it requires. Note that you can also point to these files individually by passing the vocab_file, merges_file, and tokenizer_file arguments. See the docs for further information.
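For example, if your directory only has vocab.json and merges.txt (as in the question), the following sketch builds the fast tokenizer directly from those two files and then calls save_pretrained, which also writes tokenizer.json so that from_pretrained works on the directory afterwards. The paths reuse the question's placeholder and should be adjusted to your setup:

from transformers import RobertaTokenizerFast

# Paths are placeholders taken from the question; adjust as needed.
tokenizer = RobertaTokenizerFast(
    vocab_file=r'file path\tokenizer\vocab.json',
    merges_file=r'file path\tokenizer\merges.txt',
)

# save_pretrained writes tokenizer.json (plus tokenizer_config.json and
# special_tokens_map.json), so from_pretrained can load the directory later.
tokenizer.save_pretrained(r'file path\tokenizer')

tokenizer = RobertaTokenizerFast.from_pretrained(r'file path\tokenizer')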