JSONDecodeError and FileNotFoundError When Loading word2vec-google-news-300 Model with Gensim in Python


I am trying to load the word2vec-google-news-300 model using Gensim in Python, but I am encountering a FileNotFoundError followed by a JSONDecodeError. The errors occur when Gensim attempts to use its downloader to fetch the model. Here's the relevant part of my code:

import gensim.downloader as api

model = api.load("word2vec-google-news-300")

The first error message: FileNotFoundError: [Errno 2] No such file or directory: '/Users/myusername/gensim-data/information.json'

The second error message: json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
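One thing worth ruling out first is whether the JSON catalogue the downloader relies on can be fetched and parsed from this machine at all. The sketch below uses only the standard library. The helper name check_catalogue and its opener parameter are hypothetical, introduced here for illustration, and the URL is passed in rather than hard-coded because the exact address is an assumption to confirm against the installed Gensim version.

```python
import json
import urllib.error
import urllib.request


def check_catalogue(url, opener=urllib.request.urlopen, timeout=10):
    """Try to fetch `url` and parse the body as JSON.

    Returns (ok, detail). `opener` is injectable so the check can be
    exercised without network access; it is a hypothetical helper
    parameter, not part of Gensim.
    """
    try:
        with opener(url, timeout=timeout) as response:
            raw = response.read()
    except (urllib.error.URLError, OSError) as exc:
        # Covers DNS failures, refused connections, SSL certificate errors, etc.
        return False, f"fetch failed: {exc!r}"

    try:
        data = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        return False, f"response is not valid JSON: {exc}"

    return True, f"catalogue parsed, {len(data)} top-level entries"
```

To my understanding, Gensim exposes the catalogue address as gensim.downloader.DATA_LIST_URL, so calling check_catalogue(gensim.downloader.DATA_LIST_URL) and printing the result would show whether the failure is on the network side (for example an SSL certificate error) rather than in the gensim-data folder.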

Troubleshooting steps

  1. Confirmed the '/Users/myusername/gensim-data' directory does indeed exist and made sure it has the right read/write/exec permissions (rwxrwx-r)
  2. Created an empty 'information.json' file inside the gensim-data folder (in case Gensim would recognize it)
  3. Cleared gensim cache by deleting the gensim-data directory, and then running the script again so that Gensim would recreate the directory
  4. Reinstalled Gensim with pip uninstall gensim and then pip install gensim
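The state of the cached file after each of these steps can be checked with a small standard-library sketch. The helper name diagnose_cache is hypothetical, and the path is the one from the error message above. It also shows that an empty file (as in step 2) is not valid JSON: json.loads("") raises the same "Expecting value: line 1 column 1 (char 0)" message as the second error.

```python
import json
from pathlib import Path


def diagnose_cache(path):
    """Classify a JSON cache file as 'missing', 'empty', 'invalid JSON: ...' or 'ok'."""
    cache = Path(path).expanduser()
    if not cache.exists():
        return "missing"

    text = cache.read_text(encoding="utf-8")
    if not text.strip():
        # An empty file cannot be parsed as JSON, so creating one by hand
        # swaps the FileNotFoundError for a JSONDecodeError.
        return "empty"

    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"invalid JSON: {exc}"
    return "ok"
```

For example, print(diagnose_cache("/Users/myusername/gensim-data/information.json")) reports which of these cases applies. A result of "missing" or "empty" would suggest the catalogue was never downloaded successfully, which points back at the download step rather than at folder permissions.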

Despite all this, the problem persists :( I'm running Python 3.11 and Gensim 4.3.2 in VSCode on an M1 Mac.

Trying to finish an assignment for my AI course, but this issue's taken up so much time, so I really appreciate any help or insight on how I can fix this, thank you!!

EDIT

As an alternative, I downloaded the model manually (GoogleNews-vectors-negative300.bin.gz) and changed my code to load that file with KeyedVectors instead of gensim.downloader:

from gensim.models import KeyedVectors

model_path = '/Users/myusername/Downloads/GoogleNews-vectors-negative300.bin.gz'  
model = KeyedVectors.load_word2vec_format(model_path, binary=True)

This seems to have worked, but the assignment requires me to 'First, use gensim.downloader.load to load the word2vec-google-news-300 pretrained embedding model'. So even though this workaround works, I still need to figure out how to get the load method itself working.
