I am trying to load the word2vec-google-news-300 model using Gensim in Python, but I get a FileNotFoundError followed by a JSONDecodeError. Both errors occur when Gensim's downloader tries to fetch the model. Here's the relevant part of my code:
import gensim.downloader as api
model = api.load("word2vec-google-news-300")
The first error message:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/myusername/gensim-data/information.json'
The second error message:
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
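For reference, both messages can be reproduced with the standard library alone, which helps tell the two failure modes apart. The sketch below uses a hypothetical helper (diagnose_info_cache is my own name, not part of Gensim) to check the cache file named in the first traceback. A missing file raises FileNotFoundError when opened. An empty file makes Python's json module fail with exactly "Expecting value: line 1 column 1 (char 0)", which is consistent with the empty information.json created during troubleshooting.

```python
import json
from pathlib import Path


def diagnose_info_cache(path):
    """Classify a JSON cache file as "missing", "empty", "invalid JSON" or "ok".

    Hypothetical helper for diagnosis only; it is not part of gensim.
    """
    path = Path(path)
    if not path.exists():
        # Opening it would raise FileNotFoundError, as in the first traceback.
        return "missing"
    text = path.read_text(encoding="utf-8")
    if not text.strip():
        # json.loads on an empty (or whitespace-only) string raises
        # JSONDecodeError "Expecting value: ...", as in the second traceback.
        return "empty"
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return "invalid JSON"
    return "ok"


if __name__ == "__main__":
    # Same path as in the FileNotFoundError message above.
    cache = Path.home() / "gensim-data" / "information.json"
    print(cache, "->", diagnose_info_cache(cache))
```

A result of "empty" would mean the file exists but Gensim never wrote the catalogue into it.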
Troubleshooting steps
- Confirmed that the '/Users/myusername/gensim-data' directory exists and has read/write/execute permissions (rwxrwx-r)
- Created an empty 'information.json' file inside the gensim-data folder, in case Gensim only needed the file to exist
- Cleared the Gensim cache by deleting the gensim-data directory and rerunning the script so that Gensim would recreate it
- Reinstalled Gensim with pip uninstall gensim followed by pip install gensim
Despite all this, the problem persists :( I'm running Python 3.11 in VSCode with Gensim 4.3.2 on an M1 Mac.
I'm trying to finish an assignment for my AI course and this issue has already taken up a lot of time, so I'd really appreciate any help or insight on how to fix it. Thank you!
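None of the troubleshooting steps above test whether Gensim itself can reach its download server. As I understand Gensim 4.x's downloader (not verified line by line against 4.3.2), api.load first downloads a catalogue file, list.json, from GitHub and saves it as information.json inside gensim-data; if that request fails, nothing is written and the later read of information.json fails. Below is a minimal stdlib sketch for checking whether that request succeeds from the same Python interpreter VSCode uses. try_fetch is a hypothetical helper, and the URL is my assumption of the value of gensim.downloader.DATA_LIST_URL, so compare it with the value in your own install.

```python
import urllib.request

# Assumption: this matches gensim.downloader.DATA_LIST_URL in gensim 4.x.
# Print that attribute in your own environment and substitute it if it differs.
CATALOGUE_URL = "https://raw.githubusercontent.com/RaRe-Technologies/gensim-data/master/list.json"


def try_fetch(url, timeout=10.0):
    """Return (True, number_of_bytes) on success, or (False, error_message).

    Hypothetical diagnostic helper, not part of gensim.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, len(resp.read())
    except OSError as exc:
        # URLError is a subclass of OSError; a certificate problem typically
        # surfaces as a URLError whose message mentions CERTIFICATE_VERIFY_FAILED.
        return False, repr(exc)
```

Running print(try_fetch(CATALOGUE_URL)) shows whether the fetch works. If the error mentions certificate verification, one possibility on a python.org build of Python for macOS is that its CA certificates were never installed; those installers ship an "Install Certificates.command" script in the Python application folder for this purpose.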
EDIT
As an alternative, I downloaded the model manually (GoogleNews-vectors-negative300.bin.gz) and changed my code to load that local copy with KeyedVectors instead of gensim.downloader:
from gensim.models import KeyedVectors
model_path = '/Users/myusername/Downloads/GoogleNews-vectors-negative300.bin.gz'
model = KeyedVectors.load_word2vec_format(model_path, binary=True)
This seems to work, but the assignment says to 'First, use gensim.downloader.load to load the word2vec-google-news-300 pretrained embedding model', so despite the workaround I still need to get gensim.downloader.load itself working.
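In the workaround, binary=True tells KeyedVectors.load_word2vec_format to expect the original C word2vec binary layout: an ASCII header line "<vocab_size> <vector_size>" followed by each word and its vector as raw 32-bit floats. As a quick sanity check before the full multi-gigabyte load, the header can be read with the standard library; read_word2vec_header is a hypothetical helper of my own, not a Gensim function. For the GoogleNews file it should report 3,000,000 words and 300 dimensions.

```python
import gzip


def read_word2vec_header(path):
    """Return (vocab_size, vector_size) from a word2vec binary file (.bin or .bin.gz).

    Hypothetical helper; it only reads the first line and does not load vectors.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as f:
        header = f.readline()  # ASCII line such as b"<vocab_size> <vector_size>\n"
    vocab_size, vector_size = (int(x) for x in header.decode("ascii").split())
    return vocab_size, vector_size
```

Because gzip decompresses lazily, this reads only the first few bytes of the compressed file.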