I am a beginner in machine learning and am trying document embedding for a university project. I work with both Google Colab and Jupyter Notebook (via Anaconda). My code runs perfectly in Google Colab, but if I execute the same code in Jupyter Notebook (via Anaconda), I run into an error with the ConcatenatedDoc2Vec object.
With this function I build the feature vectors for a classifier (e.g. logistic regression):
import numpy as np

def build_vectors(model, length, vector_size):
    # One row per document; row i holds the vector tagged 'tag_<i>'
    vector = np.zeros((length, vector_size))
    for i in range(0, length):
        prefix = 'tag' + '_' + str(i)
        vector[i] = model.docvecs[prefix]
    return vector
I concatenate two Doc2Vec models (d2v_dm and d2v_dbow). Both work perfectly throughout the code and cause no problems with the function build_vectors():
d2v_combined = ConcatenatedDoc2Vec([d2v_dm, d2v_dbow])
But if I run the function build_vectors() with the concatenated model:
# Compute combined vector size
d2v_combined_vector_size = d2v_dm.vector_size + d2v_dbow.vector_size
d2v_combined_vec = build_vectors(d2v_combined, len(X_tagged), d2v_combined_vector_size)
I receive this error, but only when I run it in Jupyter Notebook (via Anaconda); the same code in the Google Colab notebook has no problem:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [20], in <cell line: 4>()
      1 #Compute combined Vector size
      2 d2v_combined_vector_size = d2v_dm.vector_size + d2v_dbow.vector_size
----> 4 d2v_combined_vec= build_vectors(d2v_combined, len(X_tagged), d2v_combined_vector_size)

Input In [11], in build_vectors(model, length, vector_size)
      3 for i in range(0, length):
      4     prefix = 'tag' + '_' + str(i)
----> 5     vector[i] = model.docvecs[prefix]
      6 return vector

AttributeError: 'ConcatenatedDoc2Vec' object has no attribute 'docvecs'
This is mysterious to me: the code works in Google Colab but not in Jupyter Notebook under Anaconda, and I have not found anything on the web that solves the problem.
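Since build_vectors() already works on each model individually, one hedged way to get the same combined features without ConcatenatedDoc2Vec is to build each model's matrix separately and join them column-wise with np.hstack; the result then has d2v_dm.vector_size + d2v_dbow.vector_size columns. The sketch below assumes each model's document vectors can be looked up by tag like a mapping; `build_matrix` is a hypothetical helper, and plain dicts stand in for the trained models so the example runs without Gensim.

```python
import numpy as np


def build_matrix(doc_vectors, length, vector_size):
    """Stack the vectors for tags 'tag_0' .. 'tag_{length-1}' into a 2-D array.

    `doc_vectors` is anything indexable by tag string (assumption: e.g. a
    model's document-vector store, or a plain dict as in the demo below).
    """
    matrix = np.zeros((length, vector_size))
    for i in range(length):
        matrix[i] = doc_vectors['tag_' + str(i)]
    return matrix


# Hypothetical stand-ins for the two models: 3 documents,
# a 4-dimensional "DM" model and a 2-dimensional "DBOW" model.
dm_vectors = {'tag_%d' % i: np.full(4, float(i)) for i in range(3)}
dbow_vectors = {'tag_%d' % i: np.full(2, -float(i)) for i in range(3)}

dm_matrix = build_matrix(dm_vectors, 3, 4)
dbow_matrix = build_matrix(dbow_vectors, 3, 2)

# Column-wise concatenation: row i = [DM vector of doc i, DBOW vector of doc i]
combined = np.hstack([dm_matrix, dbow_matrix])
print(combined.shape)  # (3, 6)
```

With the real models, the analogous call would be np.hstack([build_vectors(d2v_dm, n, d2v_dm.vector_size), build_vectors(d2v_dbow, n, d2v_dbow.vector_size)]) with n = len(X_tagged), which only relies on the single-model lookups that already work.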
If it works in one place but not the other, you're probably using different versions of the relevant libraries: in this case, gensim. Does gensim.__version__ report exactly the same version in both places?
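To compare the two environments, the same cell can be run in Colab and in the Anaconda notebook and the outputs compared. This sketch uses the standard library's importlib.metadata (Python 3.8+) instead of importing each package, so it still works when a package is missing; the package list is an assumption and can be extended.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def library_versions(packages=("gensim", "numpy", "scipy")):
    """Return {package: installed version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this kernel's environment
    return found


print("Python:", sys.version.split()[0])
for name, ver in library_versions().items():
    print(f"{name}: {ver}")
```

Equivalently, import gensim followed by print(gensim.__version__) reports the version of the copy the kernel actually imports.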
If not, the most immediate workaround is to make the environment where it fails match the one where it works, by force-installing the same explicit version:
pip install gensim==VERSION
(where VERSION is the target version), then restarting your notebook's kernel so it sees the change. Beware, though: unless you start from a fresh environment, this could introduce other library-version mismatches!
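A common pitfall when pinning a version from inside a notebook is that a bare pip command can install into a different Python environment than the one the kernel runs. A hedged sketch: build the command from sys.executable so pip targets the kernel's own interpreter. `pin_gensim_command` is a hypothetical helper, and "X.Y.Z" is only a placeholder for the version the working environment reports.

```python
import subprocess
import sys


def pin_gensim_command(target_version):
    """Build a pip command that installs an exact gensim version into the
    interpreter running this notebook kernel (sys.executable)."""
    return [sys.executable, "-m", "pip", "install", f"gensim=={target_version}"]


cmd = pin_gensim_command("X.Y.Z")  # placeholder: use the version from the working environment
print(" ".join(cmd))
# To actually install, uncomment the next line, then restart the kernel:
# subprocess.check_call(cmd)
```

Recent IPython versions also provide a %pip magic (e.g. %pip install gensim==VERSION) that installs into the kernel's environment for the same reason.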
Other things to note:
- In Gensim 4.x, docvecs is now called just dv, so some older code erroring this way may only need docvecs replaced with dv to work. (Other tips for migrating older code to the latest Gensim conventions are available at: https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4 )
- It's not clear where you're getting the ConcatenatedDoc2Vec class from. A class of that name exists in some Gensim demo/test code, as a very minimal shim class that was at one time used in attempts to reproduce the results of the original "Paragraph Vector" (aka Doc2Vec) paper. But beware: that's not a usual way to use Doc2Vec, and the class of that name I know barely does anything outside its original narrow purpose.
- Most uses of Doc2Vec just pick one mode.
- … Doc2Vec code from that one experiment.
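If the code has to run under both Gensim 3.x (docvecs) and 4.x (dv), one hedged option is to look the attribute up at runtime instead of hard-coding either name. A minimal sketch follows; `doc_vectors_of` is a hypothetical helper, and the two stub classes only imitate the attribute names so the example runs without Gensim.

```python
import numpy as np


def doc_vectors_of(model):
    """Return the tag-indexable document-vector store of a Doc2Vec-like object.

    Prefers the Gensim 4.x name `dv`, falling back to the 3.x name `docvecs`.
    """
    if hasattr(model, "dv"):
        return model.dv
    if hasattr(model, "docvecs"):
        return model.docvecs
    raise AttributeError(f"{type(model).__name__} has neither 'dv' nor 'docvecs'")


def build_vectors(model, length, vector_size):
    """Same as the question's function, but tolerant of either attribute name."""
    store = doc_vectors_of(model)
    vector = np.zeros((length, vector_size))
    for i in range(length):
        vector[i] = store['tag_' + str(i)]
    return vector


# Stand-ins imitating only the attribute names (not real Gensim classes).
class NewStyleModel:
    dv = {'tag_0': [1.0, 2.0], 'tag_1': [3.0, 4.0]}


class OldStyleModel:
    docvecs = {'tag_0': [5.0, 6.0], 'tag_1': [7.0, 8.0]}


print(build_vectors(NewStyleModel(), 2, 2))
print(build_vectors(OldStyleModel(), 2, 2))
```

This only helps if the concatenated object exposes one of those two names; the traceback shows it lacks docvecs, so hasattr(d2v_combined, 'dv') is worth checking in the failing environment.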