How to use Word2Vec to average each word's vectors across models trained on segmented files

47 Views Asked by Emily At 17 November 2023 at 09:04

1. I want to write a Python program that processes pre-segmented texts (there are 129 txt files in total in the jieba2 folder).

2. Create an embedding for each individual word in the texts, averaged over all 129 models.

3. Save the averaged models, built from the averaged per-word embeddings, into the word2vec2 folder. Currently, my code does not average the embeddings of individual words; it only averages each model as a whole.

The following is my program code. Where is the error?

from gensim.models import Word2Vec
import os
import numpy as np
import pickle

input_folder = "jieba2"
output_folder = "word2vec2"

os.makedirs(output_folder, exist_ok=True)

# Maps each file name to the mean of all word vectors in that file's model.
# This dict was never initialized in the original code, which raises a NameError.
model_average_vectors = {}

for txt_filename in os.listdir(input_folder):
    if not txt_filename.endswith(".txt"):
        continue
    txt_path = os.path.join(input_folder, txt_filename)
    with open(txt_path, "r", encoding="utf-8") as txt_file:
        # Each line is already segmented, so splitting on whitespace yields the tokens.
        sentences = [line.split() for line in txt_file]

    # Train one model per file.
    model = Word2Vec(vector_size=100, window=5, min_count=1, workers=4)
    model.build_vocab(sentences)
    if not model.wv.key_to_index:
        continue  # empty vocabulary, nothing to train
    model.train(sentences, total_examples=model.corpus_count, epochs=model.epochs)

    # axis=0 averages over all words, giving one 100-dimensional vector per file,
    # not one averaged vector per word.
    model_average_vectors[txt_filename] = np.mean(model.wv.vectors, axis=0)

    model_filename = txt_filename.replace(".txt", ".model")
    model.save(os.path.join(output_folder, model_filename))

average_vectors_output_path = os.path.join(output_folder, "average_vectors.pkl")
with open(average_vectors_output_path, "wb") as output_file:
    pickle.dump(model_average_vectors, output_file)

Please help me find the error in the code or fix it. Thank you.
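
For reference, the per-word averaging described in step 2 could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name average_word_vectors and the toy dictionaries are hypothetical, and each model is represented as a plain mapping from word to vector. With gensim 4.x, such a mapping could presumably be built from a trained model as {w: model.wv[w] for w in model.wv.index_to_key}. Note also that independently trained Word2Vec models do not share a coordinate system, so an average across them may not be meaningful; training a single model on all 129 files is a common alternative.

```python
from collections import defaultdict


def average_word_vectors(models_vectors):
    """Average each word's vectors over every model that contains the word.

    models_vectors: iterable of mappings {word: sequence of floats}, one per model.
    Returns a dict {word: list of floats}.
    """
    sums = {}                  # word -> running element-wise sum
    counts = defaultdict(int)  # word -> number of models containing the word
    for vectors in models_vectors:
        for word, vec in vectors.items():
            if word not in sums:
                sums[word] = [float(x) for x in vec]
            else:
                sums[word] = [s + float(x) for s, x in zip(sums[word], vec)]
            counts[word] += 1
    return {word: [s / counts[word] for s in total] for word, total in sums.items()}


# Hypothetical stand-ins for two trained models (2-dimensional for brevity).
model_a = {"weather": [1.0, 2.0], "good": [0.0, 4.0]}
model_b = {"weather": [3.0, 6.0]}

averaged = average_word_vectors([model_a, model_b])
print(averaged["weather"])  # [2.0, 4.0]
print(averaged["good"])     # [0.0, 4.0]
```

A word that appears in only some of the models is averaged over just those models, which is one possible design choice; dividing by the total number of models instead would shrink the vectors of rare words toward zero.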
