I am trying to train a BERTopic model with a seed topic list, but fitting the model raises a ValueError:
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
I am working with Python 3.10.5 and NumPy 1.24.3.
The same error occurs when I run the official tutorial example, so I suspect a breaking change in one of the dependencies.
Here is the example:
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))["data"]
seed_topic_list = [["drug", "cancer", "drugs", "doctor"],
                   ["windows", "drive", "dos", "file"],
                   ["space", "launch", "orbit", "lunar"]]
topic_model = BERTopic(seed_topic_list=seed_topic_list, verbose=True, calculate_probabilities=False)
# fit_transform returns a (topics, probabilities) tuple
topics, probs = topic_model.fit_transform(docs)
Thanks a lot for the ideas!
I have been having the exact same problem. It's my first time using this library, so I don't know whether this worked in previous versions. I'll try downgrading to the version just before this one.
Found the problem: there seems to be a compatibility issue with the latest NumPy release. I downgraded to 1.21.0 and the tutorial example works fine.
If this works for you, please select this as the solution so the thread can be closed.
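For anyone who wants to check whether their environment is affected before downgrading, here is a minimal sketch. It assumes the cutoff is NumPy 1.24, which as far as I know is the release where building an array from lists of unequal length became a hard ValueError instead of a deprecation warning. The helper names ragged_arrays_raise and installed_numpy_version are my own and are not part of BERTopic or NumPy.

```python
from importlib import metadata


def ragged_arrays_raise(numpy_version: str) -> bool:
    """Return True if this NumPy version is 1.24 or newer.

    Assumption: 1.24 is where ragged nested lists started raising
    ValueError rather than only emitting a deprecation warning.
    """
    major, minor = (int(part) for part in numpy_version.split(".")[:2])
    return (major, minor) >= (1, 24)


def installed_numpy_version():
    """Return the installed NumPy version string, or None if NumPy is absent."""
    try:
        return metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return None


version = installed_numpy_version()
if version is None:
    print("NumPy is not installed in this environment")
elif ragged_arrays_raise(version):
    # Pinning below 1.24 is one possible workaround, e.g. pip install "numpy<1.24"
    print(f"NumPy {version}: ragged lists raise ValueError, a pin below 1.24 may help")
else:
    print(f"NumPy {version}: ragged lists should only trigger a deprecation warning")
```

With the versions mentioned in this thread, ragged_arrays_raise("1.24.3") is True and ragged_arrays_raise("1.21.0") is False. Pinning NumPy is only a stopgap; a later BERTopic release may handle this differently, so it is worth checking for an update first.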