Cluster user ratings with custom distance function using pyclustering

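import numpy as np
from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
from pyclustering.cluster.kmeans import kmeans
from pyclustering.utils.metric import distance_metric, type_metric

# Collect each user's ratings into one list per user (one row per user).
userMovieRatingsDF = df.groupby("user", sort=False).apply(lambda x: list(x["rating"])).reset_index(name="rating")

# Convert the column of lists to a NumPy array.
numarr = userMovieRatingsDF["rating"].to_numpy()

# Manhattan (L1) distance between two rating vectors.
def custom_distance(point1, point2):
    return np.sum(np.abs(point1 - point2))

metric = distance_metric(type_metric.USER_DEFINED, func=custom_distance)

# Choose 2 initial centers with k-means++ (this is the call that raises the error).
initial_centers = kmeans_plusplus_initializer(numarr, 2).initialize()

kmeans_instance = kmeans(numarr, initial_centers, metric=metric)

kmeans_instance.process()
clusters = kmeans_instance.get_clusters()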

The DataFrame's rating column holds lists of np.int64 values, and every list has the same length. This is the error I get when the line initial_centers = kmeans_plusplus_initializer(numarr, 2).initialize() runs:

ValueError: operands could not be broadcast together with shapes (397,) (22362,)
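
One plausible reading of this error (an assumption, not verified against the pyclustering internals): calling to_numpy() on a column whose cells are lists gives a 1-D array of dtype object, with one list per element, rather than a 2-D numeric array. Subtracting a single point from that array then tries to broadcast the number of users against the number of ratings per user, which matches the two shapes in the message. The sketch below reproduces that behaviour with NumPy alone. The names rows and points are illustrative and the data is made up.

```python
import numpy as np

# Hypothetical stand-in for userMovieRatingsDF["rating"].to_numpy():
# a 1-D object array whose elements are equal-length Python lists.
rows = [[5, 3, 4, 1], [2, 2, 5, 4], [1, 4, 3, 3]]
numarr = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    numarr[i] = row

print(numarr.shape)  # one dimension only: (number of users,)

# Subtracting one "point" from the whole array fails, because NumPy
# tries to broadcast (number of users,) against (ratings per user,).
try:
    numarr - numarr[0]
    broadcast_error = None
except ValueError as exc:
    broadcast_error = str(exc)
print(broadcast_error)

# Possible fix: rebuild the data as a regular 2-D numeric array.
points = np.array(numarr.tolist())
print(points.shape)  # (number of users, ratings per user)

# The custom distance from the question works on rows of the 2-D array.
def custom_distance(point1, point2):
    return np.sum(np.abs(point1 - point2))

d = custom_distance(points[0], points[1])
print(d)
```

If this is indeed the cause, the equivalent change in the code above would be numarr = np.array(userMovieRatingsDF["rating"].tolist()), which should give an array of shape (number of users, ratings per user) that can be passed to both kmeans_plusplus_initializer and kmeans.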