Large Scale Hierarchical Agglomerative Clustering With Custom Distance Function/Similarity Matrix

52 Views Asked by Michelle_J At 26 July 2023 at 23:33

I'm working on a project where I need to run hierarchical agglomerative clustering on between 1 million and 10 million data points. Because of the nature of the data, I cannot work in Euclidean space and need to use a custom distance function instead. Does anyone know of any efficient and/or distributed implementations that I might be able to use?

So far I've been using the scikit-learn implementation of hierarchical agglomerative clustering, but I run into runtime issues long before the dataset gets anywhere near the size I need.
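For a sense of scale, here is a rough back-of-the-envelope sketch. It assumes the custom distance is supplied as a precomputed pairwise matrix stored as 8-byte floats, keeping each unordered pair once. The helper name `condensed_matrix_bytes` is hypothetical and only there for illustration.

```python
def condensed_matrix_bytes(n_points: int, bytes_per_entry: int = 8) -> int:
    """Bytes needed to store one distance per unordered pair of points.

    Assumption: 8-byte floats and a condensed (upper-triangle) layout;
    a full square matrix would need roughly twice as much.
    """
    n_pairs = n_points * (n_points - 1) // 2
    return n_pairs * bytes_per_entry


if __name__ == "__main__":
    for n in (10_000, 1_000_000, 10_000_000):
        gigabytes = condensed_matrix_bytes(n) / 1e9
        print(f"{n:>12,} points -> {gigabytes:,.2f} GB")
```

Under those assumptions, 1 million points already need about 4 TB just for the distances and 10 million need about 400 TB, so any approach that materializes every pairwise distance will not fit on a single machine at this size.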

Any advice is welcome! Thank you!
