Large Scale Hierarchical Agglomerative Clustering With Custom Distance Function/Similarity Matrix


I'm working on a project where I need to run hierarchical agglomerative clustering on between 1 million and 10 million data points. Because of the nature of the data, I also need to use a custom distance function (Euclidean distance is not an option). Does anyone know of any efficient and/or distributed implementations that I might be able to use?

So far I've been using the scikit-learn implementation of hierarchical agglomerative clustering, but I run into runtime issues well before the dataset gets anywhere near the size I need.
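
For context, here is a rough back-of-the-envelope sketch of why I suspect the approach stops scaling. It assumes the custom distances are stored as a full pairwise (condensed) matrix of 8-byte floats, which is my assumption about the setup and not something I have profiled; the function name `condensed_matrix_bytes` is just illustrative.

```python
def condensed_matrix_bytes(n_points: int, bytes_per_entry: int = 8) -> int:
    """Bytes needed to hold one distance per unordered pair of points.

    Assumes a condensed pairwise matrix with n*(n-1)/2 entries and
    8-byte floats by default (both are assumptions for illustration).
    """
    n_pairs = n_points * (n_points - 1) // 2
    return n_pairs * bytes_per_entry


if __name__ == "__main__":
    # Compare a small dataset with the target sizes from the question.
    for n in (10_000, 1_000_000, 10_000_000):
        terabytes = condensed_matrix_bytes(n) / 1e12
        print(f"{n:>10,} points -> {terabytes:,.4f} TB")
```

Under those assumptions, 1 million points already needs about 4 TB for the distances alone, and 10 million needs about 400 TB, so anything that materializes all pairwise distances is likely out of reach on a single machine.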

Any advice is welcome! Thank you!
