How can I get the similarity matrix from minhash LSH?

I have read many tutorials and tried several MinHash LSH implementations, but none of them produces a similarity matrix; they only return the items whose similarity exceeds the threshold. How can I generate the full matrix? I intend to use the LSH results for clustering.

Answer by Has QUIT--Anony-Mousse (best answer)

The whole point of LSH is to avoid computing all pairwise distances, because that does not scale: n items have n(n-1)/2 pairs.

If you then put the data into a distance matrix, you get all the scalability problems again!
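To put rough numbers on that, here is a small back-of-the-envelope sketch. It assumes a dense matrix of 8-byte floats; the function names are made up for illustration.

```python
def pair_count(n):
    """Number of distinct unordered pairs among n items."""
    return n * (n - 1) // 2

def dense_matrix_bytes(n, bytes_per_entry=8):
    """Memory for a full n x n matrix (assumes 8-byte float entries)."""
    return n * n * bytes_per_entry

for n in (1_000, 100_000, 1_000_000):
    gigabytes = dense_matrix_bytes(n) / 1e9
    print(f"n={n:>9,}  pairs={pair_count(n):>15,}  dense matrix={gigabytes:,.3f} GB")
```

At a million items the dense matrix alone would need about 8 TB under these assumptions, before any clustering work starts.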

Instead, consider an algorithm like DBSCAN. It doesn't need a distance matrix, only each point's neighbors within distance epsilon, which is essentially what an LSH threshold query returns.
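A minimal sketch of that idea, using only the Python standard library: a toy MinHash with banded LSH serves as the neighbor query for a plain DBSCAN loop, so no n x n matrix is ever built. Everything here is an assumption made for illustration (the signature length, the band layout, `eps`, `min_pts`, the helper names, and the sample documents); a real project would more likely use an existing MinHash LSH library and tune these values.

```python
import hashlib
from collections import defaultdict, deque

NUM_PERM, BANDS = 128, 32          # assumed sizes: 32 bands of 4 rows each
ROWS = NUM_PERM // BANDS

def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def minhash(tokens):
    """MinHash signature: the minimum hash of the set under each seed."""
    return tuple(min(_hash(s, t) for t in tokens) for s in range(NUM_PERM))

def band_keys(sig):
    """One bucket key per band; equal keys mean that band matched exactly."""
    return [(b, sig[b * ROWS:(b + 1) * ROWS]) for b in range(BANDS)]

def build_index(signatures):
    """Map every band key to the list of items that produced it."""
    buckets = defaultdict(list)
    for i, sig in enumerate(signatures):
        for key in band_keys(sig):
            buckets[key].append(i)
    return buckets

def region_query(i, signatures, buckets, eps):
    """Neighbors of i: LSH candidates with estimated Jaccard distance <= eps."""
    sig = signatures[i]
    candidates = {j for key in band_keys(sig) for j in buckets.get(key, ())}
    return [j for j in sorted(candidates)
            if 1.0 - sum(a == b for a, b in zip(sig, signatures[j])) / NUM_PERM <= eps]

def dbscan(n, query, min_pts):
    """Plain DBSCAN driven by a neighbor query; label -1 marks noise."""
    labels = [None] * n
    cluster = 0
    for p in range(n):
        if labels[p] is not None:
            continue
        nbrs = query(p)
        if len(nbrs) < min_pts:
            labels[p] = -1
            continue
        labels[p] = cluster
        queue = deque(nbrs)
        while queue:
            q = queue.popleft()
            if labels[q] == -1:
                labels[q] = cluster        # former noise becomes a border point
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_nbrs = query(q)
            if len(q_nbrs) >= min_pts:     # q is a core point: keep expanding
                queue.extend(q_nbrs)
        cluster += 1
    return labels

docs = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox jumps over the lazy cat",
    "the quick brown fox leaps over the lazy dog",
    "minhash lsh finds similar documents without comparing all pairs",
    "minhash lsh finds similar items without comparing all pairs",
    "completely unrelated sentence about cooking pasta tonight",
]
signatures = [minhash(set(d.split())) for d in docs]
buckets = build_index(signatures)
labels = dbscan(len(docs),
                lambda i: region_query(i, signatures, buckets, eps=0.5),
                min_pts=2)
print(labels)
```

Each point only looks at the items sharing at least one LSH bucket with it, so the work depends on bucket sizes rather than on the number of pairs. Because MinHash is an estimate, near-duplicate sentences should end up together with high probability rather than with certainty; the last document shares no words with the others and is labelled noise.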