MySQL: Grouping results by perceptual hash similarity

Let's say we have a MySQL table Image with the following columns:

  • id
  • user_id
  • p_hash

I know how to calculate the Hamming distance between a newly inserted row's perceptual hash and every existing hash in the table, which reveals similar images. The SQL query looks like this:

SELECT `Image`.*, BIT_COUNT(`p_hash` ^ :hash) as `hamming_distance`
FROM `Image`
HAVING `hamming_distance` < 5

I want to do the same for every existing image, to check whether the database already contains similar images. So I have to go through every row of the Image table, run the same comparison as above, and find the similar images for each one.

Now the question is: after the whole procedure, how can I get only those groups of similar images whose elements include at least two different user_id values?

So if a group of similar images belongs to a single user, skip it. But if it belongs to multiple different users, return it as one of the results.

Please help me figure this out.

Answer by Barmar:

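Sounds like you want a self-join. Joining Image to itself on a different user_id and a Hamming distance below 5 returns, for each image, the similar images owned by other users. An image whose only matches belong to its own user finds no partner in the join, so it drops out of the result.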

SELECT i1.id, GROUP_CONCAT(i2.id) AS similar_images
FROM Image AS i1
JOIN Image AS i2
  ON i1.user_id != i2.user_id
 AND BIT_COUNT(i1.p_hash ^ i2.p_hash) < 5
GROUP BY i1.id
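
The self-join above reports every match from both sides, so images A and B each appear in the other's list. If one row per pair is easier to work with, a variant along these lines might help. It is only a sketch: it assumes p_hash is stored as an integer type such as BIGINT UNSIGNED, and the column aliases image_id, similar_image_id and hamming_distance are illustrative names, not part of the original schema.

```sql
-- Sketch: list each cross-user pair of similar images exactly once.
-- Assumption: p_hash is an integer column (e.g. BIGINT UNSIGNED),
-- so the bitwise XOR (^) and BIT_COUNT() operate on the raw hash bits.
SELECT i1.id AS image_id,
       i2.id AS similar_image_id,
       BIT_COUNT(i1.p_hash ^ i2.p_hash) AS hamming_distance
FROM Image AS i1
JOIN Image AS i2
  ON i1.id < i2.id                          -- each unordered pair appears once
 AND i1.user_id != i2.user_id               -- only pairs owned by different users
 AND BIT_COUNT(i1.p_hash ^ i2.p_hash) < 5   -- same threshold as the question
ORDER BY hamming_distance, image_id, similar_image_id;
```

Note that either form compares every row with every other row, so the cost grows quadratically with the size of the table.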
