Python: Optimise groups of similar products for similarity within groups and across all groups collectively?


I have a dataframe/table of product data that contains all possible pairings of products in the format:

  • origin product
  • destination product
  • similarity of the two products (Jaccard index)
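Because each pair appears only once in this table, it is often easier to work with a square, symmetric similarity matrix. Below is a minimal pandas sketch. It assumes hypothetical column names `origin`, `destination` and `similarity`, and that each unordered pair appears exactly once (`DataFrame.pivot` raises a `ValueError` on duplicate entries):

```python
import pandas as pd


def pairs_to_matrix(pairs):
    """Turn an origin/destination/similarity pair table into a symmetric square matrix.

    Column names are hypothetical; rename them to match your own dataframe.
    """
    # Add a reversed copy of every pair so both halves of the matrix are filled.
    reversed_pairs = pairs.rename(columns={"origin": "destination", "destination": "origin"})
    both = pd.concat([pairs, reversed_pairs], ignore_index=True)
    products = sorted(set(both["origin"]))
    matrix = both.pivot(index="origin", columns="destination", values="similarity")
    # Same product order on both axes; self-pairs and missing pairs become 0.
    return matrix.reindex(index=products, columns=products).fillna(0)
```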

Using this similarity data, I would like to use Python to assign each product to a group of similar products. I would like to optimise these groups with an optimisation algorithm (or a similar technique) to maximise the mean similarity within each group and across the universe of groups as a whole. I probably also need to account for group size in some way, because I want to avoid both a small number of huge groups and lots of really tiny groups.
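One way to make "maximise mean similarity per group and overall, without huge or tiny groups" concrete is a single score that an optimiser can maximise. The sketch below is an assumption, not a standard formula. It scores each group by its mean pairwise similarity and averages those group scores (the "universe" score). It then subtracts a penalty for groups whose size is far from a hypothetical `target_size`, weighted by a hypothetical `size_weight`:

```python
import numpy as np


def group_mean_similarity(sim, members):
    """Mean similarity over all unordered pairs inside one group (0.0 for a singleton)."""
    n = len(members)
    if n < 2:
        return 0.0
    block = sim[np.ix_(members, members)]
    # Upper triangle without the diagonal: each pair counted exactly once.
    return float(block[np.triu_indices(n, k=1)].sum() / (n * (n - 1) / 2))


def objective(sim, labels, target_size=3, size_weight=0.5):
    """Mean of the group means, minus a penalty for groups far from target_size.

    target_size and size_weight are hypothetical tuning knobs.
    """
    sim = np.asarray(sim, dtype=float)
    labels = np.asarray(labels)
    groups = [np.flatnonzero(labels == g) for g in np.unique(labels)]
    universe = np.mean([group_mean_similarity(sim, g) for g in groups])
    size_penalty = np.mean([abs(len(g) - target_size) for g in groups])
    return float(universe - size_weight * size_penalty)
```

Singletons score 0 here, so splitting everything into one-product groups is not rewarded. That choice and the linear size penalty are worth tuning against real data.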

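For example (the scores below are illustrative rather than true Jaccard indices, which always lie between 0 and 1):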

Origin    Destination    Similarity
prod 1    prod 2         1
prod 3    prod 2         6
prod 1    prod 3         1
prod 1    prod 4         2
prod 3    prod 4         1
prod 2    prod 4         1
prod 1    prod 5         4
prod 3    prod 5         0
prod 2    prod 5         1
prod 5    prod 4         3

Groups:

Group A = products 1, 4 and 5, mean similarity 3

Group B = products 2 and 3, mean similarity 6

Universe = mean of the two group means, (3 + 6) / 2 = 4.5
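Written out, the example numbers come from averaging the pairwise similarities $s_{i,j}$ inside each group and then averaging the two group means:

```latex
\begin{aligned}
\bar{s}_A &= \tfrac{1}{3}\bigl(s_{1,4} + s_{1,5} + s_{4,5}\bigr) = \tfrac{1}{3}(2 + 4 + 3) = 3 \\
\bar{s}_B &= s_{2,3} = 6 \\
\bar{s}_{\text{universe}} &= \tfrac{1}{2}\bigl(\bar{s}_A + \bar{s}_B\bigr) = \tfrac{1}{2}(3 + 6) = 4.5
\end{aligned}
```

You could instead pool all four within-group pairs, which gives (2 + 4 + 3 + 6) / 4 = 3.75. That version weights larger groups more heavily, so it is worth deciding which definition of "universe" you actually want.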

I've already tried quite a few methods, but they were all very manual and not particularly repeatable or scalable.

Could anyone suggest optimisation algorithms for this problem, with examples or documentation showing how to implement them in Python?
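A standard, repeatable starting point is average-linkage hierarchical clustering. This is a swapped-in technique, not something from the question. It repeatedly merges the two clusters with the smallest mean pairwise distance. When distance is a constant minus similarity, that is the same as merging the pair with the largest mean pairwise similarity. Below is a minimal sketch using SciPy's `linkage` and `fcluster`. It assumes a symmetric NumPy similarity matrix, and `average_linkage_groups` is a hypothetical helper name:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def average_linkage_groups(sim, n_groups):
    """Split products into at most `n_groups` clusters by average-linkage clustering."""
    sim = np.asarray(sim, dtype=float)
    # Convert similarity to distance. max - sim is an assumption made because the
    # example scores are not on a 0-1 scale; for true Jaccard indices 1 - sim is usual.
    dist = sim.max() - sim
    np.fill_diagonal(dist, 0.0)  # squareform expects a zero diagonal
    condensed = squareform(dist)  # condensed (upper-triangle) form for linkage
    # "average" merges the pair of clusters with the smallest mean pairwise distance,
    # i.e. the largest mean pairwise similarity.
    tree = linkage(condensed, method="average")
    # Cut the tree so that at most n_groups flat clusters remain (labels start at 1).
    return fcluster(tree, t=n_groups, criterion="maxclust")
```

On the example data, cutting the tree into two clusters recovers the groups {1, 4, 5} and {2, 3}. `n_groups` is the knob that controls group size. Alternatively, `criterion="distance"` in `fcluster` cuts the tree at a distance threshold instead of at a fixed number of clusters.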

I don't have any experience with optimisation algorithms or statistical testing of this kind either, so if anyone has suggestions for better metrics than mean similarity, please do let me know. Looking to learn as much as I can :)
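Mean within-group similarity alone ignores how similar the groups are to each other. A simple alternative is the gap between mean within-group similarity and mean between-group similarity over all pairs. This is sketched here as an assumption, not a standard named metric, and `within_between_gap` is a hypothetical name. A more established option is the silhouette score, for example scikit-learn's `silhouette_score` with `metric="precomputed"` on a distance matrix.

```python
import numpy as np


def within_between_gap(sim, labels):
    """Mean within-group similarity minus mean between-group similarity.

    Every unordered pair of products is counted once. Needs at least two groups,
    otherwise there are no between-group pairs and the mean is undefined.
    """
    sim = np.asarray(sim, dtype=float)
    labels = np.asarray(labels)
    i, j = np.triu_indices(len(labels), k=1)  # all unordered pairs (i < j)
    pair_sims = sim[i, j]
    same_group = labels[i] == labels[j]
    return float(pair_sims[same_group].mean() - pair_sims[~same_group].mean())
```

On the example, the question's grouping scores 15/4 - 5/6 ≈ 2.92. The grouping {1, 2} / {3, 4, 5} scores -1.25.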

Thank you!

Previous attempts:

I tried iterating over the unique products and, for each one, taking the subset of the dataframe in which that product is the origin.

I then mapped each destination product in that subset to its current group and grouped the rows by that group.

I then assigned the product to the group with the highest mean similarity or, if that mean did not meet a certain threshold, let it start a new group.
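For comparison, the one-pass approach described above can be sketched roughly like this. This is a reconstruction with assumed details, not the original code. The name `greedy_assign` is hypothetical, the input is assumed to be a symmetric NumPy matrix, and products are visited in index order:

```python
import numpy as np


def greedy_assign(sim, threshold):
    """One pass: each product joins the existing group it is most similar to on
    average, or starts a new group if even the best mean is below `threshold`."""
    sim = np.asarray(sim, dtype=float)
    labels = np.full(len(sim), -1)  # -1 means "not assigned yet"
    for p in range(len(sim)):
        best_group, best_score = -1, -np.inf
        for g in range(labels.max() + 1):
            members = np.flatnonzero(labels == g)
            score = sim[p, members].mean()  # mean similarity of p to group g
            if score > best_score:
                best_group, best_score = g, score
        if best_score >= threshold:
            labels[p] = best_group
        else:
            labels[p] = labels.max() + 1  # open a new group
    return labels
```

With `threshold=2` on the example matrix, it happens to produce the groups {1, 4, 5} and {2, 3}. In general, though, the result can depend on the visiting order, and a product is never revisited once it has been placed.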

However, this didn't really optimise the groups once they were assigned, even when I iterated through it multiple times.

I think its limitation was that it only looked at one product at a time, and the only optimisation was choosing a group for that product. It didn't consider how the move affected the product's current group, or how it affected the strength of the universe of groups overall.
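One way to address that limitation is local search (hill climbing), a swapped-in technique rather than anything from the question. For each product, it tries moving the product to every other group or to a new group. After each trial move it re-scores the whole partition, and it keeps the move that improves the overall score the most. Below is a minimal sketch with hypothetical names `universe_score` and `local_search`. The default score is the mean of the group means, with singletons scored 0 (an assumption). Any other objective, for example one with a group-size penalty, can be passed as `score_fn`:

```python
import numpy as np


def universe_score(sim, labels):
    """Mean of the per-group mean pairwise similarities (singleton groups score 0)."""
    scores = []
    for g in np.unique(labels):
        members = np.flatnonzero(labels == g)
        n = len(members)
        if n < 2:
            scores.append(0.0)  # assumption: a lone product contributes nothing
            continue
        block = sim[np.ix_(members, members)]
        scores.append(block[np.triu_indices(n, k=1)].mean())  # each pair once
    return float(np.mean(scores))


def local_search(sim, labels, score_fn=universe_score, max_rounds=100):
    """Hill climbing over single-product moves, judged by the score of the whole partition."""
    sim = np.asarray(sim, dtype=float)
    labels = np.asarray(labels).copy()  # do not modify the caller's array
    current = score_fn(sim, labels)
    for _ in range(max_rounds):
        improved = False
        for p in range(len(labels)):
            original = labels[p]
            best_label, best_score = original, current
            # Every existing group plus one brand-new group label.
            candidates = list(np.unique(labels)) + [labels.max() + 1]
            for g in candidates:
                if g == original:
                    continue
                labels[p] = g  # trial move
                score = score_fn(sim, labels)
                if score > best_score + 1e-12:
                    best_label, best_score = g, score
            labels[p] = best_label  # keep the best move (or undo the trial)
            if best_label != original:
                current, improved = best_score, True
        if not improved:  # no single move helps: a local optimum
            break
    return labels
```

Started from the poor grouping {1, 2} / {3, 4, 5}, it moves to {1, 4, 5} / {2, 3} on the example data. Hill climbing only reaches a local optimum. In practice it is usually run from several starting partitions (such as the hierarchical-clustering result), keeping the best outcome. Simulated annealing is a common extension that occasionally accepts worse moves in order to escape local optima. Re-scoring the whole partition for every trial move is simple but slow for thousands of products, so incremental score updates are the usual next step.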
