Jaccard vs cosine similarity for address string comparison

I've seen a ton of questions on these 2 algorithms, but I can't make up my mind about which one to use for my use case.

I need to compare 2 strings representing addresses and decide whether they 'represent' the same address.

For example:

23 Real Road, CR0 3RL, Camden, London, United Kingdom 

23 Real Road, CR0 3RL, London, United Kingdom

What would be the best algorithm to estimate the 'matching score' of these 2 strings?

I ruled out Levenshtein because it works at the character level, so, for example, the following 2 strings would come out as quite different:

23 Real Rd, CR0 3RL, London, UK

23 Real Road, CR0 3RL, Camden, London, United Kingdom

while with Jaccard and cosine they'd look more similar. But which of these 2 would be more appropriate for this use case?
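
To make the comparison concrete, here is a minimal sketch of how I understand the two scores would be computed on word tokens. The tokenizer (lowercase, split on anything that is not a letter or digit) and the helper names `tokens`, `jaccard` and `cosine` are my own assumptions for illustration, not taken from any particular library:

```python
import math
import re
from collections import Counter


def tokens(address: str) -> list[str]:
    # Assumed tokenizer: lowercase, then keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", address.lower())


def jaccard(a: str, b: str) -> float:
    # Jaccard: size of the token-set intersection over size of the union.
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 1.0  # convention chosen here: two empty strings are identical
    return len(sa & sb) / len(sa | sb)


def cosine(a: str, b: str) -> float:
    # Cosine: dot product of token-count vectors over the product of their norms.
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    full = "23 Real Road, CR0 3RL, Camden, London, United Kingdom"
    no_camden = "23 Real Road, CR0 3RL, London, United Kingdom"
    abbreviated = "23 Real Rd, CR0 3RL, London, UK"

    for other in (no_camden, abbreviated):
        print(other)
        print("  jaccard:", round(jaccard(full, other), 3))
        print("  cosine: ", round(cosine(full, other), 3))
```

With this tokenizer, the first pair shares 8 of 9 distinct tokens, so Jaccard is 8/9 (about 0.89) and cosine is about 0.94. The abbreviated pair shares only 5 of 11 distinct tokens, giving about 0.45 and 0.63, because "Rd" and "UK" do not match "Road" and "United Kingdom" as tokens. So both scores seem to need some normalisation of abbreviations first, whichever one is chosen.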

In practice, I think my question could also be phrased as: what algorithm do services like amazon.com use to suggest addresses based on the address text the user entered? (I guess it's a similar problem.)
