I've seen a ton of questions on these 2 algorithms, but I can't work out which one I should use in my use case.
I need to compare 2 strings representing addresses and decide whether they 'represent' the same address.
For example:
23 Real Road, CR0 3RL, Camden, London, United Kingdom
23 Real Road, CR0 3RL, London, United Kingdom
What would be the best algorithm to estimate the 'matching score' of these 2 strings?
I discarded Levenshtein because it counts character-level edits, so, for example, the following 2 strings would come out as quite different:
23 Real Rd, CR0 3RL, London, UK
23 Real Road, CR0 3RL, Camden, London, United Kingdom
while with Jaccard and Cosine they'd look more similar. But which one of these 2 would be more appropriate for this use case?
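To make the comparison concrete, here is a minimal sketch of how the three scores could be computed for that pair. It is only an illustration under my own assumptions: tokens are lowercase alphanumeric words, Jaccard is taken over token sets, cosine over raw token counts, and the Levenshtein distance is normalised by the longer string's length. The function names are mine, not from any library.

```python
import re
from collections import Counter
from math import sqrt

SHORT = "23 Real Rd, CR0 3RL, London, UK"
LONG = "23 Real Road, CR0 3RL, Camden, London, United Kingdom"

def tokens(s):
    # Assumption: lowercase, then split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", s.lower())

def jaccard(a, b):
    # |intersection| / |union| of the two token sets.
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def cosine(a, b):
    # Cosine of the angle between the two token-count vectors.
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    na = sqrt(sum(v * v for v in ca.values()))
    nb = sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def levenshtein(a, b):
    # Classic dynamic programme, keeping only the previous row.
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, 1):
        cur = [i]
        for j, ch_b in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                    # deletion
                           cur[j - 1] + 1,                 # insertion
                           prev[j - 1] + (ch_a != ch_b)))  # substitution
        prev = cur
    return prev[-1]

def levenshtein_similarity(a, b):
    # Assumption: normalise by the longer string so the score lies in [0, 1].
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest

print("jaccard    ", round(jaccard(SHORT, LONG), 3))
print("cosine     ", round(cosine(SHORT, LONG), 3))
print("levenshtein", round(levenshtein_similarity(SHORT, LONG), 3))
```

With this tokenisation the pair shares 5 tokens out of 11 distinct ones, so Jaccard is 5/11 ≈ 0.45 and cosine is 5/(√7·√9) ≈ 0.63. Note that "rd"/"road" and "uk"/"united kingdom" count as entirely different tokens here, so both scores depend heavily on whether abbreviations are normalised beforehand.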
In practice, I think my question could also be phrased as: what algorithm do services like amazon.com use to recommend addresses based on the address text the user entered? (I guess it's a similar problem.)