I am trying to compare two strings, which are basically addresses.
I first tried String.jaro_distance/2:
iex(1)> String.jaro_distance("4420 West Main Street", "EUTECTIC CORPORATION QA testing1")
0.49107142857142855
That score seems high, because the two strings have nothing in common.
I also tried PostgreSQL's SIMILAR TO, like this:
def find_match(seeker_company_id, string, type) do
  search = "%(" <> string <> ")%"

  base_query =
    from op in OpenCorporates,
      where: op.seeker_company_id == ^seeker_company_id

  base_query
  |> type_query(type, search)
  |> Repo.aggregate(:count)
end

defp type_query(query, :name, value) do
  from op in query,
    where: fragment("? SIMILAR TO ?", op.name, ^value)
end

defp type_query(query, :address, value) do
  from op in query,
    where: fragment("? SIMILAR TO ?", op.registered_address, ^value)
end
But it also fails when the search string and the stored string look like this:
Search string: '29 SANTA CRUZ COURT PITTSBURG CA 662354553'
Actual address string: '29 SANTA CRUZ COURT PITTSBURG CA 94565'
It should not fail here, because most of the string matches.
So what could be a solution? Is there a way to calculate the percentage of the match? In the case above, we could say it is roughly an 80% match.
Any guidance will be helpful, thank you.
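One way to get a "percentage of match" is to scale the value String.jaro_distance/2 already returns. The following is only a minimal sketch: the module name MatchPercent, the function jaro_percent/2, and the normalization step (upcasing and collapsing whitespace) are assumptions made for illustration, not something from the code above.

```elixir
defmodule MatchPercent do
  # Hypothetical helper module, not part of the original code.

  # Returns the Jaro score of the two strings scaled to 0.0..100.0,
  # rounded to one decimal place.
  def jaro_percent(a, b) do
    score = String.jaro_distance(normalize(a), normalize(b))
    Float.round(score * 100, 1)
  end

  # Assumed normalization: upcase and collapse runs of whitespace,
  # so that "Main  street" and "MAIN STREET" compare as equal.
  defp normalize(string) do
    string
    |> String.upcase()
    |> String.split()
    |> Enum.join(" ")
  end
end

search = "29 SANTA CRUZ COURT PITTSBURG CA 662354553"
actual = "29 SANTA CRUZ COURT PITTSBURG CA 94565"

IO.inspect(MatchPercent.jaro_percent(search, actual), label: "match %")
```

A threshold on this value (for example, treating anything above some cutoff as a match) would have to be tuned against real address data.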
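You may want to see what you get from a Levenshtein or Hamming distance calculation (note that Hamming distance is only defined for strings of equal length, so it will not fit addresses of different lengths). I would also point out how the score should be read. Wikipedia describes a Jaro distance where "The score is normalized such that 0 means an exact match and 1 means there is no similarity", but Elixir's String.jaro_distance/2 is documented the other way round: it returns a float between 0.0 (no similarity) and 1.0 (exact match). So a score of .49 does indicate a significant difference, just not a complete one.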
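To try the Levenshtein suggestion without an external library, here is a minimal sketch. The module AddressSimilarity and its functions levenshtein/2 and similarity_percent/2 are hypothetical names. Dividing the edit distance by the length of the longer string is just one common way to turn it into a percentage, not the only one.

```elixir
defmodule AddressSimilarity do
  # Hypothetical helper module, written for illustration only.

  # Levenshtein edit distance (insertions, deletions, substitutions),
  # counted in graphemes, computed one table row at a time.
  def levenshtein(a, b) do
    b_chars = String.graphemes(b)

    # Row for the empty prefix of `a`: reaching the first j graphemes
    # of `b` from "" costs j insertions.
    first_row = Enum.to_list(0..length(b_chars))

    a
    |> String.graphemes()
    |> Enum.with_index(1)
    |> Enum.reduce(first_row, fn {a_char, i}, prev_row ->
      next_row(a_char, b_chars, prev_row, i)
    end)
    |> List.last()
  end

  # Builds row i of the table from row i - 1.
  defp next_row(a_char, b_chars, [first_diag | rest_prev], i) do
    {reversed, _diag, _left} =
      b_chars
      |> Enum.zip(rest_prev)
      |> Enum.reduce({[i], first_diag, i}, fn {b_char, up}, {acc, diag, left} ->
        cost = if a_char == b_char, do: 0, else: 1
        cell = Enum.min([up + 1, left + 1, diag + cost])
        {[cell | acc], up, cell}
      end)

    Enum.reverse(reversed)
  end

  # Assumed normalization: 100.0 for identical strings, 0.0 when every
  # position of the longer string has to be edited.
  def similarity_percent(a, b) do
    longest = max(String.length(a), String.length(b))

    if longest == 0 do
      100.0
    else
      (1 - levenshtein(a, b) / longest) * 100
    end
  end
end

search = "29 SANTA CRUZ COURT PITTSBURG CA 662354553"
actual = "29 SANTA CRUZ COURT PITTSBURG CA 94565"

IO.inspect(AddressSimilarity.similarity_percent(search, actual), label: "match %")
```

Because the two addresses differ only in the trailing digits, the edit distance is small relative to the string length, so the percentage comes out high. Unrelated strings need edits almost everywhere and score low.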