Closest string match in Elixir or Ecto


I am trying to compare two strings, which are basically addresses.

I tried using String.jaro_distance:

iex(1)> String.jaro_distance("4420 West Main Street", "EUTECTIC CORPORATION QA testing1")
0.49107142857142855

That is a score of about 0.49, yet there is no real similarity between the two strings.

I also tried PostgreSQL's SIMILAR TO, like this:

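  # Assumes `import Ecto.Query` in the enclosing module.

  # Counts the rows for a company whose name or address matches `string`.
  def find_match(seeker_company_id, string, type) do
    # SIMILAR TO takes a pattern: % matches any run of characters,
    # so this only matches rows that contain `string` verbatim.
    search = "%(" <> string <> ")%"

    base_query =
      from op in OpenCorporates,
        where: op.seeker_company_id == ^seeker_company_id

    base_query
    |> type_query(type, search)
    |> Repo.aggregate(:count)
  end

  # Match against the company name.
  defp type_query(query, :name, value) do
    from op in query,
      where: fragment("? SIMILAR TO ?", op.name, ^value)
  end

  # Match against the registered address.
  defp type_query(query, :address, value) do
    from op in query,
      where: fragment("? SIMILAR TO ?", op.registered_address, ^value)
  end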

But if the search string and the actual string look like this:

Search string: '29 SANTA CRUZ COURT PITTSBURG CA 662354553' and actual address string: '29 SANTA CRUZ COURT PITTSBURG CA 94565'

it fails as well. It should not fail here, because most of the string matches.

So what could be a solution to this? Is there a way to calculate the percentage of the match? In the case above, we could say that it is an 80% match.

Any guidance will be helpful. Thank you.

Onorio Catenacci

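You may want to see what you get from a Levenshtein or Hamming distance calculation (keeping in mind that Hamming distance is only defined for strings of equal length, which addresses rarely are). I would also point out how a Jaro distance is scored. Wikipedia describes it as "The score is normalized such that 0 means an exact match and 1 means there is no similarity", but Elixir's String.jaro_distance/2 uses the opposite scale: per its documentation, 0.0 means no similarity and 1.0 is an exact match. Either way, a score of .49 sits near the middle of the range, so it does seem to indicate a significant difference.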
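
One way to turn the Levenshtein suggestion into a "percentage of match" is to divide the edit distance by the length of the longer string. The following is only a minimal sketch: the module name AddressMatch and the functions levenshtein/2 and percent_match/2 are hypothetical names made up for illustration, not part of Elixir or Ecto, and normalizing by the longer string is just one of several reasonable choices.

```elixir
defmodule AddressMatch do
  @moduledoc "Hypothetical helper: percentage match based on Levenshtein distance."

  # Levenshtein distance over graphemes, computed row by row so that
  # only one row of the dynamic-programming table is kept at a time.
  def levenshtein(a, b) do
    a_chars = String.graphemes(a)
    b_chars = String.graphemes(b)

    # Row 0: turning "" into the first j characters of b costs j inserts.
    first_row = Enum.to_list(0..length(b_chars))

    a_chars
    |> Enum.with_index(1)
    |> Enum.reduce(first_row, fn {a_char, i}, prev_row ->
      next_row(a_char, b_chars, prev_row, i)
    end)
    |> List.last()
  end

  # Builds row i from the previous row. `left` is the cell just computed,
  # `up` is the cell above, `diag` is the cell above and to the left.
  defp next_row(a_char, b_chars, [first_diag | rest_prev], i) do
    {reversed, _left, _diag} =
      b_chars
      |> Enum.zip(rest_prev)
      |> Enum.reduce({[i], i, first_diag}, fn {b_char, up}, {acc, left, diag} ->
        cost = if a_char == b_char, do: 0, else: 1
        cell = Enum.min([left + 1, up + 1, diag + cost])
        {[cell | acc], cell, up}
      end)

    Enum.reverse(reversed)
  end

  # Percentage match: 100.0 for identical strings, lower as more edits
  # are needed. Normalized by the longer string (an assumption).
  def percent_match(a, b) do
    longest = max(String.length(a), String.length(b))

    if longest == 0 do
      100.0
    else
      Float.round((1 - levenshtein(a, b) / longest) * 100, 1)
    end
  end
end

search = "29 SANTA CRUZ COURT PITTSBURG CA 662354553"
actual = "29 SANTA CRUZ COURT PITTSBURG CA 94565"
IO.inspect(AddressMatch.percent_match(search, actual), label: "percent match")
```

For the two addresses in the question this lands a little above 80 percent, since only the trailing postal code differs. A similar percentage can be had without any custom code by multiplying String.jaro_distance/2 by 100, although, as the first example in the question shows, Jaro can still report a middling score for unrelated strings, so whichever measure is used needs a threshold tuned against real data.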