Normalize human names in postgresql

366 Views Asked by GNG At 07 August 2020 at 01:58

What is the easiest way to normalize a text field in a PostgreSQL table?

I am trying to find duplicates. For example, I want to consider O'Reilly a duplicate of oreilly. La Salle should be a duplicate of la'salle as well.

In a nutshell, we want to compare names while ignoring case, apostrophes, and spaces.

Can this all be done in one or two simple steps, ideally using built-in PostgreSQL functions?

Cheers

There is 1 best solution below

Belayer On 08 August 2020 at 23:15

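The following will give you what you want, using only functions that ship with Postgres (note that unaccent is provided by the unaccent extension, which has to be enabled once per database with CREATE EXTENSION unaccent):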

regexp_replace (lower(unaccent(string_in)),'[^0-9a-z]','','g')

See example here. Or, if you do not want digits, then just use:

regexp_replace (lower(unaccent(string_in)),'[^a-z]','','g')
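
To tie this back to the duplicate search in the question, here is a minimal sketch of how the expression could be used in a grouping query. The `people` rows and the column names `id` and `name` are hypothetical sample data, not from the original post, and the sketch assumes the unaccent extension can be installed on your server.

```sql
-- Assumption: the unaccent extension is available; this enables it if needed.
CREATE EXTENSION IF NOT EXISTS unaccent;

-- Hypothetical sample data standing in for the real table.
WITH people(id, name) AS (
    VALUES (1, 'O''Reilly'),
           (2, 'oreilly'),
           (3, 'La Salle'),
           (4, 'la''salle'),
           (5, 'Smith')
)
SELECT
    -- Strip accents, lowercase, then drop everything that is not a-z.
    regexp_replace(lower(unaccent(name)), '[^a-z]', '', 'g') AS normalized_name,
    -- Collect the original spellings that collapse to the same key.
    array_agg(name ORDER BY id) AS variants
FROM people
GROUP BY 1              -- group by the normalized key
HAVING count(*) > 1;    -- keep only keys shared by more than one row
```

With this sample data, rows 1 and 2 should fall into one group and rows 3 and 4 into another, while 'Smith' is filtered out because it has no duplicate. Replacing the CTE with your own table and text column gives the same check against real data.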