What is the easiest way to normalize a text field in postgresql table?
I am trying to find duplicates. For example, I want to consider O'Reilly a duplicate of oreilly. La Salle should be a duplicate of la'salle as well.
In a nutshell, we want to
- lowercase all text,
- strip accents
- strip punctuation marks such as these
[.'-_]and - strip spaces
Can this all be done in one or two simple steps? Ideally using built in postgresql functions.
Cheers
The following will give you what you want, using just standard Postgres functions;
See example here. Or if you do not want digits the just