I am having trouble allowing all English/Latin-based characters (including accented ones) while disallowing Chinese/Russian characters.
The first version I had was as follows:
strlen($values['person_name']) != mb_strlen($values['person_name'], 'utf-8')
This worked fine initially, but it stopped working once Icelandic/Czech names came into play.
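The likely reason is that this comparison flags any multibyte character, not only Chinese or Russian ones. A quick sketch with an illustrative value (it assumes the mbstring extension and a UTF-8 encoded source file):

```php
<?php
// Illustrative value only, taken from the test name used later in this post.
$name = 'Eliška';

var_dump(strlen($name));              // int(7): bytes, "š" takes two bytes in UTF-8
var_dump(mb_strlen($name, 'UTF-8'));  // int(6): characters

// The two lengths differ, so the check treats this Czech name as invalid.
var_dump(strlen($name) != mb_strlen($name, 'UTF-8')); // bool(true)
```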
The second version I had was as follows:
preg_match("~^[a-zÀ-ÿ][\'a-zÀ-ÿ \-]*$~i", $values['person_name'])
This seemed to work fine for the majority of cases, but it gives an error on this test name:
Eliška Koňaříková
I have tried the following as well without any luck:
preg_match("/[^\w ]/u", $values['person_name']) //does not allow š
preg_match("/\PL/u", $values['person_name']) //does not allow š
preg_match("/^[a-zA-Z\s,.'\-\pL]+$/u", $values['person_name']) //allows š, but also allows 書
preg_match("/^[\s,.'-]*\p{L}[\p{L}\s,.'-]*$/u", $values['person_name']) //allows š, but also allows 書
preg_match("/[^a-zA-Z0-9àâáçéèèêëìîíïôòóùûüÂÊÎÔúÛÄËÏÖÜÀÆæÇÉÈŒœÙñý,. ]/u", $values['person_name']) //allows š, but also allows 書
preg_match("~^[a-zÀ-ÿ][\'a-zÀ-ÿ \-]*$~iu", $values['person_name']) //does not allow š
preg_match("/^[\p{L}-]*$/u", $values['person_name']) //allows š, but also allows 書
preg_match("/([\w ]{2,})/u", $values['person_name']) //allows š, but also allows 書
preg_match('/[^\p{Latin}0-9€, !"§$%&\/()=#|<>]/u', $values['person_name']) //allows š, but also allows 書
All of the above either failed on the name provided or allowed Chinese characters.
I believe the best route for me would be to revert to the check that worked for most characters (except for the Czech names that give an error):
preg_match("~^[a-zÀ-ÿ][\'a-zÀ-ÿ \-]*$~i", $values['person_name'])
And manually add the Czech characters that are not accepted such as š, ň, ř, etc.
Is there a cleaner solution than manually having to specify each of these characters?
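One possible direction, sketched under assumptions rather than offered as a definitive fix: PCRE's Unicode script property `\p{Latin}` matches letters of the Latin script, so an anchored pattern built from it should accept š, ň and ř while rejecting Han or Cyrillic characters. The helper name `isLatinName` is hypothetical, and the set of allowed punctuation (space, apostrophe, hyphen) simply mirrors the second version above. It also assumes the input is UTF-8 with precomposed characters (NFC), because standalone combining accents are not part of the Latin script.

```php
<?php
// Hypothetical helper (not from the original post): returns true when the name
// starts with a Latin-script letter and contains only Latin-script letters,
// spaces, apostrophes and hyphens.
function isLatinName(string $name): bool
{
    // \p{Latin} matches any letter belonging to the Latin script.
    // The /u modifier makes PCRE treat both pattern and subject as UTF-8.
    // \z anchors at the very end, so a trailing newline is not accepted.
    return preg_match('/^\p{Latin}[\p{Latin} \'\-]*\z/u', $name) === 1;
}

var_dump(isLatinName('Eliška Koňaříková')); // bool(true)
var_dump(isLatinName("O'Connor-Smith"));    // bool(true)
var_dump(isLatinName('書'));                // bool(false)
var_dump(isLatinName('Иван'));              // bool(false)
```

Because the pattern is a positive, fully anchored whitelist, a return value of 1 means the whole name is acceptable. Several of the attempts above use a negated class without anchors, where a match means a disallowed character was found, so the result has to be read the other way round.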
Maybe it's better to replace the characters. This is only an example of doing that, not a complete function: