I am trying to strip accents from a string using the Boost.Locale library.
The normalize function removes the entire accented character; I only want to remove the accent.
è -> e for example
Here is my code
std::string hello(u8"élève");
boost::locale::generator gen;
std::string str = boost::locale::normalize(hello,boost::locale::norm_nfd,gen(""));
Desired output: eleve
My output: lve
Help please
That's not what normalize does. With nfd, it does "canonical decomposition". You need to THEN remove the combining character code points.

UPDATE: Adding a loose implementation, gleaned from the UTF-8 tables, based on the observation that most combining characters appear to lead with 0xcc or 0xcd:
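For illustration, here is a minimal sketch of that second step. It assumes the input is already NFD-normalized UTF-8 (for example, the result of boost::locale::normalize with norm_nfd) and drops only code points in U+0300..U+036F, the "Combining Diacritical Marks" block, which is exactly the range encoded as 0xCC 0x80 through 0xCD 0xAF. The helper names utf8_sequence_length and strip_combining_marks are made up for this example, and other combining blocks are not handled.

```cpp
#include <cstddef>
#include <string>

// Length in bytes of the UTF-8 sequence introduced by `lead`.
// Returns 1 for ASCII and for stray continuation bytes, so callers always advance.
inline std::size_t utf8_sequence_length(unsigned char lead) {
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;
}

// Hypothetical helper: removes code points in U+0300..U+036F from UTF-8 text.
// Assumption: `nfd` has already been canonically decomposed (NFD).
inline std::string strip_combining_marks(const std::string& nfd) {
    std::string out;
    out.reserve(nfd.size());
    for (std::size_t i = 0; i < nfd.size();) {
        const unsigned char lead = static_cast<unsigned char>(nfd[i]);
        std::size_t len = utf8_sequence_length(lead);
        if (len > nfd.size() - i) len = nfd.size() - i;  // truncated tail: copy what is left

        bool is_combining = false;
        if ((lead & 0xE0) == 0xC0 && len == 2) {
            // Decode the two-byte sequence into its code point.
            const unsigned char trail = static_cast<unsigned char>(nfd[i + 1]);
            const unsigned cp = ((lead & 0x1Fu) << 6) | (trail & 0x3Fu);
            is_combining = cp >= 0x0300 && cp <= 0x036F;
        }
        if (!is_combining) out.append(nfd, i, len);
        i += len;
    }
    return out;
}
```

Calling strip_combining_marks on the NFD output for the question's string should leave just the base letters. Precomposed characters such as U+00E9 pass through unchanged, which is why the decomposition step has to come first.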
Live On Wandbox
Prints (on my box!):
Older answer text/analysis:
Prints, on my box:
The docs say: https://www.boost.org/doc/libs/1_72_0/libs/locale/doc/html/conversions.html#conversions_normalization
What you could do
It seems that you MIGHT get some way by doing the Decomposition only (so NFD) and then removing any code-points that aren't alpha.
This is cheating, because it assumes all code points are single-unit, which is not generally true, but for the sample it does work.
See the improved version above, which iterates over code points instead of bytes.
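A rough sketch of that byte-wise shortcut, for comparison. It assumes the string has already been decomposed with NFD, and the helper name keep_ascii_letters is made up for this example. Because it filters individual bytes, it also throws away digits, spaces, punctuation and every non-ASCII letter, so it is only suitable for input like the sample.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: keeps only bytes that are ASCII letters.
// Every byte of a multi-byte UTF-8 sequence is >= 0x80, so combining marks
// (and any other non-ASCII character) are dropped byte by byte.
inline std::string keep_ascii_letters(std::string s) {
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](unsigned char byte) {
                               return byte >= 0x80 || !std::isalpha(byte);
                           }),
            s.end());
    return s;
}
```

The explicit byte >= 0x80 check keeps the result independent of the current C locale, where std::isalpha may otherwise accept some high bytes.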