I'm using iconv's transliteration feature to convert a Unicode string to its nearest ASCII equivalent. However, the string contains some symbols that have no ASCII equivalent. I want to keep such symbols in the output rather than drop them.
Currently, here's what I am doing:
iconv_t cd = iconv_open("ASCII//IGNORE//TRANSLIT", "UTF-8");
const char *utf8 = "ç ß ∑ a";
char* in = const_cast<char*>(utf8);
size_t in_bytes = strlen(in);
char buf[BUFSIZ] = {};
char* out = buf;
size_t out_bytes = sizeof(buf);
iconv(cd, &in, &in_bytes, &out, &out_bytes);
printf("%s", buf);
// prints
c ss a
How do I configure iconv to produce an output like the following:
c ss ∑
If this is not possible with iconv, is there another way to achieve this programmatically?
iconv does not support the conversion behaviour that you want out of the box, because it is quite an odd behaviour: if it's OK to have a ∑ in the output, why would it not be OK to have a ß in the output?
Anyway, you can implement this conversion through a function of your own that uses iconv, as follows:
Start the conversion with a transliterating descriptor, cd1. When the call fails with errno == EILSEQ, you know that it's because of a character that cannot be transliterated to ASCII.
At that point, use a second descriptor, cd0, to convert one and only one character. You do this by calling iconv() with an input length of 1 byte, then, if that fails, with 2 bytes, and so on up to 4 bytes. (If all of these fail, you must have invalid input; your best bet is to skip one input byte and leave a single '?' in the output.) Then continue with cd1 on the rest of the input.
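The steps above can be sketched roughly as follows. This is only a minimal sketch, not a definitive implementation: the function name translit_keep is made up for illustration, and opening cd1 as "ASCII//TRANSLIT" and cd0 as a UTF-8 to UTF-8 pass-through are my assumptions about how the two descriptors would be set up. It also assumes an iconv() whose second parameter is char** (as in glibc) and an implementation that actually reports EILSEQ for characters it cannot transliterate; some implementations substitute a placeholder such as '?' instead, in which case the fallback path never runs.

```cpp
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <iconv.h>

// Hypothetical helper: transliterate UTF-8 to ASCII where possible and copy
// characters that have no ASCII transliteration through unchanged.
std::string translit_keep(const std::string& utf8) {
    const iconv_t bad = reinterpret_cast<iconv_t>(-1);
    const size_t failed = static_cast<size_t>(-1);

    // Assumption: cd1 transliterates, cd0 is a plain UTF-8 pass-through.
    iconv_t cd1 = iconv_open("ASCII//TRANSLIT", "UTF-8");
    iconv_t cd0 = iconv_open("UTF-8", "UTF-8");
    if (cd1 == bad || cd0 == bad) {
        if (cd1 != bad) iconv_close(cd1);
        if (cd0 != bad) iconv_close(cd0);
        throw std::runtime_error("iconv_open failed");
    }

    std::string result;
    char* in = const_cast<char*>(utf8.data());
    size_t in_left = utf8.size();
    char buf[256];

    while (in_left > 0) {
        // Transliterate as much as possible with cd1.
        char* out = buf;
        size_t out_left = sizeof buf;
        const size_t rc = iconv(cd1, &in, &in_left, &out, &out_left);
        const int err = errno;  // save before anything else can touch errno
        result.append(buf, sizeof buf - out_left);

        if (rc != failed) continue;   // everything was consumed; loop ends
        if (err == E2BIG) continue;   // output buffer was full; keep going

        // EILSEQ (or a truncated sequence): copy exactly one character with
        // cd0, trying input lengths of 1, 2, 3 and 4 bytes in turn.
        bool copied = false;
        for (size_t n = 1; n <= 4 && n <= in_left && !copied; ++n) {
            char* one_in = in;
            size_t one_left = n;
            out = buf;
            out_left = sizeof buf;
            if (iconv(cd0, &one_in, &one_left, &out, &out_left) != failed) {
                result.append(buf, sizeof buf - out_left);
                in += n;
                in_left -= n;
                copied = true;
            } else {
                iconv(cd0, nullptr, nullptr, nullptr, nullptr);  // reset state
            }
        }
        if (!copied) {
            // Invalid input: skip one byte and leave a single '?'.
            result += '?';
            ++in;
            --in_left;
        }
    }

    iconv_close(cd1);
    iconv_close(cd0);
    return result;
}
```

You would call it as translit_keep("ç ß ∑ a") and print the returned string. Which characters get transliterated, and to what, depends on the iconv implementation and, on some systems, on the current locale, so treat the exact output as platform-dependent.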