On the streaming service Kanopy, I have noticed that some descriptions contain strange-looking text.
Screenshot from Kanopy showing mojibake
The screenshot shows:
BogieaEU(tm)s
and
HepburnaEU(tm)s
After a little investigation, my theory is this:
The text started as
Bogie’s
with ’ (U+2019 RIGHT SINGLE QUOTATION MARK) as the apostrophe.
This was encoded as UTF-8, giving the bytes [0xE2 0x80 0x99].
That byte sequence was then decoded as Windows-1252 (or a similar encoding), producing ’.
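This first step can be reproduced in a few lines of Python. The sketch assumes Windows-1252 specifically (Python's built-in `cp1252` codec) as the wrong encoding; any encoding that maps 0xE2, 0x80 and 0x99 to â, € and ™ would give the same result.

```python
# Reproduce the mojibake step: UTF-8 bytes mis-decoded as Windows-1252 (assumed).
original = "Bogie\u2019s"  # "Bogie’s" with U+2019 RIGHT SINGLE QUOTATION MARK

# U+2019 takes three bytes in UTF-8.
utf8_bytes = "\u2019".encode("utf-8")
print(utf8_bytes)  # b'\xe2\x80\x99'

# Decoding those bytes as cp1252 maps 0xE2 -> â, 0x80 -> €, 0x99 -> ™.
mojibake = original.encode("utf-8").decode("cp1252")
print(mojibake)  # Bogieâ€™s
```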
So far this is standard mojibake, and I am not asking about that.
Some process then converted ’ into aEU(tm).
That is:
â -> a
€ -> EU
™ -> (tm)
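The three substitutions above can be written as a small translation table. This is a hypothetical sketch: the table contains only the mappings inferred from the screenshot and is not taken from any real library, and `TRANSLIT` and `mojibake_then_transliterate` are made-up names. Combined with a cp1252 mis-decode (also an assumption), it reproduces both strings seen on Kanopy.

```python
# Hypothetical table inferred from the screenshot; NOT the table of any known library.
TRANSLIT = str.maketrans({
    "\u00e2": "a",     # â -> a
    "\u20ac": "EU",    # € -> EU
    "\u2122": "(tm)",  # ™ -> (tm)
})


def mojibake_then_transliterate(text: str) -> str:
    """Hypothetical two-step pipeline matching the observed output."""
    # Step 1 (assumed): UTF-8 bytes mis-decoded as Windows-1252.
    mojibake = text.encode("utf-8").decode("cp1252")
    # Step 2 (assumed): replace each mapped character; others are left unchanged.
    return mojibake.translate(TRANSLIT)


for name in ("Bogie\u2019s", "Hepburn\u2019s"):
    print(mojibake_then_transliterate(name))
# prints:
# BogieaEU(tm)s
# HepburnaEU(tm)s
```

Any candidate transliterator can be checked the same way: feed it the mojibake string and see whether it yields exactly aEU(tm).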
This looks like transliteration: converting Unicode characters into ASCII approximations.
I would like to identify the exact piece of software that performs this conversion, but I can't find it.
iconv, for example, is a very popular library for transliteration.
But the example on its PHP manual page (https://www.php.net/manual/en/function.iconv.php)
shows that iconv transliterates € into EUR, not EU.
So iconv is not doing this transliteration.
unidecode is another popular transliteration library.
But unidecode also converts € into EUR:
>>> from unidecode import unidecode
>>> unidecode('\u00e2\u20ac\u2122')
'aEUR(tm)'
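For comparison, another common do-it-yourself approach (an assumption about what some pipeline might use, not something observed on Kanopy) is Unicode NFKD normalization followed by dropping every non-ASCII character. It does not match either: € has no compatibility decomposition and is simply dropped, and ™ decomposes to TM rather than (tm). `nfkd_ascii_fold` is a made-up name.

```python
import unicodedata


def nfkd_ascii_fold(text: str) -> str:
    """DIY ASCII folding: compatibility-decompose, then drop non-ASCII characters."""
    # NFKD splits â into a + U+0302 and turns ™ into "TM"; € stays as U+20AC.
    decomposed = unicodedata.normalize("NFKD", text)
    # The combining accent and € are not ASCII, so "ignore" removes them.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(nfkd_ascii_fold("\u00e2\u20ac\u2122"))  # aTM
```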
Is it possible to identify the exact software that transliterates € -> EU and ™ -> (tm)?