Amazon's Mechanical Turk rejects CSV HIT files that contain 4-byte UTF-8 characters, such as emoji. However, emoji are an integral part of my worker tasks, and I need to keep them.
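To see which characters fall under this rule, it helps to know that UTF-8 uses exactly 4 bytes for code points above U+FFFF, where most emoji live. The Python sketch below lists those characters in a string. The function name `find_four_byte_chars` is hypothetical and not part of any MTurk tool.

```python
from typing import List, Tuple


def find_four_byte_chars(text: str) -> List[Tuple[int, str]]:
    """Return (index, char) for every character that needs 4 bytes in UTF-8.

    UTF-8 encodes U+0000..U+FFFF in 1 to 3 bytes and U+10000..U+10FFFF
    in exactly 4 bytes, so a code point check is enough.
    """
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 0xFFFF]


if __name__ == "__main__":
    sample = "Rate this reply \U0001F600 please"
    for index, ch in find_four_byte_chars(sample):
        # Print the position, the character and its UTF-8 length.
        print(index, repr(ch), len(ch.encode("utf-8")))
```

Note that a few emoji, such as U+2764 (heavy black heart), sit below U+FFFF and take only 3 bytes, so a 4-byte restriction would not apply to them.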
I found the script at https://github.com/charman/mturk-emoji, which replaces each emoji character with an equivalent HTML span. However, when I feed the preprocessed CSV to MTurk, the emoji are not rendered.
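The idea behind the encoding step can be sketched in Python as follows. This is an illustration, not the repo's code: the span markup (class `emoji-bytes`, attribute `data-emoji-bytes`) and the function name `encode_emoji` are assumptions, so check `encode_emoji.py` for the exact format it produces.

```python
def encode_emoji(text: str) -> str:
    """Replace each 4-byte UTF-8 character with an HTML span holding its bytes.

    The span format used here is hypothetical; the real script may differ.
    """
    out = []
    for ch in text:
        if ord(ch) > 0xFFFF:
            # Store the raw UTF-8 byte values, e.g. [240, 159, 152, 128].
            byte_list = list(ch.encode("utf-8"))
            out.append(f"<span class='emoji-bytes' data-emoji-bytes='{byte_list}'></span>")
        else:
            out.append(ch)
    return "".join(out)
```

Storing the byte values keeps the CSV free of 4-byte characters, but the browser only shows an empty span until some script in the HIT page turns the bytes back into a character, which is why the spans alone do not render.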
I managed to solve the problem by following these steps:
1. Convert the CSV containing UTF-8 emoji using the script encode_emoji.py in the linked GitHub repo. You get, say, sample_with_emoji.csv.
2. In Mechanical Turk, edit your current project and go to Design Layout. For the HTML span with the emoji bytes to be properly rendered, you need to add code at the beginning of the HTML Editor in MTurk. This code is basically the content at the bottom of the README file in the repo, with the script decode_emoji.js added inline rather than sourced.
3. After uploading the sample_with_emoji.csv file, the emojis are properly rendered in the Preview.
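Before uploading, it may be worth sanity-checking the converted file. The sketch below is illustrative only: `check_encoded_csv` and `decode_emoji` are hypothetical helpers, and the regular expression assumes the markup `<span class='emoji-bytes' data-emoji-bytes='[...]'></span>`, which may not match what encode_emoji.py actually emits. `decode_emoji` roughly mirrors, in Python, the job the inline decode_emoji.js does in the page: turning each span back into its character.

```python
import csv
import io
import re

# Assumed span format; adjust the pattern to the real encode_emoji.py output.
SPAN_RE = re.compile(
    r"<span class='emoji-bytes' data-emoji-bytes='\[(\d+(?:, ?\d+)*)\]'></span>"
)


def decode_emoji(text: str) -> str:
    """Replace each emoji span with the character its UTF-8 bytes spell out."""
    return SPAN_RE.sub(
        lambda m: bytes(int(b) for b in m.group(1).split(",")).decode("utf-8"),
        text,
    )


def check_encoded_csv(csv_text: str) -> list:
    """Return (row, column) of every cell that still holds a 4-byte character.

    An empty list suggests the 4-byte rule should no longer block the upload.
    """
    problems = []
    for r, row in enumerate(csv.reader(io.StringIO(csv_text))):
        for c, cell in enumerate(row):
            if any(ord(ch) > 0xFFFF for ch in cell):
                problems.append((r, c))
    return problems
```

For a file on disk, read it with `open("sample_with_emoji.csv", encoding="utf-8", newline="")` and pass the contents to `check_encoded_csv`; running `decode_emoji` on a few cells gives a rough preview of the text workers should see.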