In our JSF 1.2 app, when users copy and paste text from a Word document (Windows-1252 encoded) into an input area (the app itself is UTF-8 encoded by default), all of the single and double quote characters are immediately stripped from the value stored in the backing bean. The browser encoding is whatever the end user has configured, which defaults to their OS encoding (often Windows-1252 or ISO).
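To illustrate the kind of mismatch I suspect, here is a minimal standalone sketch. The class and method names are made up for illustration and are not from our app, and it assumes the JDK provides the windows-1252 charset. It encodes the curly apostrophe Word inserts as UTF-8 and then decodes those bytes as Windows-1252:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuoteMismatchDemo {
    // Encode the text as UTF-8, then decode the bytes with the wrong charset.
    static String misdecode(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        String curlyApostrophe = "\u2019"; // right single quotation mark
        String garbled = misdecode(curlyApostrophe);
        // Print code points so the result does not depend on the console encoding.
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

The single character comes back as three characters, which looks like the same family of sequences my replace calls are trying to catch.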
What I have tried to combat this effect:
In the server.xml, in the <connector> tag I placed URIEncoding="UTF-8".
In the <meta> tag I state charset="UTF-8".
In the setter for the backing bean I convert the string to a UTF-8 encoded byte[], then convert it back to a String. StringEscapeUtils.unescapeHtml is supposed to take care of converting HTML entity numbers into the proper characters, and the replace calls handle other representations of ISO and UTF encoding.
public void setInputNoteText(String inputNoteText) throws UnsupportedEncodingException {
byte[] utf8Bytes = inputNoteText.getBytes("UTF8");
String backToStr = new String(utf8Bytes, "UTF8");
String inputNoteUtf8 = backToStr.replace("?" , "•").replace("ï§", "•").replace("ï§" , "•").replace("ÂX" , "•").replace("" , "•").replace("âÂÂ" , "\'").replace("¡¥" , "\'").replace("¡¦" , "\'").replace("¡§" , "\"").replace("¡¨" , "\"").replace("Â", "\'").replace("â" , "\'");
inputNoteUtf8 = StringEscapeUtils.unescapeHtml(backToStr);
this.inputNoteText = inputNoteUtf8;
}
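While checking my own approach, I also put together a small standalone sketch of the byte[] round trip from the setter. The class and method names are hypothetical, and I use StandardCharsets.UTF_8 instead of the "UTF8" string name only to avoid the checked exception. As far as I can tell, encoding a String to UTF-8 and decoding it straight back gives the same String, so that step may not be repairing anything:

```java
import java.nio.charset.StandardCharsets;

public class RoundTripCheck {
    // Same two steps as in the setter: String -> UTF-8 bytes -> String.
    static String roundTrip(String input) {
        byte[] utf8Bytes = input.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Curly double quotes and a curly apostrophe, written as escapes
        // so the source file encoding does not matter.
        String sample = "\u201Cquoted\u201D and it\u2019s";
        System.out.println(sample.equals(roundTrip(sample)));
    }
}
```

If that holds, any damage to the quotes would have to happen before the value reaches the setter, when the request bytes are first decoded.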
Can I force the user's browser into UTF-8 some way other than what I've tried so far? Or can anyone spot what I need to adjust? I'm running out of ideas.