I'm trying to print text to an HTML page from a servlet. I declare UTF-8 encoding, but Turkish characters are not displayed correctly on the page.
How I define UTF-8 encoding:
String htmlStart = "<html>\n" +
    "<head>\n" +
    "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />\n" +
    "<title>" + title + "</title>\n" +
    "</head>\n" +
    "<body bgcolor = \"#f0f0f0\">\n" +
    "<h1 align = \"center\">" + title + "</h1>\n" +
    "<ul>\n" + " <li><b>" + url + "</b>" + "</ul>\n";
How I print words in html:
for (String token : parsed) {
    med += "<p>" + token + "</p>\n";
    System.out.println(token);
}
What is written to the Eclipse console by the above code:
Muğla Sıtkı Koçman Üniversitesi
What I see in the generated HTML:
Mu?la S?tk? Koçman Üniversitesi
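The pattern in that output (ğ and ı turned into ?, while ç and Ü survive) can be reproduced with a small, self-contained sketch. It assumes ISO-8859-1 as a stand-in for whatever non-UTF-8 encoding was actually applied, which the post does not state; the class and method names are made up for illustration. The string uses Unicode escapes so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingLossDemo {
    // Encodes the text with the given charset, then decodes it with the same charset.
    // Characters the charset cannot represent are replaced during encoding.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // "Muğla Sıtkı Koçman Üniversitesi" written with Unicode escapes
        String text = "Mu\u011Fla S\u0131tk\u0131 Ko\u00E7man \u00DCniversitesi";

        // ISO-8859-1 (assumed here) has no ğ or ı, so they are lost
        String lossy = roundTrip(text, StandardCharsets.ISO_8859_1);
        // UTF-8 can represent every character, so nothing is lost
        String intact = roundTrip(text, StandardCharsets.UTF_8);

        System.out.println(lossy.equals(text));   // false
        System.out.println(intact.equals(text));  // true
    }
}
```

ISO-8859-1 contains ç and Ü but not ğ or ı, so only the latter two are replaced with ?, which is the same shape as the HTML output shown above.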
You are already specifying "charset=utf-8" for the generated HTML, so reading/rendering the data shouldn't be a problem in the browser (as you suggest). But your console sample code is incorrect because it does not specify that UTF-8 is to be used. The default behavior will be to use the default encoding of your platform when creating the data, which is probably not what you want.
The simplest way to fix that in your sample code is to reassign System.out to a PrintStream that uses UTF-8 by calling setOut().

However, if I run that code from the Windows Command Prompt, the output is still garbled.
The first line fails (like yours) because the data is being written and read using the default encoding, which is Cp437 on my machine. And the second line fails because although the data is being correctly written as UTF-8, it is still being rendered using Cp437.
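The setOut() reassignment described above might look like the following sketch. It is not the original sample from this answer: the class name and the utf8Stream helper are hypothetical, and the text is written with Unicode escapes so the source file encoding does not matter. It prints the same string twice, first through the default System.out and then through a UTF-8 PrintStream.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8ConsoleDemo {
    // Wraps a stream in an auto-flushing PrintStream that encodes characters as UTF-8.
    static PrintStream utf8Stream(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "Muğla Sıtkı Koçman Üniversitesi" written with Unicode escapes
        String text = "Mu\u011Fla S\u0131tk\u0131 Ko\u00E7man \u00DCniversitesi";

        // First line: written with the platform's default encoding
        System.out.println(text);

        // Second line: written as UTF-8 after reassigning System.out
        System.setOut(utf8Stream(System.out));
        System.out.println(text);
    }
}
```

Whether either line looks right still depends on the encoding the console uses to display the bytes it receives.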
To fix that, explicitly set your console's code page to UTF-8 by specifying chcp 65001 in the console before running your code (on Windows at least). Then you will see that the second line renders correctly, because it is both written and read as UTF-8.

Notes: