I am new to Java and am trying to convert email content from one format to another.
This applies to email bodies only, not attachments.
Below is the code that runs once I have identified the body as RTF and need to convert it to HTML:
    case "R":
        // Parse the RTF body with Apache Tika and collect the result as HTML
        ContentHandler handler = new ToHTMLContentHandler();
        AutoDetectParser parser = new AutoDetectParser();
        Metadata metadata = new Metadata();
        try {
            InputStream stream = new ByteArrayInputStream(TEXT.getBytes(StandardCharsets.UTF_8));
            parser.parse(stream, handler, metadata);
            System.out.println("temp = " + handler.toString());
            TEXT = handler.toString();
        } catch (Exception f) {
            f.printStackTrace();
            RCOut[0] = "42"; // signal a conversion failure to the caller
        }
        // The second argument of setContent is a MIME type, not a charset name
        tp.setContent(TEXT, "text/html; charset=UTF-8");
        message.setHeader("Content-Type", "text/html; charset=UTF-8");
        tp.setHeader("Content-Type", "text/html; charset=UTF-8");
        message.setContent(TEXT, "text/html; charset=UTF-8");
        break;
I get an HTML document back, but none of the RTF formatting (the color change, the tab, the line and page breaks) seems to be carried over.
Given an input string like this:
{\\rtf1\\ansi\\deff0 {\\fonttbl {\\f0 Courier;}}\r\n {\\colortbl;\\red0\\green0\\blue0;\\red255\\green0\\blue0;}\r\n This line is the default color\\line\r\n \\cf2\r\n \\tab This line is red and has a tab before it\\line\r\n \\cf1\r\n \\page This line is the default color and the first line on page 2\r\n}
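To rule out an input problem: the doubled backslashes above are just Java string escaping, so at runtime each `\\` is a single backslash. Here is a minimal sketch that rebuilds the same input and checks that the color control words are present and that the text is unchanged by the UTF-8 encoding step. The class and method names (`RtfInputCheck`, `sampleRtf`, `survivesUtf8`) are made up for illustration and are not part of my real code.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: these names are hypothetical, not from my actual program.
class RtfInputCheck {

    // The sample RTF as a Java literal. Each "\\" in source is ONE backslash at runtime.
    static String sampleRtf() {
        return "{\\rtf1\\ansi\\deff0 {\\fonttbl {\\f0 Courier;}}\r\n"
             + " {\\colortbl;\\red0\\green0\\blue0;\\red255\\green0\\blue0;}\r\n"
             + " This line is the default color\\line\r\n"
             + " \\cf2\r\n"
             + " \\tab This line is red and has a tab before it\\line\r\n"
             + " \\cf1\r\n"
             + " \\page This line is the default color and the first line on page 2\r\n"
             + "}";
    }

    // True if encoding to UTF-8 and decoding again yields the same text,
    // i.e. the bytes handed to the parser carry the same control words.
    static boolean survivesUtf8(String rtf) {
        byte[] bytes = rtf.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8).equals(rtf);
    }

    public static void main(String[] args) {
        String rtf = sampleRtf();
        System.out.println("starts with {\\rtf1: " + rtf.startsWith("{\\rtf1"));
        System.out.println("contains \\cf2: " + rtf.contains("\\cf2"));
        System.out.println("UTF-8 round trip ok: " + survivesUtf8(rtf));
    }
}
```

If all three checks pass, the `\cf2` color switch does reach the parser, so the formatting would be getting dropped somewhere during the conversion rather than before it.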
I would have expected something like this:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="X-TIKA:Parsed-By" content="org.apache.tika.parser.DefaultParser">
<meta name="X-TIKA:Parsed-By" content="org.apache.tika.parser.microsoft.rtf.RTFParser">
<meta name="Content-Type" content="application/rtf">
<title></title>
</head>
<body><p>This line is the default color </p>
<p style="color:red;">This line is red and has a tab before it </p>
<p>This line is the default color and the first line on page 2</p>
</body></html>
What I actually get is:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="X-TIKA:Parsed-By" content="org.apache.tika.parser.DefaultParser">
<meta name="X-TIKA:Parsed-By" content="org.apache.tika.parser.microsoft.rtf.RTFParser">
<meta name="Content-Type" content="application/rtf">
<title></title>
</head>
<body><p>This line is the default color
This line is red and has a tab before it
This line is the default color and the first line on page 2</p>
</body></html>
Any help would be appreciated.
Thanks!