An invalid XML character (Unicode: 0x3) was found on unmarshalling after successful marshalling


I fully understand the error "An invalid XML character (Unicode: 0x3) was found"

Caused by: org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x3) was found in the element content of the document.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203) ~[na:1.8.0_111]
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177) ~[na:1.8.0_111]
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400) ~[na:1.8.0_111]
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327) ~[na:1.8.0_111]
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1472) ~[na:1.8.0_111]
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2923) ~[na:1.8.0_111]

What I cannot believe is that the document was marshalled with this character in it in the first place.

I marshalled an object that contained portions of a .gz file, and the marshalling succeeded. When I tried to unmarshal the result, I got this error.

The marshaller and unmarshaller I used are the ones bundled in rt.jar, under /com/sun/xml/internal/bind/v2/runtime/.

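// context is the JAXBContext for the object's class; object holds the .gz fragment
StringWriter stringWriter = new StringWriter();
Marshaller marshaller = context.createMarshaller();
marshaller.marshal(object, stringWriter); // succeeds
Unmarshaller unmarshaller = context.createUnmarshaller();
unmarshaller.unmarshal(new StringReader(stringWriter.toString())); // fails with the exception above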

This is obviously a reflexivity (round-trip) issue, and I don't know how to deal with it.

If anyone has had the same issue, please advise how to overcome it, ideally without changing the marshaller.

P.S. As I understand it, a marshaller should always be reflexive and should never marshal something that it cannot unmarshal. It's a shame that the one in rt.jar is not.


There are 3 best solutions below

0
On

A third thing I forgot to mention:

Some characters cannot appear literally in XML text and must be escaped as follows:

<   &lt;
>   &gt;
&   &amp;
 for attribute values only:
"   &quot;
'   &apos;

If any of your strings can contain these characters, they must be escaped, or, when they are not in an attribute value, wrapped in a CDATA section.

See here: Invalid Characters in XML
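
A minimal sketch of the escaping table above, written by hand for illustration. The class and method names (XmlEscapeSketch, escape) are made up here and are not part of JAXB. A JAXB marshaller normally performs this kind of escaping itself. Note also that it only covers markup characters: a control character such as 0x3 is not allowed in XML 1.0 in any form, escaped or not, so escaping alone would not fix the error in the question.

```java
class XmlEscapeSketch {
    // Escapes the markup characters listed above.
    // Quotes are escaped only when the text is going into an attribute value.
    static String escape(String s, boolean inAttribute) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<':  out.append("&lt;");  break;
                case '>':  out.append("&gt;");  break;
                case '&':  out.append("&amp;"); break;
                case '"':  out.append(inAttribute ? "&quot;" : "\""); break;
                case '\'': out.append(inAttribute ? "&apos;" : "'");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```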

1
On

Why don't you try removing the invalid characters?

This was discussed in this thread.

Hope this helps!!
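
One way to remove them, as a sketch only: keep the code points that XML 1.0 allows (0x9, 0xA, 0xD, 0x20-0xD7FF, 0xE000-0xFFFD, 0x10000-0x10FFFF) and drop everything else. The names XmlCharFilter, isValidXml10 and stripInvalidXml10 are invented for this example. It assumes the value is a String field that you can clean before passing the object to the marshaller.

```java
class XmlCharFilter {
    // True if the code point is allowed by the XML 1.0 "Char" production.
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of the input without code points that XML 1.0 forbids.
    // Unpaired surrogates show up as code points in 0xD800-0xDFFF and are dropped too.
    static String stripInvalidXml10(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints()
             .filter(XmlCharFilter::isValidXml10)
             .forEach(out::appendCodePoint);
        return out.toString();
    }
}
```

Keep in mind that this discards data. If the string really carries compressed bytes, the stripped result will no longer decompress.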

4
On

Why go through marshalling and unmarshalling at all? You start with a Java object. How did you get it, and why does it contain a character that is fine in Java but invalid in XML? Depending on your requirements, you have three options:

  1. If the data in the Java object is correct and must be carried inside the XML, encode it with Base64. Raw binary data cannot be represented in XML.

  2. If it is bad data and you have to treat it as an error, do that before marshalling.

  3. If you do not need the invalid bytes, remove them as suggested.
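
Option 1 could look roughly like this, using java.util.Base64 from Java 8. The class name BinaryAsXmlText and its two helpers are invented for the example. The assumption is that the .gz fragment is available as a byte[] and that the marshalled class stores the encoded String instead of the raw characters. As far as I know, JAXB also maps a byte[] property to xs:base64Binary by default, so simply declaring the field as byte[] rather than String may be enough.

```java
import java.util.Base64;

class BinaryAsXmlText {
    // Encodes raw bytes to Base64 text, which uses only A-Z, a-z, 0-9, '+', '/' and '='.
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Restores the original bytes after unmarshalling.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }
}
```

Because the encoded text contains no control characters, it passes through marshalling and unmarshalling unchanged, and decode gives back the exact original bytes.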

On the other hand, check your marshaller's default encoding. A marshaller has a property named "jaxb.encoding". Does it match what the unmarshaller uses? For example, for "utf-8":

marshaller.setProperty("jaxb.encoding", "utf-8");