We use docx4j (docx4j-JAXB-MOXy v11.4.9) to add images to a docx, then pass it through another system (OnlyOffice) for editing. If I load the docx, clear the body, clone it, copy some of the original content back in and then save it, one of the namespaces (now needed for images) is removed. When you open the result in Word, you get the "Unreadable content" message.
The saved document is missing this namespace declaration, which is present at the top of document.xml in the original:
xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup"
Excerpt from document.xml showing how the wpg namespace is used to add a fallback image:
<w:r>
  <w:rPr>
    <w:color w:val="000000"/>
  </w:rPr>
  <w:t xml:space="preserve">Image :</w:t>
  <mc:AlternateContent>
    <mc:Choice Requires="wpg">
      <w:drawing>
        ...
      </w:drawing>
    </mc:Choice>
    <mc:Fallback>
      <w:pict>
        ...
      </w:pict>
    </mc:Fallback>
  </mc:AlternateContent>
</w:r>
Example code to reproduce from an original containing that namespace:
// load the original; try-with-resources closes the stream even if load() throws
WordprocessingMLPackage mergedDoc;
try (InputStream mergedDocStream = new BufferedInputStream(new FileInputStream("c:\\tmp\\original.docx"), 8 * 1024 * 1024)) {
    mergedDoc = WordprocessingMLPackage.load(mergedDocStream);
}
List<Object> mergedDocContentList = mergedDoc.getMainDocumentPart().getContent();
// get the body section (contains styles, orientation, headers/footers) which needs adding to each doc
SectPr finalSectionProperties = mergedDoc.getMainDocumentPart().getJaxbElement().getBody().getSectPr();
// Create a blank target using the merged file
ObjectFactory objectFactory = new ObjectFactory();
mergedDoc.getMainDocumentPart().getJaxbElement().setBody(objectFactory.createBody());
WordprocessingMLPackage outputDoc = (WordprocessingMLPackage) mergedDoc.clone();
// loop through the original doc's contents and add
for (Object content : mergedDocContentList) {
    outputDoc.getMainDocumentPart().addObject(content);
}
// add body section for original styles etc
outputDoc.getMainDocumentPart().getJaxbElement().getBody().setSectPr(finalSectionProperties);
// save last doc
File outputFile = new File("c:\\tmp\\output.docx");
outputDoc.save(outputFile);
Is there a way we can convince it to keep that namespace?
By way of background, JAXB automatically declares required namespaces.
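The prefixes named in mc:Choice/@Requires are different: although Word needs them declared, they aren't required in an XML spec sense, because a prefix that appears only inside an attribute value (Requires="wpg") is just text as far as JAXB is concerned.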
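So docx4j keeps track of these as they are encountered during unmarshalling (see Docx4jUnmarshallerListener). But if the @Requires attribute isn't present in the docx at that time, this can't be done. That is presumably what happens here: the body is emptied before clone() is called, so the cloned package never sees the attribute.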
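To illustrate the mechanism in isolation, here is a minimal sketch using only the JDK's StAX writer. It is not docx4j code: the class RequiresPrefixSketch, the method write and the trackedPrefixes set are hypothetical names standing in for whatever bookkeeping the library does internally. The point is only that a prefix mentioned in Requires gets declared on the root element if, and only if, something has recorded it beforehand.

```java
import java.io.StringWriter;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical illustration only; none of these names come from docx4j.
class RequiresPrefixSketch {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006";
    static final String WPG_NS = "http://schemas.microsoft.com/office/word/2010/wordprocessingGroup";

    /** Writes a tiny document; wpg is declared only if it is in the tracked set. */
    static String write(Set<String> trackedPrefixes) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement("w", "document", W_NS);
            // Namespaces of elements that are actually written are always declared.
            xml.writeNamespace("w", W_NS);
            xml.writeNamespace("mc", MC_NS);
            // A prefix used only in an attribute value needs explicit tracking.
            if (trackedPrefixes.contains("wpg")) {
                xml.writeNamespace("wpg", WPG_NS);
            }
            xml.writeStartElement("mc", "AlternateContent", MC_NS);
            xml.writeStartElement("mc", "Choice", MC_NS);
            xml.writeAttribute("Requires", "wpg"); // plain text to the writer
            xml.writeEndElement(); // mc:Choice
            xml.writeEndElement(); // mc:AlternateContent
            xml.writeEndElement(); // w:document
            xml.flush();
            xml.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Set<String> tracked = new LinkedHashSet<>();
        System.out.println(write(tracked)); // no xmlns:wpg declaration
        tracked.add("wpg");
        System.out.println(write(tracked)); // xmlns:wpg declared on the root
    }
}
```

In both runs the Requires="wpg" attribute is written unchanged; only the root-level declaration differs, which matches the symptom described in the question.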
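JaxbXmlPart has a method for registering such a prefix by hand; as far as I can tell from the source, it is addMcChoiceNamespace(String).
In your case, calling outputDoc.getMainDocumentPart().addMcChoiceNamespace("wpg") before saving should do the trick.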