I'm trying to read .docx files with styling information using Apache POI, which I have done by looping through each XWPFParagraph and working with all the XWPFRun runs inside the paragraphs. Now I want to get the contents of each page. Is there a way to get the contents of each page, or is it possible to know which page a paragraph is on?
This is a function that takes the absolute path of a .docx file and returns an array of strings:
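```java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.poi.xwpf.usermodel.BodyElementType;
import org.apache.poi.xwpf.usermodel.IBodyElement;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

// method name is illustrative
public static String[] readDocx(String absolutePath) throws IOException {
    List<String> textList = new ArrayList<>();
    // try-with-resources closes the document and the stream even on errors
    try (FileInputStream fis = new FileInputStream(absolutePath);
         XWPFDocument document = new XWPFDocument(fis)) {
        List<IBodyElement> bodyElements = document.getBodyElements();
        /* I want to add some kind of outer loop here for each page
           and at the end of that loop I want to add a "<hr/>" tag in the textList
        */
        for (IBodyElement bodyElement : bodyElements) { // Looping through paragraphs
            if (bodyElement.getElementType() == BodyElementType.PARAGRAPH) {
                XWPFParagraph paragraph = (XWPFParagraph) bodyElement;
                String textToAdd = parseParagraph(paragraph); // custom function to handle paragraphs
                textList.add(textToAdd);
            }
        }
    }
    return textList.toArray(new String[0]);
}
```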
As you can see, my goal here is to add an `<hr/>` tag after each page. So if I can somehow get the page number of a paragraph, or loop through the pages, I will be able to do that.
If you know of any other approach that may help, please mention it.
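A .docx file does not store page layout: pages are computed by whatever application renders the document, so POI cannot say with certainty which page a paragraph lands on. A common workaround is to look for markers left in the XML: explicit page breaks (`<w:br w:type="page"/>`), the "page break before" paragraph property (`XWPFParagraph.isPageBreak()`), and `<w:lastRenderedPageBreak/>`, which Word writes where a page began in its own layout the last time it saved the file. The sketch below is a heuristic under those assumptions: `PageSplitter` and its methods are hypothetical names, `paragraph.getText()` stands in for the question's `parseParagraph`, it works at paragraph granularity, and it ignores tables and section breaks.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.poi.xwpf.usermodel.BodyElementType;
import org.apache.poi.xwpf.usermodel.IBodyElement;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBr;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.STBrType;

// Hypothetical helper: approximates page boundaries from markers in the XML.
public class PageSplitter {

    static final String PAGE_SEPARATOR = "<hr/>";

    // Hard page break typed by the author: <w:br w:type="page"/>
    static boolean hasExplicitPageBreak(XWPFParagraph paragraph) {
        for (XWPFRun run : paragraph.getRuns()) {
            for (CTBr br : run.getCTR().getBrList()) {
                if (br.isSetType() && br.getType() == STBrType.PAGE) {
                    return true;
                }
            }
        }
        return false;
    }

    // Page start recorded by Word's last layout: <w:lastRenderedPageBreak/>.
    // Files written by other tools may not contain this marker at all.
    static boolean hasRenderedPageBreak(XWPFParagraph paragraph) {
        for (XWPFRun run : paragraph.getRuns()) {
            if (!run.getCTR().getLastRenderedPageBreakList().isEmpty()) {
                return true;
            }
        }
        return false;
    }

    // Skips the separator at the very start and avoids two in a row, e.g. when
    // an explicit break is followed by Word's rendered marker for the same page.
    static void addSeparator(List<String> textList) {
        if (!textList.isEmpty()
                && !PAGE_SEPARATOR.equals(textList.get(textList.size() - 1))) {
            textList.add(PAGE_SEPARATOR);
        }
    }

    static String[] splitIntoPages(XWPFDocument document) {
        List<String> textList = new ArrayList<>();
        for (IBodyElement bodyElement : document.getBodyElements()) {
            if (bodyElement.getElementType() != BodyElementType.PARAGRAPH) {
                continue; // tables etc. are skipped, as in the question's code
            }
            XWPFParagraph paragraph = (XWPFParagraph) bodyElement;
            // A rendered marker usually sits at the start of the paragraph
            // that opens a new page, so the separator goes before it.
            if (paragraph.isPageBreak() || hasRenderedPageBreak(paragraph)) {
                addSeparator(textList);
            }
            String text = paragraph.getText(); // stand-in for parseParagraph(paragraph)
            if (!text.trim().isEmpty()) {
                textList.add(text);
            }
            // A hard break usually ends the paragraph that contains it.
            if (hasExplicitPageBreak(paragraph)) {
                addSeparator(textList);
            }
        }
        return textList.toArray(new String[0]);
    }
}
```

Treat the result as approximate: if the file was produced by a tool other than Word, or would reflow differently on another machine (fonts, page size), the `lastRenderedPageBreak` markers may be missing or out of date, and section breaks that start a new page are not detected here.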
To get the page count from XWPFDocument (for your outer loop), you can do something like this:

For your paragraph text,
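A minimal sketch of reading the page count, assuming the file was saved by an application that fills in the extended properties (docProps/app.xml). POI only reads the `<Pages>` value stored there and does not lay out the document itself, so the number may be absent or stale if another tool edited the file. `PageCount` and `storedPageCount` are illustrative names.

```java
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.openxmlformats.schemas.officeDocument.x2006.extendedProperties.CTProperties;

public class PageCount {

    // Returns the <Pages> value stored in docProps/app.xml, or -1 if it is absent.
    static int storedPageCount(XWPFDocument document) {
        CTProperties props = document.getProperties()
                .getExtendedProperties()
                .getUnderlyingProperties();
        return props.isSetPages() ? props.getPages() : -1;
    }

    public static void main(String[] args) throws IOException {
        try (FileInputStream fis = new FileInputStream(args[0]);
             XWPFDocument document = new XWPFDocument(fis)) {
            System.out.println("Pages (as last saved): " + storedPageCount(document));
        }
    }
}
```

This count is useful for sizing an outer loop, but it does not tell you where each page starts; for that you still need a heuristic based on page-break markers in the paragraphs.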