How can I optimize GC for String while reading an xlsx file with Apache POI's XSSFWorkbook?


The program reads .xlsx files using XSSFWorkbook from Apache POI 5.2.3.

Sometimes an OutOfMemoryError (OOM) occurs when calling getStringCellValue() on each Cell.

I set the heap size to 2GB (both -Xms and -Xmx), but to no avail. Why does simply calling getStringCellValue() lead to an OOM?
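To rule out the heap flags not being applied, the effective limit can be printed from inside the JVM. This is a minimal sketch using only the standard `Runtime` API; the class name `HeapCheck` and the helper `maxHeapMiB` are hypothetical names chosen for illustration.

```java
public class HeapCheck {
    /** Returns the maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        // maxMemory() reflects -Xmx (or the JVM default when -Xmx is absent).
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

Running it with the same options as the real program, for example `java -Xms2g -Xmx2g HeapCheck`, should report a value close to 2048 if the settings took effect.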

Here's the simplified code example where the OOM arises:

    public void read(String filename) throws IOException {
        IOUtils.setByteArrayMaxOverride(Integer.MAX_VALUE);
        FileInputStream target = new FileInputStream(filename);
        XSSFWorkbook workbook = new XSSFWorkbook(target);
        XSSFSheet sheet = workbook.getSheetAt(0);
        for (Row row : sheet) {
            for (Cell cell : row) {
                cell.getStringCellValue();
            }
        }
        workbook.close();
    }

This code has been reduced to show the situation: the OOM occurs merely from invoking getStringCellValue(), even though the result is never assigned to a variable.

This issue doesn't arise with all files but specifically with large ones (in my case, exceeding just 3MB). I suspect that the unique, immutable String value of each cell accumulates in the heap and isn't effectively cleared by the Garbage Collector (GC).

My guess is that the problem is linked to the number of String values held in the heap rather than to heap capacity, since the exception occurs even with an Excel file of approximately 3MB. In my case, once the heap size exceeded the minimum required (approximately 256MB or more), the exception occurred at exactly the same point regardless of the heap size.

When logging every 100 rows for testing, the process first reads about 100,000 rows and then pauses, likely due to garbage collection activity. This cycle of reading and pausing repeats, but as it progresses the number of rows read between pauses decreases sharply, eventually leading to an Out of Memory (OOM) error.
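The per-100-row logging described above could also report how much heap is in use, which would show whether memory grows steadily as rows are read. The following is a minimal sketch under stated assumptions: `HeapProbe`, `usedMiB`, `shouldLog`, and `report` are hypothetical helpers, and the loop in `main` only stands in for the real row iteration.

```java
public class HeapProbe {
    private static final long MIB = 1024L * 1024L;

    /** Heap currently in use (committed minus free), in MiB. */
    static long usedMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MIB;
    }

    /** True when rowIndex falls on a logging interval boundary. */
    static boolean shouldLog(int rowIndex, int interval) {
        return rowIndex % interval == 0;
    }

    /** Builds one log line for the given row index. */
    static String report(int rowIndex) {
        return "row=" + rowIndex + " usedHeap=" + usedMiB() + "MiB";
    }

    public static void main(String[] args) {
        // Stand-in loop; in the real program this would be the row iteration.
        for (int rowIndex = 0; rowIndex < 1000; rowIndex++) {
            if (shouldLog(rowIndex, 100)) {
                System.out.println(report(rowIndex));
            }
        }
    }
}
```

The reported figure includes garbage that has not been collected yet, so it is only a rough indicator; a value that keeps climbing toward the maximum across many log lines would suggest that objects are being retained rather than collected.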

Here is the exception log:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at org.apache.xmlbeans.impl.values.NamespaceContext$NamespaceContextStack.<init>(NamespaceContext.java:78)
    at org.apache.xmlbeans.impl.values.NamespaceContext$NamespaceContextStack.<init>(NamespaceContext.java:75)
    at org.apache.xmlbeans.impl.values.NamespaceContext.getNamespaceContextStack(NamespaceContext.java:102)
    at org.apache.xmlbeans.impl.values.NamespaceContext.push(NamespaceContext.java:110)
    at org.apache.xmlbeans.impl.values.XmlObjectBase.check_dated(XmlObjectBase.java:1248)
    at org.apache.xmlbeans.impl.values.JavaStringEnumerationHolderEx.getEnumValue(JavaStringEnumerationHolderEx.java:60)
    at org.openxmlformats.schemas.spreadsheetml.x2006.main.impl.CTCellImpl.getT(CTCellImpl.java:466)
    at org.apache.poi.xssf.usermodel.XSSFCell.getBaseCellType(XSSFCell.java:686)
    at org.apache.poi.xssf.usermodel.XSSFCell.getCellType(XSSFCell.java:664)
    at org.apache.poi.xssf.usermodel.XSSFCell.getRichStringCellValue(XSSFCell.java:293)
    at org.apache.poi.xssf.usermodel.XSSFCell.getStringCellValue(XSSFCell.java:280)
    at Main.read(Main.java:46)
    at Main.main(Main.java:15)