I have a program that reads a file, finds the set of unique lines, and splits them into disjoint groups.
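The grouping rule implemented by the code further down appears to be the following. Two lines end up in the same group when they have the same non-empty value in the same `;`-separated column, and the relation is transitive. Below is a minimal sketch of that rule. It uses a union-find (disjoint-set) structure, a swapped-in technique rather than the asker's approach. `LineGroups` and `group` are hypothetical names, and the sketch assumes every line is well-formed (no stray quotes):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LineGroups {
    // Follows parent links to the root, halving the path on the way
    private static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    /** Hypothetical helper: groups unique lines that share a non-empty value in the same column. */
    public static List<Set<String>> group(List<String> input) {
        List<String> lines = new ArrayList<>(new LinkedHashSet<>(input)); // unique lines, first-seen order
        int[] parent = new int[lines.size()];
        for (int i = 0; i < parent.length; i++) {
            parent[i] = i; // every line starts in its own set
        }
        boolean[] hasValue = new boolean[lines.size()];
        // "column:value" -> index of the first line that had that value in that column
        Map<String, Integer> firstOwner = new HashMap<>();
        for (int i = 0; i < lines.size(); i++) {
            String[] columns = lines.get(i).replace("\"", "").split(";");
            for (int col = 0; col < columns.length; col++) {
                if (columns[col].isEmpty()) {
                    continue;
                }
                hasValue[i] = true;
                Integer owner = firstOwner.putIfAbsent(col + ":" + columns[col], i);
                if (owner != null) {
                    parent[find(parent, i)] = find(parent, owner); // union the two sets
                }
            }
        }
        // Collect lines by root; lines with only empty columns are dropped, as in the question's code
        Map<Integer, Set<String>> groups = new LinkedHashMap<>();
        for (int i = 0; i < lines.size(); i++) {
            if (hasValue[i]) {
                groups.computeIfAbsent(find(parent, i), k -> new LinkedHashSet<>()).add(lines.get(i));
            }
        }
        return new ArrayList<>(groups.values());
    }
}
```

With this structure, merging two groups is a single parent-pointer update instead of copying one whole set into another and re-indexing its columns.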
I get the following error when I read a file of 1 GB or larger:
```
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at java.base/java.util.Arrays.copyOf(Arrays.java:3537)
	at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:228)
	at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:582)
	at java.base/java.lang.StringBuilder.append(StringBuilder.java:179)
	at org.example.Main.main(Main.java:149)
```
I know I can change the heap size in the settings, but I want to solve this issue programmatically.
```java
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.LineIterator;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class HugeFileReader {
    public static void main(String[] args) throws IOException {
        String outputFile = "File created";
        StringBuilder stringBuilder = new StringBuilder();
        // numberOfGroups.get(g) holds the lines of group g; a group merged into another is left empty
        List<Set<String>> numberOfGroups = new ArrayList<>();
        // positionOfNumbers.get(col) maps each value seen in column col to the group that contains it
        List<Map<String, Integer>> positionOfNumbers = new ArrayList<>();
        try (LineIterator bufferedReader = FileUtils.lineIterator(new File(args[0]), "UTF-8");
             BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter(outputFile, StandardCharsets.UTF_8))) {
            // Read at the top of the loop so the last line of the file is processed as well
            while (bufferedReader.hasNext()) {
                String line = bufferedReader.nextLine();
                String[] columns = getColumns(line);
                Integer numOfGroup = null;
                // Look for existing groups that share a value in the same column
                for (int i = 0; i < Math.min(positionOfNumbers.size(), columns.length); i++) {
                    Integer numOfGroup2 = positionOfNumbers.get(i).get(columns[i]);
                    if (numOfGroup2 != null) {
                        if (numOfGroup == null) {
                            numOfGroup = numOfGroup2;
                        } else if (!numOfGroup.equals(numOfGroup2)) {
                            // The line links two groups: move group numOfGroup2 into numOfGroup
                            for (String numbersOfGroup : numberOfGroups.get(numOfGroup2)) {
                                numberOfGroups.get(numOfGroup).add(numbersOfGroup);
                                indexColumns(positionOfNumbers, getColumns(numbersOfGroup), numOfGroup);
                            }
                            numberOfGroups.set(numOfGroup2, new HashSet<>());
                        }
                    }
                }
                if (numOfGroup == null) {
                    // No shared value: start a new group, unless every column is empty
                    if (Arrays.stream(columns).anyMatch(s -> !s.isEmpty())) {
                        numberOfGroups.add(new HashSet<>(List.of(line)));
                        indexColumns(positionOfNumbers, columns, numberOfGroups.size() - 1);
                    }
                } else {
                    numberOfGroups.get(numOfGroup).add(line);
                    indexColumns(positionOfNumbers, columns, numOfGroup);
                }
            }
            stringBuilder.append("Number of groups with more than one element: ")
                    .append(numberOfGroups.stream().filter(s -> s.size() > 1).count());
            numberOfGroups.sort(Comparator.comparingInt(s -> -s.size()));
            int iterationOfGroups = 0;
            // The complete output is collected in this StringBuilder before anything is written
            for (Set<String> perGroup : numberOfGroups) {
                iterationOfGroups++;
                stringBuilder.append("\n").append("Group ").append(iterationOfGroups).append("\n");
                for (String setsOfNumbers : perGroup) {
                    stringBuilder.append(setsOfNumbers).append("\n");
                }
            }
            bufferedWriter.write(stringBuilder.toString());
        }
    }

    // Records which group each non-empty column value belongs to
    private static void indexColumns(List<Map<String, Integer>> positionOfNumbers, String[] columns, int group) {
        for (int col = 0; col < columns.length; col++) {
            if (columns[col].isEmpty()) {
                continue;
            }
            // Pad with empty maps so the map index always equals the column index
            while (positionOfNumbers.size() <= col) {
                positionOfNumbers.add(new HashMap<>());
            }
            positionOfNumbers.get(col).put(columns[col], group);
        }
    }

    // Splits a line on ';' and strips quotes; returns an empty array when a quote has a
    // non-';' character on both sides (a quote inside a value), treating the line as malformed
    private static String[] getColumns(String line) {
        for (int i = 1; i < line.length() - 1; i++) {
            if (line.charAt(i - 1) != ';' && line.charAt(i + 1) != ';' && line.charAt(i) == '"') {
                return new String[0];
            }
        }
        return line.replace("\"", "").split(";");
    }
}
```
Your OutOfMemoryError happens when you append to your StringBuilder. You are able to read the whole file, but you buffer your complete output in RAM in the StringBuilder before writing it to a file. This is not necessary. You can write your output directly to the file and reduce memory usage like this:
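Here is a minimal sketch of that idea. It assumes the grouping loop stays as it is and that `numberOfGroups` is the same `List<Set<String>>` as in the question. `GroupWriter` and `writeGroups` are hypothetical names. Each line goes straight to a `BufferedWriter` from `Files.newBufferedWriter`, so the output never has to exist in memory as one big string. Unlike the question's code, the sketch also skips the empty groups left behind by merges:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

public class GroupWriter {
    /** Hypothetical helper: streams the groups to outputFile instead of building one big String. */
    public static void writeGroups(List<Set<String>> numberOfGroups, Path outputFile) throws IOException {
        long multiElementGroups = numberOfGroups.stream().filter(s -> s.size() > 1).count();
        numberOfGroups.sort(Comparator.comparingInt(s -> -s.size())); // largest groups first, in place
        try (BufferedWriter writer = Files.newBufferedWriter(outputFile, StandardCharsets.UTF_8)) {
            writer.write("Number of groups with more than one element: " + multiElementGroups);
            int groupNumber = 0;
            for (Set<String> group : numberOfGroups) {
                if (group.isEmpty()) {
                    continue; // groups emptied by a merge have no lines to print
                }
                groupNumber++;
                writer.newLine();
                writer.write("Group " + groupNumber);
                writer.newLine();
                for (String line : group) {
                    writer.write(line); // lands in the writer's small buffer, flushed to disk as it fills
                    writer.newLine();
                }
            }
        }
    }
}
```

In the question's `main`, the `StringBuilder` and its output loop would then be replaced by a call such as `GroupWriter.writeGroups(numberOfGroups, Path.of(outputFile));` after the read loop. The groups themselves are still held in memory, so this removes the extra in-memory copy of the whole output, not the memory used by the grouping.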