Use case: a huge CSV file (~500 MB) that needs to be read quickly without loading it all into memory.
The idea I used:
Read the CSV line by line and save the transformed data directly into the database. (At a later stage I read the data back from the database and send it to another service, but that is not relevant for now.)
public void importData() {
    try (
        // The local resource gets its own name so it does not shadow the reader field
        Reader csvReader = reader.readData();
        BufferedReader bufferedReader = new BufferedReader(csvReader);
    ) {
        String line;
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd");
        // Skip the first line (presumably the header row)
        bufferedReader.readLine();
        while ((line = bufferedReader.readLine()) != null) {
            String[] parts = line.split(",");
            // An empty date column is stored as null
            LocalDate date = !parts[2].isEmpty() ? LocalDate.parse(parts[2], formatter) : null;
            String partThree = parts[3];
            String partZero = parts[0];
            String partOne = parts[1];
            String partFour = parts[4];
            // The sixth column is optional
            String partFive = parts.length >= 6 ? parts[5] : null;
            service.saveDog(DogEntry.builder()
                    .breed(partZero)
                    .originSystem(partOne)
                    .date(date)
                    .state(partThree)
                    .center(partFour)
                    .partFive(partFive)
                    .build());
        }
    } catch (IOException e) {
        throw new DOGException(ErrorCodes.CODES, "Cannot read Dog data", e);
    }
}
Service method
public void saveDog(DogEntry entry) {
    LOGGER.info("Receiving Dog {}", entry.getBreed());
    final Dog dog = updateOrCreateDog(entry);
    dogRepository.save(dog);
}

// Looks up an existing Dog by breed and origin; updates it if found, otherwise creates a new one
private Dog updateOrCreateDog(final DogEntry entry) {
    Optional<Dog> existingDog = dogRepository.findByBreedAndOrigin(entry.getBreed(), entry.getOriginSystem());
    return existingDog.map(dog -> getUpdatedDog(dog, entry)).orElseGet(() -> createNewDog(entry));
}

private Dog getUpdatedDog(Dog existingDog, DogEntry entry) {
    existingDog.setBreed(entry.getBreed());
    existingDog.setOrigin(entry.getOriginSystem());
    existingDog.setStatus(entry.getState());
    existingDog.setCenter(entry.getCenter());
    return existingDog;
}

private Dog createNewDog(final DogEntry entry) {
    return Dog.builder()
            .breed(entry.getBreed())
            .origin(entry.getOriginSystem())
            .status(entry.getState())
            .center(entry.getCenter())
            .build();
}
The problem is that I cannot read all the data from the CSV into a list first, because that causes an OutOfMemoryError (OOM).
Is there a faster way to read and process the CSV file, so that I don't run into a timeout?
You can try the BufferedReader.lines() method. It looks something like the code below. This method reads the file lazily, line by line, which makes it memory efficient.
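A minimal sketch of that approach, for illustration only. It assumes the same column layout and `yyyy-MM-dd` date format as the question. The class `CsvLineStreamSketch`, the `Row` holder, and the `Consumer` sink are hypothetical stand-ins for `DogEntry` and `service.saveDog`, not names from the original code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class CsvLineStreamSketch {

    private static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    /** Hypothetical stand-in for the question's DogEntry. */
    public static final class Row {
        public final String breed;
        public final String originSystem;
        public final LocalDate date;
        public final String state;
        public final String center;
        public final String partFive;

        Row(String[] parts) {
            this.breed = parts[0];
            this.originSystem = parts[1];
            // An empty date column becomes null, as in the question's loop.
            this.date = parts[2].isEmpty() ? null : LocalDate.parse(parts[2], FORMATTER);
            this.state = parts[3];
            this.center = parts[4];
            // The sixth column is optional.
            this.partFive = parts.length >= 6 ? parts[5] : null;
        }
    }

    /** Streams the CSV lazily and hands each parsed row to the sink (for example a save call). */
    public static void importData(Reader source, Consumer<Row> sink) throws IOException {
        try (BufferedReader bufferedReader = new BufferedReader(source)) {
            bufferedReader.lines()
                    .skip(1)                                  // skip the header row
                    .filter(line -> !line.trim().isEmpty())   // ignore blank lines (an addition in this sketch)
                    .map(line -> new Row(line.split(",")))
                    .forEach(sink);
        } catch (UncheckedIOException e) {
            // lines() wraps read failures in UncheckedIOException; unwrap it for callers.
            throw e.getCause();
        }
    }

    public static void main(String[] args) throws IOException {
        String csv = "breed,origin,date,state,center,extra\n"
                + "beagle,sysA,2020-01-15,ok,north,x\n"
                + "pug,sysB,,new,south\n";
        List<Row> saved = new ArrayList<>();   // in-memory sink, used only for this demo
        importData(new StringReader(csv), saved::add);
        System.out.println(saved.size() + " rows imported");
    }
}
```

In real use the sink would be something like `row -> service.saveDog(...)` rather than a list, so that rows are not accumulated in memory. Like the `readLine()` loop in the question, the stream holds only the current line at a time.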