How do I read a CSV file strictly one line at a time?


I need to read a CSV file, but the file contains broken lines. Here's an example:

"name","address","link"
"7eleven","city, street, 1",https://somelink/1     \\the good line
Baby-Gym,"city, street, 2\",https://somelink/2     \\the broken line because it has \", sequence

In this example, the second data row is broken: the value for "address" contains the sequence \", so the backslash escapes the closing quote and the quoted field is never terminated.

I cannot change the CSV file. I just want to read it and skip the broken lines. However, the com.opencsv library behaves unexpectedly: when it reaches a broken line, a single call to csvReader.readNext() reads far more than one line (approximately 5k lines).

Here is the code I am using to read the CSV file:

try (Reader reader = new BufferedReader(new InputStreamReader(is))) {
    CSVParser parser = new CSVParserBuilder()
            .withSeparator(',')
            .withQuoteChar('"')
            .build();
    try (CSVReader csvReader = new CSVReaderBuilder(reader)
            .withSkipLines(1)
            .withCSVParser(parser)
            .build()) {

        Set<info> infoList = new HashSet<>();
        String[] infoParts;

        while ((infoParts = csvReader.readNext()) != null) {
            // code
        }
    }
}

How can I read line by line with OpenCSV so that a single broken line does not force me to discard 5k lines?

I can't find information anywhere on how to solve this problem. I tried new CSVReaderBuilder(reader).withMultilineLimit(1), but it just throws exceptions. I looked through the CSVParser and CSVReader documentation but didn't find a suitable setting. Please help me.

There are 2 best solutions below

Answer by tgdavies:

There is no truly correct way to do this, because quoted CSV fields are allowed to contain newlines. If you can guarantee that every newline in the file is a record separator, you can pre-split your data into lines before parsing.

This means creating a new CSVParser and CSVReader for each row, which may be expensive. I haven't benchmarked this approach:

package com.example.so;

import com.opencsv.CSVParser;
import com.opencsv.CSVParserBuilder;
import com.opencsv.CSVReader;
import com.opencsv.CSVReaderBuilder;
import com.opencsv.exceptions.CsvValidationException;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class OpenCSVEg {
    public static void main(String[] args) throws CsvValidationException, IOException {
        String csv = "\"name\",\"address\",\"link\"\n" +
                "\"7eleven\",\"city, street, 1\",https://somelink/1\n" +
                "Baby-Gym,\"city, street, 2\\\",https://somelink/2\n" +
                "\"7eleven2\",\"city, street, 3\",https://somelink/3\n";

        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            List<String> csvLines = reader.lines().toList();
            List<String[]> items = csvLines.stream().skip(1).map(s -> {
                // the parser has internal state, so we need a new one for each row
                CSVParser parser = new CSVParserBuilder()
                        .withSeparator(',')
                        .withQuoteChar('"')
                        .build();
                try (CSVReader csvReader = new CSVReaderBuilder(new StringReader(s))
                        .withCSVParser(parser)
                        .build()) {

                    // readNext() returns null for an empty line, so avoid Optional.of here
                    return Optional.ofNullable(csvReader.readNext());

                } catch (Exception e) {
                    return Optional.<String[]>empty();
                }
            }).flatMap(Optional::stream).collect(Collectors.toList());
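            // print the fields of each row rather than the array references
            items.forEach(row -> System.out.println(String.join(" | ", row)));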
        }
    }
}
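
To illustrate why checking each line in isolation contains the damage, here is a minimal sketch that uses only the standard library. It is a simplified model of quote tracking, not opencsv's actual parsing logic. The class QuoteStateSketch and the methods endsInsideQuotes and keepWellFormed are hypothetical names, and the sketch assumes '"' as the quote character and '\' as the escape character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of opencsv: a simplified model of quote tracking.
class QuoteStateSketch {

    // Returns true if the line ends while a quoted field is still open.
    // Assumption: '"' is the quote character and '\' is the escape character,
    // honoured only inside quotes and only before '"' or '\'.
    static boolean endsInsideQuotes(String line) {
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            boolean escapesNext = c == '\\' && inQuotes && i + 1 < line.length()
                    && (line.charAt(i + 1) == '"' || line.charAt(i + 1) == '\\');
            if (escapesNext) {
                i++; // skip the escaped character so it cannot close the field
            } else if (c == '"') {
                inQuotes = !inQuotes; // a doubled quote ("") toggles twice and cancels out
            }
        }
        return inQuotes;
    }

    // Keeps only the lines whose quotes are balanced, judging each line on its own,
    // so one broken line cannot affect the lines that follow it.
    static List<String> keepWellFormed(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (!endsInsideQuotes(line)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "\"7eleven\",\"city, street, 1\",https://somelink/1",
                "Baby-Gym,\"city, street, 2\\\",https://somelink/2",
                "\"7eleven2\",\"city, street, 3\",https://somelink/3");
        // Only the lines with balanced quotes are printed.
        for (String line : keepWellFormed(lines)) {
            System.out.println(line);
        }
    }
}
```

A reader that carries the open-quote state across line breaks would keep appending lines to the broken record until it met another quote. Resetting the state for every line, as above, limits the loss to the one broken line.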

Answer by Anna L:

I resolved my problem by switching from CSVReader to Scanner. I still use CSVParser to parse each line of the CSV file. Here is my code:

try (Scanner s = new Scanner(new BufferedInputStream(is))) {

    CSVParser parser = new CSVParserBuilder()
            .withSeparator(',')
            .withQuoteChar('"')
            .build();

    while (s.hasNextLine()) {
        try {
            String[] infoParts = parser.parseLine(s.nextLine());
            // code
        } catch (IOException e) {
            // parseLine rejects a line with an unterminated quoted field; skip it
        }
    }
}

My code now reads line-by-line and can parse the values.