Hi, I have a CSV file in which the encapsulator character is not escaped properly.
Example:
[email protected],"uhrege gerjhhg er<span style="background-color: rgb(0,153,0);">eriueiru kernger</span><font color="#009900"><span style="background-color: rgb(255,255,255);"> weiufhuweifbw fhew fibwefbw</span></font><div><font color="#009900"><span style="background-color: rgb(255,255,255);">wekifbwe fewf</span></font></div><div><font color="#009900"><span style="background-color: rgb(255,255,255);">weiuifgewbfjew f</span></font></div>",18-Oct-2016,
Delimiter -> ,
Encapsulator -> "
It breaks when I try to read it with the commons-csv reader, which
throws an 'invalid char between encapsulated token and delimiter' exception.
However, Microsoft Excel seems to open the file perfectly. Any ideas on how to proceed?
How does one parse CSV files where the encapsulator is not escaped properly? Excel seems to open such files fine.
If you can't fix this at the source (i.e. generate a well-formed CSV), and you want to parse this yourself, you could go the easy way:
- Scan field1 up to `,"`
- field2 up to `",`
- rest is field3 (trailing comma?).

Of course if a `",` occurs in the html field, there's a problem. You could solve that by first scanning up to `,"`, and then backwards (starting at the end of the line) to `",`.

If there are more fields than you show here, you could look for a `,` combined with a `"` (both combinations, could also be `","`) and hope those do not appear in the field data.
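
The forward-then-backward scan could be sketched in Java roughly like this. It is only a sketch under assumptions: `LenientCsvLine` and `split` are made-up names (not part of commons-csv), each record is assumed to sit on a single line, and only the second field is assumed to be encapsulated, so everything after the last quote-comma pair is treated as plain comma-separated data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of commons-csv.
final class LenientCsvLine {

    // Splits a line shaped like: field1,"field2 with stray quotes",rest
    // Assumption: exactly one encapsulated field per line (the second one).
    static List<String> split(String line) {
        int open = line.indexOf(",\"");       // forward scan: end of field1
        int close = line.lastIndexOf("\",");  // backward scan: end of field2
        if (open < 0 || close < open + 2) {
            throw new IllegalArgumentException("Unexpected line layout: " + line);
        }
        List<String> fields = new ArrayList<>();
        fields.add(line.substring(0, open));          // field1
        fields.add(line.substring(open + 2, close));  // field2, quotes left as-is
        // The remainder holds no encapsulated fields (by assumption), so a
        // plain split is enough; limit -1 keeps a trailing empty field.
        String rest = line.substring(close + 2);
        for (String f : rest.split(",", -1)) {
            fields.add(f);
        }
        return fields;
    }
}
```

Calling `LenientCsvLine.split(line)` on a line like the one in the question would give the address, the raw HTML (inner quotes untouched), the date, and an empty last field for the trailing comma. If a later field can itself contain a quote followed by a comma, the backward scan will pick the wrong spot, which is the limitation already mentioned above.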