pandas.read_csv(): how to exclude specific separator combinations


I have a csv like:

file:

1;a;3;4
1;2;b;4
1;[a;b];3;4

Loading it with pd.read_csv(file, sep=';')

raises an error:

ParserError: Error tokenizing data. C error: Expected 4 fields in line 3, saw 5

because the ; inside [a;b] is treated as a separator. Is there a way to ignore ; when it appears inside [ ]?

Thanks

P.S. Changing the file is not an option.



Best answer, by mozway:

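You can use ;(?![^\[]*\]) as a regex separator. The negative lookahead (?![^\[]*\]) rejects any semicolon that is followed by a run of non-[ characters ending in ], which is exactly the case for a semicolon inside brackets, so only semicolons outside brackets split the line. Regex separators are handled by the Python parsing engine, so pass engine='python' (otherwise pandas falls back to it with a warning):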

pd.read_csv(filename, sep=r';(?![^\[]*\])', engine='python')
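To see which semicolons the pattern actually splits on, it can be tried with Python's re.split on the three sample lines from the question before involving pandas. A minimal sketch (the name SEP is just for illustration):

```python
import re

# A semicolon matches only if the negative lookahead fails, i.e. the text after it
# is NOT "some non-'[' characters followed by ']'" (which would mean we are inside [...]).
SEP = re.compile(r";(?![^\[]*\])")

rows = ["1;a;3;4", "1;2;b;4", "1;[a;b];3;4"]
for row in rows:
    print(SEP.split(row))
# → ['1', 'a', '3', '4']
# → ['1', '2', 'b', '4']
# → ['1', '[a;b]', '3', '4']
```

Because the lookahead only checks whether a ] comes before the next [, this assumes brackets are not nested: a line like 1;[a;[b;c]];3 would be split inside the outer brackets.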

demo:

text = '''1;a;3;4
1;2;b;4
1;[a;b];3;4
'''

import io
import pandas as pd

pd.read_csv(io.StringIO(text), sep=r';(?![^\[]*\])', engine='python')

output:

   1      a  3  4
0  1      2  b  4
1  1  [a;b]  3  4
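Note that in the output above the first line of the file became the header (the columns are named 1, a, 3, 4). Assuming the file has no header row, a sketch passing header=None keeps all three lines as data, with integer column labels 0 to 3:

```python
import io

import pandas as pd

text = """1;a;3;4
1;2;b;4
1;[a;b];3;4
"""

# header=None: treat the first line as data instead of column names
df = pd.read_csv(
    io.StringIO(text),
    sep=r";(?![^\[]*\])",  # split only on semicolons outside [...]
    engine="python",       # regex separators require the Python engine
    header=None,
)
print(df)
```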
