I have a huge CSV file (196244 lines) that contains \n in places other than line endings. I want to remove those \n but keep \r\n intact.
I tried line.replace, but it does not seem to recognize \r\n, so next I tried a regex:
    with open(filetoread, "r") as inf:
        with open(filetowrite, "w") as fixed:
            for line in inf:
                line = re.sub("(?<!\r)\n", " ", line)
                fixed.write(line)
But it does not keep \r\n; it removes everything. I can't do this in Notepad++ because it crashes on this file.
You are not exposing the original line breaks to the regex engine. The line breaks are "normalized" to LF when using open with "r" mode, so the lookbehind never sees a \r. To keep them all in the input, read the file in binary mode using b. Then you need to remember to also use the b prefix with the regex pattern and the replacement.
You can use re.sub(b"(?<!\r)\n", b" ", inf.read()) on files opened in binary mode.
Now the whole file is read into a single bytes object (with inf.read()), so the line breaks are matched and eventually replaced.
Pay attention to:
- "rb" when reading the file in
- "wb" when writing the file out
- re.sub(b"(?<!\r)\n", b" ", inf.read()) has b prefixes on its string literals, and inf.read() reads the file contents into a single variable.