I have some URLs for downloading CSV files.
import pandas as pd
import io
import requests
url1 = 'https://www.ons.gov.uk/generator?format=csv&uri=/economy/economicoutputandproductivity/output/timeseries/' + 'k22a' + '/diop'
url2 = 'https://www.ons.gov.uk/generator?format=csv&uri=/economy/economicoutputandproductivity/output/timeseries/' + 'k24c' + '/diop'
s = requests.get(url1).content  # `url` was never defined; use url1 (or url2)
c=pd.read_csv(io.StringIO(s.decode('utf-8')))
When I use url1, there is a ',' in the 4th record, but some URLs (such as url2) don't have this unexpected separator. This causes
ParserError: Error tokenizing data. C error: Expected 1 fields in line 5, saw 2
when I try to merge the CSV files into a single dataframe. How do I ignore these unexpected separators? The first seven records are to be deleted anyway, but I still get this error.
This solution suggests pre-parsing each line before converting it into CSV. Since I have many such URLs and don't know which unexpected delimiters I might encounter in future, I'm not sure how to debug this. Can pre-parsing before converting to CSV work? How could it be implemented so that it also handles other separators encountered in the future?
Since you don't need the metadata, just skip it using the skiprows parameter of read_csv. As a nice side effect, you'll also have the correct dtypes automatically.
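The code that originally accompanied this step is not preserved here, so the following is only a minimal sketch of the idea. It assumes the layout described in the question (seven metadata records, one of them containing a stray comma, followed by two-column data). The SAMPLE text and the column names period and value are made up for illustration and are not taken from the ONS files.

```python
import io

import pandas as pd

# Hypothetical stand-in for a downloaded file: seven metadata records
# (the fifth has an extra comma), followed by the actual data rows.
SAMPLE = """Title,Example output index
CDID,K22A
Source dataset ID,DIOP
PreUnit,
Unit,Index, base year = 100
Release date,(not shown)
Next release,(not shown)
1948,10.5
1949,11.0
1950,11.8
"""

# skiprows=7 drops the metadata before any field counting happens,
# so the stray comma never reaches the tokenizer.
# The column names are assumptions chosen for this sketch.
df = pd.read_csv(io.StringIO(SAMPLE), skiprows=7, names=["period", "value"])

print(df)
print(df.dtypes)
```

With a real download, the decoded response text from the question would take the place of SAMPLE. The number of rows to skip may need adjusting to the actual file.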
If you don't even need headers:
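A minimal sketch of the header-less variant, again on made-up sample text rather than the real download. Passing header=None tells read_csv not to treat the first remaining row as column names, so the columns get the integer labels 0 and 1. The SAMPLE content is hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for a downloaded file: seven metadata records
# (one with an extra comma), followed by two data rows.
SAMPLE = """Title,Example output index
CDID,K24C
Source dataset ID,DIOP
PreUnit,
Unit,Index, base year = 100
Release date,(not shown)
Next release,(not shown)
1948,20.1
1949,21.4
"""

# Skip the metadata and read the rest without a header row.
df = pd.read_csv(io.StringIO(SAMPLE), skiprows=7, header=None)

print(df)
```

Frames read this way all share the same integer column labels, which should make it straightforward to combine several of them with pd.concat.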