I'm trying to read a binary file (.parquet) located on an FTP server using pandas' read_parquet:
import pandas as pd
df = pd.read_parquet('ftp://ftp.hostname.com/binary/filename.parquet', engine='fastparquet')
I get the following error message:
FileNotFoundError: ftp://ftp.hostname.com/binary/filename.parquet
The file is definitely at that path, and I've double-checked the path name.
Extra info:
When accessing .csv files on that same FTP server, there are no errors:
pd.read_csv('ftp://ftp.hostname.com/csv/filename.csv')
The error only occurs when using pd.read_parquet to read binary files from the FTP server. I've also tried engine='pyarrow', but the result is the same.
When I download the file, save it locally, and open it with pd.read_parquet, it works fine.
Download using Python's urllib:
import urllib.request
urllib.request.urlretrieve('ftp://ftp.hostname.com/binary/filename.parquet', 'file')
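Since the download-then-read route works, the same idea can probably be done in memory, without saving a file. Below is a minimal sketch under that assumption. The helper names `fetch_to_buffer` and `read_remote_parquet` are my own, not part of pandas, and I'm relying on pd.read_parquet accepting a file-like object in place of a path.

```python
import io
import urllib.request


def fetch_to_buffer(url: str) -> io.BytesIO:
    """Download the whole resource at `url` into a seekable in-memory buffer.

    Hypothetical helper: the entire file is held in memory, so this only
    suits files that comfortably fit in RAM.
    """
    with urllib.request.urlopen(url) as response:
        return io.BytesIO(response.read())


def read_remote_parquet(url: str, engine: str = "fastparquet"):
    """Fetch `url` with urllib, then let pandas parse the buffered bytes.

    Assumes pandas and the chosen engine are installed.
    """
    import pandas as pd  # imported lazily so fetch_to_buffer works without pandas

    return pd.read_parquet(fetch_to_buffer(url), engine=engine)
```

Usage would be something like `df = read_remote_parquet('ftp://ftp.hostname.com/binary/filename.parquet')`. This sidesteps whatever pandas does with the ftp:// URL itself, so it is a workaround rather than an explanation of the FileNotFoundError.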
When opening it with urllib.request instead:
from urllib import request
req = request.urlopen('ftp://ftp.hostname.com/binary/filename.parquet')
df = req.read()
I get the following result:
df = '\x00\x11...'
I'm not sure whether this is an issue with the file encoding.
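One way to rule out a corrupted or mis-transferred download is to check the raw bytes for the Parquet magic number: an unencrypted Parquet file starts and ends with the four bytes `PAR1`. The sketch below is only a quick sanity check, not a full validation, and the function name `looks_like_parquet` is my own.

```python
PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(raw: bytes) -> bool:
    """Return True if `raw` has the Parquet magic bytes at both ends.

    Assumes an unencrypted file. 12 bytes is the bare minimum for a
    leading magic, a 4-byte footer length, and a trailing magic.
    """
    return (
        len(raw) >= 12
        and raw[:4] == PARQUET_MAGIC
        and raw[-4:] == PARQUET_MAGIC
    )
```

Calling `looks_like_parquet(req.read())` on the response above would show whether the bytes arriving over FTP are still a plausible Parquet file. If they are, the problem is more likely in how the ftp:// path is resolved than in the file contents.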
UPDATE:
Full traceback from read_parquet:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "...\Python312\Lib\site-packages\pandas\io\parquet.py", line 670, in read_parquet
return impl.read(
^^^^^^^^^^
File "...\Python312\Lib\site-packages\pandas\io\parquet.py", line 400, in read
parquet_file = self.api.ParquetFile(path, **parquet_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "...\Python312\Lib\site-packages\fastparquet\api.py", line 178, in __init__
raise FileNotFoundError(fn)
FileNotFoundError: ftp://ftp.hostname.com/binary/filename.parquet
Traceback when attempting to read the same .parquet file with read_csv:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "...\Python312\Lib\site-packages\pandas\io\parsers\readers.py", line 948, in read_csv
return _read(filepath_or_buffer, kwds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "...\Python312\Lib\site-packages\pandas\io\parsers\readers.py", line 611, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "...\Python312\Lib\site-packages\pandas\io\parsers\readers.py", line 1448, in __init__
self._engine = self._make_engine(f, self.engine)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "...\Python312\Lib\site-packages\pandas\io\parsers\readers.py", line 1723, in _make_engine
return mapping[engine](f, **self.options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "...\Python312\Lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 93, in __init__
self._reader = parsers.TextReader(src, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "parsers.pyx", line 579, in pandas._libs.parsers.TextReader.__cinit__
File "parsers.pyx", line 668, in pandas._libs.parsers.TextReader._get_header
File "parsers.pyx", line 879, in pandas._libs.parsers.TextReader._tokenize_rows
File "parsers.pyx", line 890, in pandas._libs.parsers.TextReader._check_tokenize_status
File "parsers.pyx", line 2050, in pandas._libs.parsers.raise_parser_error
File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 7-8: invalid continuation byte