I am using pandas 2.2.0 with xlrd 2.0.1. The following snippet
import pandas as pd
file_path = './data/myfile.xls'
df = pd.read_excel(file_path)
produces the following traceback:
Traceback (most recent call last):
  File "/home/singhd/PycharmProjects/Debugging/main.py", line 30, in <module>
    df = pd.read_excel(file_path)
         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/singhd/PycharmProjects/Debugging/.venv/lib/python3.11/site-packages/pandas/io/excel/_base.py", line 495, in read_excel
    io = ExcelFile(
         ^^^^^^^^^^
  File "/home/singhd/PycharmProjects/Debugging/.venv/lib/python3.11/site-packages/pandas/io/excel/_base.py", line 1567, in __init__
    self._reader = self._engines[engine](
                   ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/singhd/PycharmProjects/Debugging/.venv/lib/python3.11/site-packages/pandas/io/excel/_xlrd.py", line 46, in __init__
    super().__init__(
  File "/home/singhd/PycharmProjects/Debugging/.venv/lib/python3.11/site-packages/pandas/io/excel/_base.py", line 573, in __init__
    self.book = self.load_workbook(self.handles.handle, engine_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/singhd/PycharmProjects/Debugging/.venv/lib/python3.11/site-packages/pandas/io/excel/_xlrd.py", line 63, in load_workbook
    return open_workbook(file_contents=data, **engine_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/singhd/PycharmProjects/Debugging/.venv/lib/python3.11/site-packages/xlrd/__init__.py", line 172, in open_workbook
    bk = open_workbook_xls(
         ^^^^^^^^^^^^^^^^^^
  File "/home/singhd/PycharmProjects/Debugging/.venv/lib/python3.11/site-packages/xlrd/book.py", line 68, in open_workbook_xls
    bk.biff2_8_load(
  File "/home/singhd/PycharmProjects/Debugging/.venv/lib/python3.11/site-packages/xlrd/book.py", line 637, in biff2_8_load
    cd = compdoc.CompDoc(self.filestr, logfile=self.logfile,
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/singhd/PycharmProjects/Debugging/.venv/lib/python3.11/site-packages/xlrd/compdoc.py", line 227, in __init__
    dbytes = self._get_stream(
             ^^^^^^^^^^^^^^^^^
  File "/home/singhd/PycharmProjects/Debugging/.venv/lib/python3.11/site-packages/xlrd/compdoc.py", line 293, in _get_stream
    if self.seen[s]:
       ~~~~~~~~~^^^
IndexError: array index out of range
pandas reads the data without problems after I open the file in Excel and save it as .xlsx, so the .xls file does not seem to be corrupt. What is going wrong here? What else can I try? Is this a known issue?
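
Since the error is raised in xlrd's compdoc module while it follows a sector chain, one thing I can still check is whether the container header of the .xls file is consistent with the file's length. Below is a minimal sketch of such a check using only the standard library. The helper name inspect_ole2_header is made up for this post and is not part of pandas or xlrd, the field offsets are my reading of the OLE2 compound file header layout, and I have not confirmed that an out-of-range sector is what actually triggers the IndexError above.

```python
import struct

# Signature that every OLE2 compound file (the container used by .xls) starts with
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def inspect_ole2_header(data: bytes) -> dict:
    """Summarise a few OLE2 header fields (hypothetical helper, not an xlrd API)."""
    if data[:8] != OLE2_SIGNATURE:
        # Not an OLE2 container at all (e.g. a renamed .xlsx, CSV or HTML file)
        return {"is_ole2": False}
    if len(data) < 512:
        # The fixed-size header is 512 bytes; anything shorter is truncated
        return {"is_ole2": True, "truncated_header": True}

    # Sector size is stored as a power of two at offset 0x1E (9 -> 512 bytes)
    (sector_shift,) = struct.unpack_from("<H", data, 0x1E)
    # Sector id where the directory stream starts, at offset 0x30
    (first_dir_sector,) = struct.unpack_from("<i", data, 0x30)

    sector_size = 1 << sector_shift
    # The first sector-sized chunk holds the header; count complete sectors after it
    sectors_in_file = max(len(data) // sector_size - 1, 0)

    return {
        "is_ole2": True,
        "sector_size": sector_size,
        "sectors_in_file": sectors_in_file,
        "first_dir_sector": first_dir_sector,
        # False means the header points at a sector the file does not contain
        "first_dir_in_range": 0 <= first_dir_sector < sectors_in_file,
    }
```

I would call it as inspect_ole2_header(open('./data/myfile.xls', 'rb').read()). If first_dir_in_range came back False, that would at least suggest the file is truncated or has an unusual layout that Excel tolerates, which might help narrow down the question.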