I have upgraded pandas package, new version is : 1.4.2 and xlrd package, new version is 2.0.1
Now, when I read python file with the following command :
import pandas as pd pd.read_excel('myfile.xlsx')
I got the following warning: UserWarning: Workbook contains no default style, apply openpyxl's default
And my result is :
| foo | bar |
|---|---|
| _x0031_01259 | COMMUNAUTE_x0020_DE_x0020_COMMUNES_x0020_FIER_x0020_ET_x0020_USSES |
While it should be :
| foo | bar |
|---|---|
| 101259 | COMMUNAUTE DE COMMUNES FIER ET USSES |
So in some columns, 1 is replaced by x0031, 2 is replaced by x0033, space is replaced by x0020, etc.
I tried to add engine parameter set to openpyxl but same warning message and same dataframe result.
Before Pandas package upgrade, I already had the problem but with engine parameter set to xlrd, it was working (but I had a warning saying that newest version won't support xlrd)
Any idea how to read correctly the file?
xlrdpackage is not used for.xlsxfiles inpandas.As you can read in pandas docs, for
.xlsxit is usingopenpyxlthat does not handle readingxlsxproperly in some specific scenarios, causing error you have described - check this bug reportIt has been working for you before, because
xlrdused to support.xlsxfiles, but since version 2.0.0 it supports only "classic".xlsfiles.As a workaround, you can use
xlrdcode snippet from 1.x.x version in pandas, however you have to ensure your columns does not include strings like_x0000_that you want to preserve: