Background:
We provide our suppliers an excel-file, that they have to fill out. They have to fill out the cells in column I, starting at row 11 (See Picture: Head of the excel-file). In these cells they have to input values for exapmple like this: 8,76545 or 0,98565. (The decimal seperator is a comma: ,). The supplier decides, in which cells they add a value in column I. So some cells can be empty. The rows 1 to 9 in the excel-file can be ignored for this problem. Unfortunately the cells in column I are formated as percentage.
Head of the excel-file: Head
Example with filled values: Filled Cells
We get the filled file back as excel-file or csv-file.
We remove the rows 1 to 9 aswell as all rows, that have no value in column I (starting at row 11), we convert the excel file into csv and upload them into our ERP-System. This is done manually.
The goal is to automate this steps/process with Python:
Ignore rows 1 to 9
remove all rows with empty cells in column I (starting at row 11)
remove the percentage format
save the result as csv
The result should look like this: Result
Problem:
We are getting the filled file back as excel- or csv-file, with or without the percentage format. The excel-file with the percentage format is problematic.
If I open the excel-file, with the percentage format with pandas, pandas ist converting the values like this:.
Original value in the excel file: 8,61565740319208%
Converted value by pandas: 0.0861565740319208
The value should be 8,61565740319208% befor I remove the percentage sign with Python. (The dot-decimal seperator is not a problem)
Since the suppliers delivers the excel-files with or withou the percentage format, I just cant multiply the values in pandas by 100. Pandas is converting the values automatically, so I have no chance to check, if there is a % or not in the cell. I am using openpyxl as engine.
While my code is multiplying the values with 100, this is wrong.
Code:
import os
import pandas as pd
from datetime import datetime
def clear_date(date: str) -> str:
if len(date.strip()) == 10 and "." in date: #10.02.2023
return date
elif len(date.strip()) > 10 and "-" in date:
return datetime.fromisoformat(date).strftime("%d.%m.%Y")
def clear_faktor(faktor: str) -> str:
try:
faktor = faktor.replace("%","")
if "," in faktor:
return f"{faktor:0<8}"[:8]
elif float(faktor) < 0.1:
return f"{str(float(faktor[:10])*100).replace(".", ","):0<8}"[:8]
else:
return f"{faktor:0<8}"[:8]
except Exception as error:
print(error)
def clear_LF(lf: str) -> str:
if lf == "LF10":
return "LFA"
else:
return lf
if __name__ == "__main__":
#Set Path
path_main = os.path.dirname(os.path.realpath(__file__))
path_offer = os.path.join(path_main, "Offer")
path_cleared = os.path.join(path_main, "Cleared")
#create necessary folders"
if not os.path.exists(path_offer):
os.makedirs(path_offer)
if not os.path.exists(path_cleared):
os.makedirs(path_cleared)
original_files = iter(os.listdir(path_offer))
#Clear files
for file in original_files:
o_file = pd.DataFrame()
if os.path.splitext(file)[1].upper() == ".XLSX":
try:
o_file = pd.read_excel(os.path.join(path_offer, file), dtype="str")
except PermissionError:
continue
elif os.path.splitext(file)[1].upper() == ".CSV":
try:
o_file = pd.read_csv(os.path.join(path_offer, file), dtype="str", sep=";", encoding_errors="ignore")
except PermissionError:
continue
else:
continue
index_header = o_file[o_file.isin(["Typ", "MODEL_GUID", "VALID_FROM", "VALID_TO", "LAUFZEIT", "FAKTOR"]).any(axis=1)].index.values
if index_header and index_header.size == 1:
header = o_file.loc[index_header[0]].values
o_file = o_file.loc[index_header[0]+1:, :]
o_file.columns = header
o_file.dropna(subset=["FAKTOR"], inplace=True)
o_file["Typ"] = "LF"
o_file["LF"] = o_file["LF"].apply(clear_LF)
o_file["VALID_FROM"] = o_file["VALID_FROM"].apply(clear_date)
o_file["VALID_TO"] = o_file["VALID_TO"].apply(clear_date)
o_file["FAKTOR"] = o_file["FAKTOR"].apply(clear_faktor)
o_file = o_file.loc[:, "Typ":"FAKTOR"]
try:
o_file.to_csv(os.path.join(path_cleared, f"{os.path.splitext(file)[0]}.csv"), index=False, sep=";", encoding="utf-8")
except PermissionError:
continue
else:
continue
What I have tried:
I tried the converters parameter in the read_excel method with this function:
def to_excel(strng): return str(strng)
The parameter dtype="str" in the read_excel method has also not the effect, I want.
I also tried to open the excel-file with openpyxl, but the result is the same, like opening the file with pandas (pandas is using openpyxl as engine).
Nothing of this is working.