How to take excel table data and convert it into python code to transform data

76 Views Asked by At

I have an excel file that acts like a case statement:

Condition Operator Comparison Output
Country == 'US' '1'
Country == 'UK' '2'

-Condition is the Column name in the csv data file -Output should only be returned if the statement is true

The excel file is meant to be dynamic so that any user can add a new statement and the script will automatically be able to transform the data using the new rule.

I want to loop through all the excel statements and transform the csv file at the column level.

My question is how do I convert the excel table data into working python code?

I've tried both the eval() and getattr() methods:

import pandas as pd

df = pd.read_csv('DummyData.csv')

transform = pd.read_excel("t.xlsx")

def transform_data(row):
    for i in range(len(transform)):
        condition = transform.iloc[i,0]
        operator = transform.iloc[i,1]
        comparison = transform.iloc[i,2]
        output = transform.iloc[i,3]
        
        if eval(f"row['{condition}']{operator}"):
            return output
        return None
        
df["Country"] = df.apply(transform_data(1), axis = 1)


or getattr() where the comparison part of the excel file is in the operator column

import pandas as pd
import operator
df = pd.read_csv('DummyData.csv')

transform = pd.read_excel("t.xlsx")
def transform_data(row):
    for i in range(len(transform)):
        condition = transform.iloc[i,0]
        operator_str = transform.iloc[i,1]
        output = transform.iloc[i,2]
        
        operator_func = getattr(operator, operator_str)
        
        if operator_func(row[condition],output):
            return output
    return None

df["Country"] = df.apply(transform_data, axis = 1)

But nothing is working and I'm not sure how to continue

2

There are 2 best solutions below

0
Maria K On

I think you have to finish your eval statement to get a correct output.

And also I shifted return None 1 indent to the left, because you want to apply all transform checks and see if they fit before returning None (i.e. finish the for loop).

transform = pd.DataFrame({'Condition': ['Country', 'Country'],
    'Operator': ['==', '=='],
    'Comparison': ['US', 'UK'],
    'Output': ['1', '2']
})

df = pd.DataFrame({
    'Country': ['US', 'UK', 'other']
})

def transform_data(row):
    for i in range(len(transform)):
        condition = transform.iloc[i,0]
        operator = transform.iloc[i,1]
        comparison = transform.iloc[i,2]
        output = transform.iloc[i,3]
        
        if eval(f"row['{condition}']{operator}'{comparison}'"):
            return output
    return None
        
print(df.apply(transform_data, axis=1))

Output:

0       1
1       2
2    None
dtype: object
0
Elzam1n On

I think we can take short way here.

df.loc[df['Country']=='UK','Output']=2
df.loc[df['Country']=='US','Output']=1
df.loc[df['Country']=='other','Output']=0