How to duplicate row in Pandas DataFrame

Question

How to duplicate row in Pandas DataFrame

172 Views Asked by Sasiwut Chaiyadecha At 01 January 2022 at 09:46

I am seeking for pythonic way to deal with Pandas DataFrame. Suppose I have DataFrame looks like below:

Account	Stage	Outstanding	Installment	EIR
A	1	10000	100	0.07
B	2	50000	500	0.04
C	3	10000	100	0.07

I am trying to do the amortization table from the given information by stage. For example:

Account A Stage 1 will be amortized for 12 months
Account B Stage 2 will be amortized until Outstanding = 0 (or close to 0)
Account C Stage 3 will NOT be amortized

I have SAS Code to perform such a logic explained earlier per below:

data want;
set have;

if Stage = 1 then do;
    do Term = 1 to 12;
        Outstanding = Outstanding - (abs(Installment) - (Outstanding * EIR / 100 / 12));
        if Outstanding < 0 then delete;
        output;
        end;
    end;

else if Stage = 2 then do;
    do Term = 1 to Term;
        Outstanding = Outstanding - (abs(Installment) - (Outstanding * EIR / 100 / 12));
        if Outstanding < 0 then delete;
        output;
        end;
    end;

else if Stage = 3 then do;
    Outstanding = Outstanding;
    output;
    end;

run;

After run the code will provide output table looks like below (the number is just mock-up):

Account	Stage	Outstanding	Installment	EIR	Term
A	1	10000	100	0.07	1
A	1	9000	100	0.07	2
A	1	8000	100	0.07	3
A	1	...	...	...	...
A	1	2000	100	0.07	12
B	2	50000	500	0.04	1
B	2	49000	500	0.04	2
B	2	48000	500	0.04	3
B	2	...	...	...	...
B	2	125	500	0.04	48
C	3	10000	100	0.07	1

I have the python code to the same but I think it is not efficient. I have somethings like:

# Amortization function
def balances(rate, payment, os):
    interestAmount = os * rate / 100 / 12   
    nextBalance = os + interestAmount - payment
    return nextBalance

Then, I used the for-loop to call the function account by account and used np.repeat() to repeat the information I needed.

result = []
for i, account in enumerate(df['Account']):
    if i % 5000 == 0:
        print(f'Calcultion account: {i}')
    accountTable = df[df['Account'] == account]
    rate = float(accountTable['EIR'])
    payment = float(accountTable['Installment'])
    amount = float(accountTable['Outstanding'])

    if int(accountTable['Stage']) <= 2:
        while amount > 0:
            amount = balances(rate, payment, amount)
            amortization.append(amount)
            if amortization[-1] <= 0:
                amortization.pop(-1)
            amortizationTable = pd.DataFrame(np.repeat(accountTable.values, len(amortization), axis = 0), columns = accountTable.columns)
            amortizationTable['Outstanding'] = amortization
            amortizationTable['Term'] = amortizationTable.index + 1
            result.append(amortizationTable)

I found it extremely slow compared to SAS Programming. Any suggestion to improve the speed or make it more pythonic way.

Thank you.

Original Q&A

There are 1 best solutions below

**Abhishek Jain** · Answer 1 · 2022-01-01T13:47:52.620000

Try this:

import pandas as pd
df = pd.DataFrame({'acc': ['a', 'b', 'c'],
                   'stage': [1, 2, 3],
                   'bal': [10000, 50000, 10000],
                   'installment': [100, 500, 100],
                   'eir': [0.07, 0.04, 0.07],
                   })


def computeBal(bal, eir, installment):
    intt = bal * eir / 12 / 100
    next_bal = bal + intt - installment
    return next_bal


def processAccount(df_acc):
    acc = df_acc['acc'].values[0]
    stg = int(df_acc['stage'])
    bal = float(df_acc['bal'])
    eir = float(df_acc['eir'])
    installment = float(df_acc['installment'])

    amort = []
    amort.append(bal)
    if stg == 1:
        for _ in range(1, 12):
            bal = computeBal(bal, eir, installment)
            amort.append(round(bal, 2))
    elif stg == 2:
        while bal > 0:
            bal = computeBal(bal, eir, installment)
            if bal > 0:
                amort.append(round(bal, 2))

    out = pd.DataFrame(amort)
    out['acc'] = acc
    out['stage'] = stg
    out['installment'] = installment
    out['eir'] = eir
    out.reset_index(inplace=True)
    out.rename(columns={0: 'bal', 'index': 'term'}, inplace=True)
    out['term'] += 1

    return out[['acc', 'stage', 'bal', 'installment', 'eir', 'term']]


result = dict()
for acc in df['acc'].unique():
    df_acc = df.loc[df['acc'] == acc, :].copy()
    result[acc] = processAccount(df_acc)


out = pd.concat(result).reset_index(drop=True)
out

How to duplicate row in Pandas DataFrame

There are 1 best solutions below

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in DATAFRAME

Related Questions in SAS

Related Questions in AMORTIZATION

Trending Questions

Popular # Hahtags

Popular Questions