I am seeking for pythonic way to deal with Pandas DataFrame. Suppose I have DataFrame looks like below:
| Account | Stage | Outstanding | Installment | EIR |
|---|---|---|---|---|
| A | 1 | 10000 | 100 | 0.07 |
| B | 2 | 50000 | 500 | 0.04 |
| C | 3 | 10000 | 100 | 0.07 |
I am trying to do the amortization table from the given information by stage. For example:
Account A Stage 1 will be amortized for 12 months
Account B Stage 2 will be amortized until Outstanding = 0 (or close to 0)
Account C Stage 3 will NOT be amortized
I have SAS Code to perform such a logic explained earlier per below:
data want;
set have;
if Stage = 1 then do;
do Term = 1 to 12;
Outstanding = Outstanding - (abs(Installment) - (Outstanding * EIR / 100 / 12));
if Outstanding < 0 then delete;
output;
end;
end;
else if Stage = 2 then do;
do Term = 1 to Term;
Outstanding = Outstanding - (abs(Installment) - (Outstanding * EIR / 100 / 12));
if Outstanding < 0 then delete;
output;
end;
end;
else if Stage = 3 then do;
Outstanding = Outstanding;
output;
end;
run;
After run the code will provide output table looks like below (the number is just mock-up):
| Account | Stage | Outstanding | Installment | EIR | Term |
|---|---|---|---|---|---|
| A | 1 | 10000 | 100 | 0.07 | 1 |
| A | 1 | 9000 | 100 | 0.07 | 2 |
| A | 1 | 8000 | 100 | 0.07 | 3 |
| A | 1 | ... | ... | ... | ... |
| A | 1 | 2000 | 100 | 0.07 | 12 |
| B | 2 | 50000 | 500 | 0.04 | 1 |
| B | 2 | 49000 | 500 | 0.04 | 2 |
| B | 2 | 48000 | 500 | 0.04 | 3 |
| B | 2 | ... | ... | ... | ... |
| B | 2 | 125 | 500 | 0.04 | 48 |
| C | 3 | 10000 | 100 | 0.07 | 1 |
I have the python code to the same but I think it is not efficient. I have somethings like:
# Amortization function
def balances(rate, payment, os):
interestAmount = os * rate / 100 / 12
nextBalance = os + interestAmount - payment
return nextBalance
Then, I used the for-loop to call the function account by account and used np.repeat() to repeat the information I needed.
result = []
for i, account in enumerate(df['Account']):
if i % 5000 == 0:
print(f'Calcultion account: {i}')
accountTable = df[df['Account'] == account]
rate = float(accountTable['EIR'])
payment = float(accountTable['Installment'])
amount = float(accountTable['Outstanding'])
if int(accountTable['Stage']) <= 2:
while amount > 0:
amount = balances(rate, payment, amount)
amortization.append(amount)
if amortization[-1] <= 0:
amortization.pop(-1)
amortizationTable = pd.DataFrame(np.repeat(accountTable.values, len(amortization), axis = 0), columns = accountTable.columns)
amortizationTable['Outstanding'] = amortization
amortizationTable['Term'] = amortizationTable.index + 1
result.append(amortizationTable)
I found it extremely slow compared to SAS Programming. Any suggestion to improve the speed or make it more pythonic way.
Thank you.
Try this: