When dividing in pandas, I always get NaN results. How can I solve the problem?


I want to create a column 'ratio' that holds each value of the column 'amount' divided by the last value of 'amount'. The data type of the 'amount' column is int64. After changing the data type to float, I still got the same NaN values.



There are 4 best solutions below

1
Vitalizzare On BEST ANSWER

When you do arithmetic on several data frames or series, pandas aligns them on indexes and columns by default. tail(1) returns not a single value (a scalar) but a Series that holds the last value together with its index label from the original data. When you divide the column by that Series, the two are aligned on their index labels and then divided label by label. Since the result of tail(1) contains only the last label, every other label has no matching divisor, and the missing divisor is treated as NaN. That's why you got NaN everywhere except at the last position.
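The alignment described above can be reproduced with a small sketch. The frame here is made-up sample data (the values in the question are not shown), and the names `last` and `ratio` are only illustrative:

```python
import pandas as pd

# Hypothetical sample data; the asker's real values are not shown.
dt = pd.DataFrame({'amount': [4, 5, 6, 7]})

# tail(1) keeps the index label of the last row, so this is a
# one-row Series labeled 3, not a scalar.
last = dt['amount'].tail(1)

# Division aligns on index labels first: labels 0, 1 and 2 have no
# partner in `last`, so their results are NaN.
ratio = dt['amount'] / last

print(last.index.tolist())
print(ratio)
```

Only the row whose label also exists in `last` gets a numeric result.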

To avoid this behavior, pass the divisor either as a number or as a numpy.array, neither of which carries an index to align on. In this case, it can be

dt['amount'] / dt['amount'].tail(1).values    # divide by a one-element numpy.array
dt['amount'] / dt['amount'].iloc[-1]          # divide by a number
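
As a quick check, here is a minimal sketch showing that both forms give the same result and how the result could be stored in the 'ratio' column. The data is again hypothetical, and `by_array` and `by_scalar` are illustrative names:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the asker's real values are not shown.
dt = pd.DataFrame({'amount': [4, 5, 6, 7]})

# A one-element NumPy array has no index, so it is broadcast
# against every row instead of being aligned by label.
by_array = dt['amount'] / dt['amount'].tail(1).values

# iloc[-1] returns the last value as a plain scalar.
by_scalar = dt['amount'] / dt['amount'].iloc[-1]

# Store the result as the new column.
dt['ratio'] = by_scalar

print(dt)
```

The scalar form is probably the easier one to read, since it makes explicit that a single number is the divisor.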
0
RomanPerekhrest On

Instead of tail, specify the location of the last value:

df['amount'] / df['amount'].iloc[-1]
2
user2314737 On

If you want each value divided by the previous one, you could use shift like this:

import pandas as pd

data = {'amount': range(4,8), 'user_input': ['a', 'b', 'c', 'd']}

dt = pd.DataFrame.from_dict(data)

dt
# Out: 
#    amount user_input
# 0       4          a
# 1       5          b
# 2       6          c
# 3       7          d

dt['ratio'] = dt['amount']/dt['amount'].shift(1)

dt
# Out: 
#    amount user_input     ratio
# 0       4          a       NaN
# 1       5          b  1.250000
# 2       6          c  1.200000
# 3       7          d  1.166667

Note that dividing a nonzero value by zero gives inf (and 0 divided by 0 gives NaN). The first value in the 'ratio' column is always NaN, because the first row has no previous value to divide by.
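
A minimal sketch of that caveat, using made-up data that contains a zero. Replacing inf with NaN afterwards is just one possible way to handle it, not something the question requires:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a zero in the middle.
dt = pd.DataFrame({'amount': [4, 0, 6]})

ratio = dt['amount'] / dt['amount'].shift(1)
# row 0: 4 / NaN -> NaN (no previous row)
# row 1: 0 / 4   -> 0.0
# row 2: 6 / 0   -> inf

# Optional cleanup (an assumption about what you may want):
# treat infinite ratios as missing values.
cleaned = ratio.replace([np.inf, -np.inf], np.nan)

print(ratio)
print(cleaned)
```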

0
ragas On

A different take on the same approach:

import pandas as pd

data = {
    'col1': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
    'col2': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
}

# Convert data into a DataFrame
df = pd.DataFrame(data)
# Divide each value by the last one, read as a plain scalar from the NumPy array
df = df.assign(new_col = df['col2']/df['col2'].values[-1])
print(df)