How do I apply a function to each row of a dataframe based on the values of two columns?

Asked by Manta Raych on 24 April 2023 at 04:19

I have a pandas dataframe with columns for Date, Minimum Temp, and Maximum Temp. For each day, I want to first calculate the mean temperature from the min and max: if the max temp is below 86, the mean is simply the average of the min and max; otherwise, the max is capped at 86 before averaging. Then I want to pass that mean into a second function and collect its output in an array. I hit an error saying the "truth value of a Series" is ambiguous.

This is the code I've written so far:

#Function defining how to obtain the mean based on max temp
def MeanTemp(T_min, T_max):
    # average the two values directly (np.mean's second argument is an axis, not a value)
    if T_max < 86:
        mean = (T_max + T_min) / 2
    else:
        mean = (86 + T_min) / 2
    return mean

#Function that will use the mean from the MeanTemp function
def GrowingDegreeDays(mean,base):
    if mean > base:
        GDD = mean-base
    else:
        GDD = 0
    return GDD

#For each row in my dataframe, I want it to perform these two functions
for Date in df:
    mean = MeanTemp(T_min, T_max)
    GrowingDegreeDays(mean,50)

When I run this, I get the error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

It throws this error at the line "if T_max < 86:". From some research, I think this is because it is trying to run the function on the whole column? How do I get it to look at just the value for that specific row? Or is something else going on? I am very new to coding, so I appreciate the use of simpler language ;)
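A small sketch reproducing the error in isolation (the three temperatures are made up for illustration, not taken from the question's data). Comparing a whole column with 86 gives one True/False per row, and "if" cannot reduce that to a single answer; a single row's value is a plain number, so "if" works.

```python
import pandas as pd

# A small made-up column of maximum temperatures, just for illustration
t_max = pd.Series([84, 90, 86])

# Comparing the whole column gives one True/False per row, not a single answer
comparison = t_max < 86
print(comparison.tolist())  # [True, False, False]

# `if` cannot squeeze that whole column into one True/False, so pandas raises
raised = False
try:
    if comparison:
        pass
except ValueError as err:
    raised = True
    print("ValueError:", err)

# A single row's value is just a number, so `if` works as expected
first_value = t_max.iloc[0]
print(first_value < 86)  # True
```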

Thanks for your help!



Answer by Corralien on 24 April 2023 at 05:23

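You are right about the cause of your error: T_max is a whole column (a Series), so T_max < 86 gives one True/False per row, and the if statement cannot reduce that to a single True or False. Furthermore, try to use vectorized code, which works on whole columns at once and is more efficient than looping: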

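# Cap T_max at 86, add T_min, and halve: the mean temperature for every row at once
mean = df['T_max'].clip(upper=86).add(df['T_min']).div(2)
# Subtract the base of 50 and replace negative values with 0: the GDD for every row
gdd = mean.sub(50).clip(lower=0)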
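Written as formulas, the two lines above compute the following per row (86 is the cap and 50 the base temperature, both taken from the question):

```latex
\text{mean} = \frac{\min(T_{\max},\ 86) + T_{\min}}{2},
\qquad
\text{GDD} = \max(\text{mean} - 50,\ 0)
```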

clip replaces your conditional statements: clip(upper=86) caps each T_max at 86, and clip(lower=0) turns every negative GDD into 0.
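To check the clip chain by hand, here is a sketch on two hand-picked rows (hypothetical values, not from the question) chosen so that one T_max is below 86, one is above it, and one GDD would be negative before clipping:

```python
import pandas as pd

# Hand-picked values (hypothetical) chosen to hit both branches:
# row 0 has T_max below 86, row 1 has T_max above 86
sample = pd.DataFrame({'T_min': [10, 20], 'T_max': [80, 90]})

capped = sample['T_max'].clip(upper=86)    # [80, 86]: 90 is capped at 86
mean = capped.add(sample['T_min']).div(2)  # [(80+10)/2, (86+20)/2] = [45.0, 53.0]
gdd = mean.sub(50).clip(lower=0)           # [-5.0 -> 0.0, 3.0]

print(mean.tolist())  # [45.0, 53.0]
print(gdd.tolist())   # [0.0, 3.0]
```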

Output:

>>> mean
0    52.5
1    49.0
2    49.5
3    51.0
4    51.0
5    49.5
6    56.5
7    54.5
8    49.5
9    56.5
dtype: float64

>>> gdd
0    2.5
1    0.0
2    0.0
3    1.0
4    1.0
5    0.0
6    6.5
7    4.5
8    0.0
9    6.5
dtype: float64

Minimal Reproducible Example:

import pandas as pd
import numpy as np

N = 10
rng = np.random.default_rng(2023)
df = pd.DataFrame({'Date': pd.date_range('2023-04-01', periods=N, freq='D'),
                   'T_min': rng.integers(10, 40, N),
                   'T_max': rng.integers(80, 100, N)})
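If you would rather keep your two functions and go row by row (slower, but closer to your original code), a minimal sketch follows. It assumes the T_min/T_max column names from the example above and rebuilds that same dataframe so it runs on its own. itertuples hands you one row at a time, so row.T_min and row.T_max are single numbers and the if statements work. The results are collected in a NumPy array, as the question asked, and compared with the vectorized version.

```python
import numpy as np
import pandas as pd

# Same example data as the Minimal Reproducible Example above
N = 10
rng = np.random.default_rng(2023)
df = pd.DataFrame({'Date': pd.date_range('2023-04-01', periods=N, freq='D'),
                   'T_min': rng.integers(10, 40, N),
                   'T_max': rng.integers(80, 100, N)})

# The question's two functions, written to work on single numbers (one row)
def MeanTemp(T_min, T_max):
    if T_max < 86:
        return (T_max + T_min) / 2
    else:
        return (86 + T_min) / 2  # cap the maximum at 86 before averaging

def GrowingDegreeDays(mean, base):
    if mean > base:
        return mean - base
    else:
        return 0

# Row by row: each `row` holds plain numbers, so `if T_max < 86` is unambiguous
row_wise = []
for row in df.itertuples(index=False):
    mean = MeanTemp(row.T_min, row.T_max)
    row_wise.append(GrowingDegreeDays(mean, 50))
row_wise = np.array(row_wise, dtype=float)

# Vectorized version from the answer, converted to a NumPy array at the end
mean = df['T_max'].clip(upper=86).add(df['T_min']).div(2)
gdd = mean.sub(50).clip(lower=0).to_numpy()

print(np.allclose(row_wise, gdd))  # True
```

Both give the same numbers; the vectorized form is usually preferred because pandas does the looping internally.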
