I have a schema, created with Pandera's DataFrameModel, for a DataFrame called 'params'. This DataFrame contains only floats, and I need to validate it before it is used by my application.
I have another DataFrame, let's call it params2, which one of the validation checks run on params needs access to.
I cannot store params2 in the columns of params. How can I pass this DataFrame along to the validation checks being run on params?
So far I have tried creating a custom class called SecretDF that inherits from the pandas DataFrame and keeps a second DataFrame hidden inside the instance. I added methods for setting and accessing that hidden DataFrame, but the Pandera checks can't see that these methods are defined on the DataFrame.
Below is a minimal reproducible example of the problem.
import pandas as pd
import pandera as pa
# Example dataframes
params = pd.DataFrame({
    'value': [1.0, 2.0, 3.0]
})
params2 = pd.DataFrame({
    'reference_value': [0.5, 2.5, 3.5]
})
# A custom class inheriting from pd.DataFrame to hold a secret dataframe
class SecretDF(pd.DataFrame):
    def __init__(self, *args, **kwargs):
        self._secret_df = None
        super().__init__(*args, **kwargs)

    def set_secret(self, df):
        self._secret_df = df

    def get_secret(self):
        return self._secret_df
# Using DataFrameModel to define schema
class ParamsSchema(pa.DataFrameModel):
    value: pa.typing.Series[float] = pa.Field(gt=0, check_name=True, nullable=False)

    # Check function to validate values of params based on params2
    @pa.dataframe_check
    def validate_based_on_secret_df(cls, df: pd.DataFrame):
        # This is where it breaks: the df passed in has no _secret_df attribute
        secret_df = df._secret_df
        if secret_df is None:
            return True
        return (df["value"] < secret_df["reference_value"]).all()
# Try to validate
params_with_secret = SecretDF(params)
params_with_secret.set_secret(params2)
try:
    # DataFrame[Schema](...) from pandera.typing validates on construction
    pa_params = pa.typing.DataFrame[ParamsSchema](
        params_with_secret
    )
    print("Validation passed")
except Exception as e:
    print(f"Validation failed: {e}")
This does not work, because df does not have the attribute _secret_df inside the check validate_based_on_secret_df. For my specific application, putting the column the check needs inside params is NOT an option; this is a simplified example from my repo. Creating the functions and checks on the fly is not an option either, as the schema is hardcoded in a .py file. What can be done here?