I am new to Pandera and am using it to run schema validations on my dataframe. I want to use a mix of warnings and errors. The error part works seamlessly: I catch the rows in the original dataframe that failed validation, using the index column of failure_cases in SchemaErrors, and send them back to users as a dataframe.
The code below works for errors.
try:
    dynamic_schema.validate(df, lazy=True)
except pa.errors.SchemaErrors as e:
    # run some custom logic using the index of the rows which failed validation
    print(e.failure_cases.groupby('index'))
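For context, the row lookup in the error branch is roughly the following. This is a minimal sketch with made-up data: `failure_cases` here is a hand-built stand-in for `e.failure_cases` reduced to two columns, and `failed_index` and `failed_rows` are names chosen only for illustration. It also assumes that failures without a row (schema-level ones) show up with a missing value in the index column, which is why they are dropped first.

```python
import pandas as pd

# Hypothetical input data; the real df comes from the application.
df = pd.DataFrame(
    {"series_value_date": ["2024-03-26", "2024-03-24", "2024-03-23"]}
)

# Hand-built stand-in for e.failure_cases, reduced to the columns used here.
failure_cases = pd.DataFrame(
    {"index": [0, 2], "failure_case": ["2024-03-26", "2024-03-23"]}
)

# Drop entries with no row index, then de-duplicate the remaining labels.
failed_index = failure_cases["index"].dropna().unique()

# Select the rows of the original dataframe that failed validation.
failed_rows = df.loc[df.index.isin(failed_index)]
print(failed_rows)
```

`failed_rows` is what gets sent back to users.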
I want to do the same with warnings, but I cannot find an easy way to get the index of the rows that failed validation: the warning message comes back as a plain str.
For the code below, I get a str:
with warnings.catch_warnings(record=True) as caught_warnings:
    dynamic_schema.validate(df, lazy=True, inplace=False)
if caught_warnings:
    for warning in caught_warnings:
        print(type(warning.message.args[0]))  # prints <class 'str'>
Output (everything below is one single string):
<Schema Column(name=series_value_date, type=DataType(str))> failed element-wise validator 0:
<Check validate: last saved date check failed>
failure cases:
   index                 failure_case
0      0  2024-03-26T00:00:00.0000000
1      1  2024-03-24T00:00:00.0000000
2      2  2024-03-23T00:00:00.0000000
3      3  2024-03-22T00:00:00.0000000
If I can somehow access the index column in the string output above, that will fulfill my use case. Is there a way to do this?
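The only workaround I can think of is parsing the string itself, which feels fragile. A minimal sketch of that idea, using only the standard library: `failed_indices_from_message` is a hypothetical helper, not part of Pandera, and it assumes the message always has the layout shown above (a "failure cases:" line, one header row, then rows whose second whitespace-separated token is the index). It would break for index labels containing spaces or if the table in the message is ever truncated.

```python
from typing import List


def failed_indices_from_message(message: str) -> List[str]:
    """Return the 'index' column values listed after 'failure cases:'."""
    _, marker, table = message.partition("failure cases:")
    if not marker:
        return []  # the message has no failure-case table
    rows = table.strip().splitlines()[1:]  # skip the header row
    indices = []
    for row in rows:
        # Tokens: positional row number, index label, failure case.
        parts = row.split(maxsplit=2)
        if len(parts) == 3:
            indices.append(parts[1])
    return indices


# Sample message copied from the output above (shortened to two rows).
message = (
    "<Schema Column(name=series_value_date, type=DataType(str))> "
    "failed element-wise validator 0:\n"
    "<Check validate: last saved date check failed>\n"
    "failure cases:\n"
    "   index                 failure_case\n"
    "0      0  2024-03-26T00:00:00.0000000\n"
    "1      1  2024-03-24T00:00:00.0000000\n"
)

# The labels come back as strings; cast them for an integer index.
failed = [int(label) for label in failed_indices_from_message(message)]
print(failed)
```

In my case the message would be `warning.message.args[0]` for each caught warning. I would prefer a structured way to get the same information if one exists.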