I am trying to set up a DataFrameSchema in Pandera. The catch is that one of the columns of data may be a float or an int, depending on what data source was used to create the dataframe. Is there a way to set up a check on such a column? This code failed:
import pandera as pa
from pandera.typing import DataFrame, Series
from datetime import datetime
import pandas as pd
class IngestSchema(pa.SchemaModel):
    column_header: Series[float | int] = pa.Field(alias='MY HEADER')
Other things I've tried:
from typing import Union
float_int = Union[float, int]
But pandera does not recognize that union as a datatype. Is there any way to set up such a schema?
Digging into their docs, they have an is_numeric check, which tests whether a dtype is a _Number datatype. But it's a private variable at the moment, so maybe someday down the line? In the meantime you can go with the suggested workaround.
I see you're using SchemaModel, which I'm not very familiar with. I tested this locally and it worked, though with the caveat that I'm uncertain about the Series annotation.
Note that pa.DataFrameModel is the updated syntax and SchemaModel serves as an alias for it. SchemaModel will be deprecated in version 0.20.0, as mentioned in the docs.