I get the error "TypeError: statistics is of type StatsOptions, should be a DatasetFeatureStatisticsList proto." when I generate a schema with tfdv.infer_schema(). I am trying to restrict the schema to the relevant features by passing feature_allowlist to the tfdv.StatsOptions class, but I cannot get it to work. Can anyone help me with this?
features_remove= {"region","fiscal_week"}
columns= [col for col in df.columns if col not in features_remove]
stat_Options= tfdv.StatsOptions(feature_allowlist=columns)
print(stat_Options.feature_allowlist)
schema= tfdv.infer_schema(stat_Options)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-53-e61b2454028e> in <module>
----> 1 schema= tfdv.infer_schema(stat_Options)
2 schema
C:\ProgramData\Anaconda3\lib\site-packages\tensorflow_data_validation\api\validation_api.py in infer_schema(statistics, infer_feature_shape, max_string_domain_size, schema_transformations)
95 """
96 if not isinstance(statistics, statistics_pb2.DatasetFeatureStatisticsList):
---> 97 raise TypeError(
98 'statistics is of type %s, should be '
99 'a DatasetFeatureStatisticsList proto.' % type(statistics).__name__)
TypeError: statistics is of type StatsOptions, should be a DatasetFeatureStatisticsList proto.
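The reason is simple: tfdv.infer_schema expects a statistics_pb2.DatasetFeatureStatisticsList object, not a StatsOptions object. StatsOptions only configures how the statistics are computed; it does not contain any statistics itself.
Generate the statistics first with tfdv.generate_statistics_from_dataframe(df, stats_options=stat_Options), then pass the result to tfdv.infer_schema.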
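A minimal sketch of that two-step flow, assuming df is the pandas DataFrame from the question. The helper names allowed_columns and infer_schema_with_allowlist are hypothetical and only used here for illustration; the TFDV calls are tfdv.StatsOptions, tfdv.generate_statistics_from_dataframe and tfdv.infer_schema.

```python
def allowed_columns(all_columns, features_remove):
    """Return the column names to keep, preserving their original order."""
    return [col for col in all_columns if col not in features_remove]


def infer_schema_with_allowlist(df, features_remove):
    """Compute statistics for the allowed columns, then infer a schema from them.

    Hypothetical helper: `df` is assumed to be a pandas DataFrame.
    """
    # Imported inside the function: TFDV is a heavy dependency and is only needed here.
    import tensorflow_data_validation as tfdv

    columns = allowed_columns(df.columns, features_remove)
    stats_options = tfdv.StatsOptions(feature_allowlist=columns)

    # Step 1: StatsOptions only configures the computation; this call produces
    # the DatasetFeatureStatisticsList proto, restricted to the allowlisted features.
    stats = tfdv.generate_statistics_from_dataframe(df, stats_options=stats_options)

    # Step 2: infer_schema takes the statistics proto, not the options object.
    return tfdv.infer_schema(statistics=stats)
```

With the question's data this would be called as schema = infer_schema_with_allowlist(df, {"region", "fiscal_week"}). If your data lives in a CSV file rather than a DataFrame, tfdv.generate_statistics_from_csv should accept the same stats_options argument.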