Converting numpy int64/scalar int to pandas dtype Int64


Python version: 3.8.10 (upgraded from 3.6)

NumPy version: 1.21.5 (upgraded from 1.19.5)

great_expectations version: 0.16.16 (upgraded from 0.13.11)

While validating the DataFrame with Great Expectations, I get an exception for the column NumMonths: its actual datatype does not match the expected datatype.

DataFrame

import pandas as pd

data = {'Ids': [f'A{i}' for i in range(1, 11)],
        'NumMonths': [3, 6, 9, 12, 3, 6, 9, 12, 3, 6]}
df = pd.DataFrame(data)
# No-op as posted: the column is assigned to itself, so its dtype is unchanged
df['NumMonths'] = df['NumMonths']
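To check which dtype the column actually has before validation, something like the following can help. It is a minimal sketch that rebuilds the same frame and compares the dtype name against the suite's `type_list` with a plain string comparison. That comparison is an illustration only, not Great Expectations' actual matching logic, and `expected_types` is a name introduced here for the example.

```python
import pandas as pd

data = {"Ids": [f"A{i}" for i in range(1, 11)],
        "NumMonths": [3, 6, 9, 12, 3, 6, 9, 12, 3, 6]}
df = pd.DataFrame(data)

# Map each column to the string name of its dtype.
dtype_names = {col: str(dtype) for col, dtype in df.dtypes.items()}
print(dtype_names["NumMonths"])

# type_list copied from the expectation suite for NumMonths.
expected_types = ["Int32", "Int64"]

# Naive, case-sensitive comparison (assumption: for illustration only).
matches = dtype_names["NumMonths"] in expected_types
print(matches)
```

The NumPy-backed `int64` and the pandas nullable `Int64` differ only by case in their names, so a case-sensitive check treats them as different types.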

Great-Expectations

{
    "data_asset_type": "Dataset",
    "expectation_suite_name": "testsuite",
    "expectations": [
        {
            "expectation_type": "expect_table_columns_to_match_set",
            "kwargs": {
                "column_set": [
                    "Ids",
                    "NumMonths"
                ],
                "exact_match": true
            },
            "meta": {
                "severity": "critical"
            }
        },
        {
            "expectation_type": "expect_select_column_values_to_be_unique_within_record",
            "kwargs": {
                "column_list": [
                    "Ids"
                ]
            },
            "meta": {
                "severity": "critical"
            }
        },
        {
            "expectation_type": "expect_column_values_to_be_in_type_list",
            "kwargs": {
                "column": "Ids",
                "type_list": [
                    "str"
                ]
            },
            "meta": {
                "severity": "critical"
            }
        },
        {
            "expectation_type": "expect_column_values_to_be_in_type_list",
            "kwargs": {
                "column": "NumMonths",
                "type_list": [
                    "Int32",
                    "Int64"
                ]
            },
            "meta": {
                "severity": "critical"
            }
        },
        {
            "expectation_type": "expect_column_values_to_be_between",
            "kwargs": {
                "column": "NumMonths",
                "min_value": 3,
                "max_value": 12
            },
            "meta": {
                "severity": "warning"
            }
        },
        {
            "expectation_type": "expect_column_values_to_not_be_null",
            "kwargs": {
                "column": "Ids",
                "mostly": 1
            },
            "meta": {
                "severity": "critical"
            }
        },
        {
            "expectation_type": "expect_column_values_to_not_be_null",
            "kwargs": {
                "column": "NumMonths",
                "mostly": 1
            },
            "meta": {
                "severity": "critical"
            }
        }
    ],
    "meta": {
        "great_expectations_version": "0.16.16"
    }
}

Error: [screenshot not available]

The above code worked fine with the older versions of Python, NumPy, and Great Expectations.

Answers*:

  1. While investigating, I found that NumPy 1.20.0 introduced a change in dtypes. https://numpy.org/doc/stable/release/1.20.0-notes.html

  2. I also dug into the great_expectations GitHub repository, where they have written a workaround for the NumPy dtype change. https://github.com/great-expectations/great_expectations/blob/develop/great_expectations/expectations/core/expect_column_values_to_be_in_type_list.py#L319

  3. I used the pandas methods astype() and convert_dtypes() to change the datatype. This changes the dtype to Int64, but as soon as the data goes into validate() in Great Expectations, it is treated as int64. https://github.com/great-expectations/great_expectations/blob/develop/great_expectations/expectations/core/expect_column_values_to_be_in_type_list.py#L269
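The conversions mentioned in point 3 can be sketched as follows. The example only uses pandas and NumPy and does not exercise Great Expectations, so it shows what the conversions produce rather than how validate() inspects them. The variable names are chosen for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([3, 6, 9, 12], name="NumMonths")

# Explicit cast to the pandas nullable integer dtype.
as_nullable = s.astype("Int64")

# Let pandas pick the best nullable dtype for the column.
converted = s.convert_dtypes()

print(as_nullable.dtype, converted.dtype)

# Individual values pulled out of the nullable column are still
# NumPy integer scalars, even though the column dtype is Int64.
first = as_nullable.iloc[0]
print(type(first).__name__)
print(isinstance(first, np.integer))
```

Because the elements remain NumPy integer scalars, any check that looks at individual values or at the underlying NumPy array, rather than at the column's dtype object, could plausibly report int64. Whether that is what happens inside validate() is an assumption, not something confirmed here.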

The above code works if I change the NumPy version back to 1.19.5.
