So I'm using an open-source tool to compare the row-by-row data between 2 tables sitting in 2 different databases in in MSSQL and the other in Snowflake.
The connection setups for both were fine. I could run the row count comparison successfully and got the count using the cmd below:
data-validation validate column -sc snowflake -tc snowflake -tbls MY_SCH.MY_TABLE
However I ran into issues when running the row-by-row comparison. For a test, I am comparing a snowflake table to itself. The cmd is as below:
data-validation validate row -sc snowflake -tc snowflake -tbls MY_SCH.MY_TABLE --primary-keys MY_PRIMARY_KEY --hash '*'
Where:
row: type of data validation,-sc: source connection-tc: target connectiontbls: table name(s)--primary-keys: the primary key column being used to match between the 2 tables--hash:*for validating all fields or a list of field names for a selected list only
This error message followed , I've tried comparing different tables, selected columns, using SQL Server database. All came out with the same error:
File "C:\Users\admin\AppData\Local\Programs\Python\Python310\lib\site-packages\ibis\common\validators.py", line 467, in sequence_of
raise IbisTypeError(f'Arg must have at least {min_length} number of elements')
ibis.common.exceptions.IbisTypeError: Arg must have at least 1 number of elements
The Github repo for the data validation tool is here: https://github.com/GoogleCloudPlatform/professional-services-data-validator
Appreciate any suggestions!
The answer is as below:
The key is to follow the syntax of for parameter -tbls
MY_SCH.MY_TABLE=MY_SCH.MY_TABLEand remove all quotations.