Goal: run checks.yml on all Tables in Database, implicitly / dynamically (not naming 100s of Tables).
Following Soda's quick start, I've completed sections:
- Install Soda Core
- Connect Soda Core to a data source -
configuration.yml
Now I'm following Write a check and run a scan - checks.yml.
Problem
However, the documentation only gives examples for checking one Table each.
4 Checks
- Sum of Tables (in Database)
- Sum of Columns (across all Tables, in Database)
- Sum of Tables' descriptions exist
- Sum of Columns' descriptions exist
Queries return a COUNT().
So far, checks.yml:
# checks for MY_DATABASE:
sql_metrics:
name: num_tables, num_columns
sum_tables query: |
SELECT COUNT(*)
FROM information_schema.tables
WHERE table_schema = '*';
sum_columns query: |
SELECT COUNT(*)
FROM information_schema.columns
WHERE table_name = '*';
sum_tables_descriptions query: |
-- SQL
sum_columns_descriptions query: |
-- SQL
Your checks file should look like
Your checks must be based on conditions(to be checked or verified) or filters that are not available yet on documentation. ~~ means like and % is the wildcard. Although condition all and no condition gives same result, therefore where clause is not neccessary.
Or you could dynamically create checks files with this script, using a list of datasets(tables) with 'for each dataset T: ' clause, like: