I am trying to teach myself how to use Dataform to make my bigquery processes more scalable. I currently utilize stored procedures, but would like to transfer these processes to Dataform.
I'm stuck on Dataform protocol to reference a dataset variable when creating tables or using dynamic sql when creating a procedure that creates dynamic set of tables. Here is an example of what I'm referring to (partial snapshot of a stored procedure I use)
SET
schemas = ARRAY(
SELECT
schema_name
FROM
my_project.INFORMATION_SCHEMA.SCHEMATA);
SET
arrSize = ARRAY_LENGTH(schemas);
SET
query = CONCAT("CREATE OR REPLACE TABLE `my_project.subset_of_datasets` as (select distinct(table_schema) from (select * from (");
WHILE
i < arrSize -1 DO
SET
query = CONCAT(query, "(select table_name, table_schema from my_project.",schemas[
OFFSET
(i)],".INFORMATION_SCHEMA.TABLES where contains_substr(table_name, 'p_AccountDeviceStat') and contains_substr(ddl, 'dpb_revenue') and contains_substr(ddl, 'dpb_transactions')) UNION ALL ");
SET
i = i + 1;
END WHILE
;
SET
query = CONCAT(query, "(select table_name, table_schema from my_project.",schemas[ORDINAL(arrSize)],".INFORMATION_SCHEMA.TABLES where contains_substr(table_name, 'p_AccountDeviceStat') and contains_substr(ddl, 'dpb_revenue') and contains_substr(ddl, 'dpb_transactions')))))");
EXECUTE IMMEDIATE
query;
Thanks in advance for any assistance or advice!
I've tried reading all the existing dataform documentation but haven't been able to identify anything relevant