When using FLATTEN function in Snowflake, we use the values in the PATH column, to dynamically construct a SELECT clause to query the semi-structured (JSON) column in the data table.
We concatenate:
<variant_col>:<value_in_path_column>,
This means we assume the values in the PATH column are to be used with the Dot Notation (as described here)
We encountered a rare scenario, where the field key inside the variant column conatains a dot (.) in it, for example:
{
"0.text": "some_string_value",
"legal_name": "other_string_value"
}
In this case, after running the FLATTEN function, we get the following output:
| VARIANT_COL | SEQ | KEY | PATH | INDEX | VALUE |
|---|---|---|---|---|---|
| { "0.text": "foo", "legal_name": "bar" } | 1 | 0.text | ['0.text'] | "foo" | |
| { "0.text": "foo", "legal_name": "bar" } | 1 | legal_name | legal_name | "bar" |
We expect the value in the PATH column to be "0.text" instead of ['0.text']
Is there a way to force it into a Dot Notation?
*EDIT:the current solution is to edit the code, so if we encounter this case, we will explicitly edit it. But it is a workaround to something that looks like an inconsistent behavior of Snowflake. I will also create a support ticket.
Thanks
Code to recreate the issue:
create temp table notation_issue
as select parse_json($${"0.text": "foo", "legal_name": "bar"}$$) as variant_col;
create temp table flatten_results as
select *
from notation_issue,
lateral flatten(variant_col, recursive=>True);
select 'select ' || listagg('variant_col:' || path, ',\n') || ' from notation_issue'
from flatten_results
;
/* This raises an error */
select variant_col:['0.text'], variant_col:legal_name from notation_issue;
/* This does not raise an error */
select variant_col['0.text'], variant_col:legal_name from notation_issue;
/* This also does not raise an error */
select variant_col:"0.text", variant_col:legal_name from notation_issue;