I am looking to flatten a column named 'EVENT_PARAMS_JSON' in snowflake having json values.
select EVENT_PARAMS_JSON from GA4_EVENT_DETAILS limit 1;
results in below output.
[
{
"key": "firebase_screen",
"value":
{
"double_value": null, "float_value": null, "int_value": null, "string_value": "rewards"
}
},
{
"key": "firebase_previous_screen",
"value":
{
"double_value": null, "float_value": null, "int_value": null, "string_value": "fuelpay_dashboard_offsite"
}
},
{
"key": "firebase_screen_class",
"value":
{
"double_value": null, "float_value": null, "int_value": null, "string_value": "View"
}
},
{
"key": "firebase_previous_class",
"value":
{
"double_value": null, "float_value": null, "int_value": null, "string_value": "View"
}
},
{
"key": "firebase_screen_id",
"value":
{
"double_value": null, "float_value": null, int_value": "4511208285398238808", string_value": null
}
},
{
"key": "engaged_session_event",
"value":
{
"double_value": null, "float_value": null, "int_value": 1, "string_value": null
}
},
{
"key": "theme",
"value":
{
"double_value": null, "float_value": null, "int_value": null, "string_value": "lightMode"
}
}
]
I would like my output table to have keys as column names and value as rows. Kindly note each key has multiple values, and the right value is the one which is not blank/null.

I tried this in snowpark but got an error :
100357 (P0000): Python Interpreter Error:
raise error_class(
snowflake.snowpark.exceptions.SnowparkSQLException: (1304): 01b0d218-0001-087c-0000-24952743d6f2: 001007 (22023): SQL compilation error:
invalid type [VARCHAR(16777216)] for parameter '1'
import snowflake.snowpark as snowpark
def main(session: snowpark.Session):
# Your code goes here, inside the "main" handler.
tableName = 'GA4_EVENT_DETAILS'
df = session.table(tableName).filter((col("app_info_id") == 'au.com.caltex.flagship')& (col("event_name") == 'screen_view')& (col("event_date") == '20230602')&(col("EVENT_PARAMS_FIREBASE_SCREEN_CLASS")=='View'))
# Print a sample of the dataframe to standard output.
df.show()
df = df.join_table_function("flatten", df["EVENT_PARAMS_JSON"]).drop(["SEQ", "PATH", "INDEX", "THIS"])
to show this is mostly a SQL problem, lets get a CTE with a raw input string being the same as you note, and then PARSE_JSON it, like Mat suggests, and see this is the first level problem:
so here is a CTE that has the string, and JSON as VARAINT (vis PARSE_JSON)
now if we try flatten the raw string text:
which matches your error:
so instead using the VARAINT type, via PARSE_JSON which is the expected input to FLATTEN:
success:
so now we have data we can work with, if we pull the key and value from value:
we get:
so now we want to pull the four values types out:
so now we want to COALESCE or NVL those values to keep the single non-null value:
gives:
so this is all great, now as you state, you want the row keys, to be the column headers, this is done via PIVOT where you have to know the names of the keys/column, via a manual pivot, where you have to know the names of the keys, or dynamically, where now your results can change randomly depending on the data..
I will show the manual pivot method, because it's the one I like.
gives:
The selecting of the SEQ from the FLATTEN allows stitching related rows back together, and thus why it is used in the GROUP BY.
Now you will need to rebuild this SQL into your python instructions to get snowflake to turn it back into the equivalent SQL, so you results "just work".