Column types:
D describe some_table;
┌─────────────┬─────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │ column_type │  null   │   key   │ default │  extra  │
│   varchar   │   varchar   │ varchar │ varchar │ varchar │ varchar │
├─────────────┼─────────────┼─────────┼─────────┼─────────┼─────────┤
│ id          │ BIGINT      │ YES     │         │         │         │
│ data        │ JSON        │ YES     │         │         │         │
└─────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┘
Sample data:
D select id, data from some_table;
┌────────┬──────────────────────────────────────────────────────────────────────────────────┐
│   id   │                                       data                                       │
│ int64  │                                       json                                       │
├────────┼──────────────────────────────────────────────────────────────────────────────────┤
│      1 │ {"type":"abc","purpose":"ad","ts":"...","userId":"","context":{"ip":"x.x.x.x",...│
│      2 │ {"type":"abc","purpose":"search","ts":"...","userId":"ABCD1234","context":{"ip...│
│      3 │ {"type":"defghi","purpose":null,"ts":"...","userId":"","context":{"ip":"x.x.x....│
...
└────────┴──────────────────────────────────────────────────────────────────────────────────┘
Expectation:
┌────────┬─────────┬─────┬──────────┬────────────┬──────────────────┐
│ type   │ purpose │ ts  │ userId   │ context.ip │ context.sth.else │
├────────┼─────────┼─────┼──────────┼────────────┼──────────────────┤
│ abc    │ ad      │ ... │          │            │                  │
│ abc    │ search  │ ... │ ABCD1234 │            │                  │
│ defghi │ null    │ ... │          │            │                  │
└────────┴─────────┴─────┴──────────┴────────────┴──────────────────┘
I tried using unnest, but it only works with lists and structs:
D select unnest(data) from some_table limit 5;
Error: Binder Error: UNNEST() can only be applied to lists, structs and NULL
LINE 1: select unnest(data) from some_table limit 5;
^
So, I then tried unnest with from_json based on Unnest JSON Array into Rows (pseudo-json_each), but that didn't work either:
D select unnest(from_json(data, '"JSON"')) from some_table limit 5;
Error: Binder Error: UNNEST() can only be applied to lists, structs and NULL
LINE 1: select unnest(from_json(data, '"JSON"')) from ...
^
I've searched for a solution and came across these:
- https://duckdb.org/2023/03/03/json.html
- https://www.linkedin.com/pulse/duckdb-useful-parsing-json-yes-definitely-michael-boyle-wo1dc
- https://motherduck.com/blog/analyze-json-data-using-sql/
But they all seem to assume the data is stored in JSON files, whereas I've read this table from a CSV file. I want to flatten an existing JSON column.
Alternatively, if I could convert this JSON column into a STRUCT column, that would allow me to use the unnest function. But I couldn't find any online references for that, either.
TL;DR - I'm looking for the DuckDB equivalent of pandas.json_normalize.
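For reference, this is roughly the behaviour I mean - a minimal pandas sketch, with placeholder values standing in for the truncated fields:

import pandas as pd

# Rows mirroring the sample above (placeholder values where the table
# output is truncated).
records = [
    {"type": "abc", "purpose": "ad", "userId": "", "context": {"ip": "x.x.x.x"}},
    {"type": "abc", "purpose": "search", "userId": "ABCD1234", "context": {"ip": "x.x.x.x"}},
    {"type": "defghi", "purpose": None, "userId": "", "context": {"ip": "x.x.x.x"}},
]

# json_normalize flattens nested objects into dotted column names such
# as "context.ip" -- the output shape shown under "Expectation".
print(pd.json_normalize(records))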
As you say, it seems that there is no way to get read_json() behaviour (schema inference) without writing the JSON column out to a tempfile. Converting to a struct appears to require specifying a schema.
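For contrast, here is a sketch of the explicit-schema route. from_json() takes a structure argument describing the struct layout, which has to be written out by hand - the field names and types below are guessed from the sample rows, with the truncated fields omitted:

import duckdb

con = duckdb.connect()

# from_json() with a structure argument returns a STRUCT, which
# unnest() accepts; recursive := true also expands the nested
# "context" struct.
con.sql("""
    SELECT unnest(
        from_json(
            '{"type":"abc","purpose":"ad","userId":"","context":{"ip":"x.x.x.x"}}',
            '{"type":"VARCHAR","purpose":"VARCHAR","userId":"VARCHAR","context":{"ip":"VARCHAR"}}'
        ),
        recursive := true
    )
""").show()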
Example setup using Python API
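A sketch of the setup, with placeholder values where the sample output in the question is truncated:

import duckdb

con = duckdb.connect()
con.sql("CREATE TABLE some_table (id BIGINT, data JSON)")
# Placeholder rows mirroring the sample output; "ts" and the rest of
# "context" are truncated in the question, so stand-in values are used.
con.sql("""
    INSERT INTO some_table VALUES
        (1, '{"type":"abc","purpose":"ad","ts":"t1","userId":"","context":{"ip":"x.x.x.x"}}'),
        (2, '{"type":"abc","purpose":"search","ts":"t2","userId":"ABCD1234","context":{"ip":"x.x.x.x"}}'),
        (3, '{"type":"defghi","purpose":null,"ts":"t3","userId":"","context":{"ip":"x.x.x.x"}}')
""")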
Tempfile
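Continuing from the setup above: dump the JSON column to a temporary file so that read_json (read_json_auto on older DuckDB versions) can run its schema inference over it. COPY ... (FORMAT JSON) writes newline-delimited JSON, one {"data": ...} object per row:

import os
import tempfile

# Write the JSON column out as newline-delimited JSON.
json_path = os.path.join(tempfile.gettempdir(), "some_table_data.json")
con.sql(f"COPY (SELECT data FROM some_table) TO '{json_path}' (FORMAT JSON)")

# read_json() infers a nested STRUCT type for "data"; recursive unnest
# then expands every leaf field into its own column.
con.sql(f"""
    SELECT unnest(data, recursive := true)
    FROM read_json('{json_path}')
""").show()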
Positional join
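A POSITIONAL JOIN then stitches the inferred columns back onto the original table (reusing con and json_path from above). It pairs rows purely by position - row 1 with row 1, row 2 with row 2 - so it relies on both sides coming out in the same order:

# Re-attach the flattened columns to the original rows by position;
# both sides come out in insertion order here.
con.sql(f"""
    SELECT some_table.id, flat.*
    FROM some_table
    POSITIONAL JOIN (
        SELECT unnest(data, recursive := true)
        FROM read_json('{json_path}')
    ) AS flat
""").show()

Note that the flattened columns come out as type, purpose, ts, userId, ip, ... - which leads to the caveat below.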
However, I can't find any way to retain the "path" in a recursive unnest - it just uses the final "key" as the column name.
It seems a feature request for this is needed.