With DuckDB, it's possible to unwrap structs like this:
SELECT
parent_id,
top_level_struct.*
FROM
arrow_table AS root
I'm having an issue when trying to unwrap nested structs (see below for example code):
SELECT
parent_id,
my_list_unnested.my_list.* EXCLUDE (list_struct), -- not working
my_list_unnested.my_list.list_struct.* -- not working
FROM
arrow_table AS root,
UNNEST(root.my_list) AS my_list_unnested
Code to reproduce the issue
import duckdb
import pyarrow as pa
test_records = [
{
"parent_id": 123,
"top_level_struct": {
"top_level_hello": "World",
"top_level_foo": "bar",
"top_level_baz": "qux"
},
"my_list": [
{
"item_id": 123,
"list_struct": {
"list_hello": "World",
"list_foo": "bar",
"list_baz": "qux"
}
}
]
}
]
arrow_table = pa.Table.from_pylist(test_records)
# this is working
WORKING_SQL = """
SELECT
parent_id,
top_level_struct.*
FROM
arrow_table AS root
"""
df = duckdb.sql(WORKING_SQL)
# this is not working
NOT_WORKING_SQL = """
SELECT
parent_id,
my_list_unnested.my_list.* EXCLUDE (list_struct), -- not working
my_list_unnested.my_list.list_struct.* -- not working
FROM
arrow_table AS root,
UNNEST(root.my_list) AS my_list_unnested
"""
df = duckdb.sql(NOT_WORKING_SQL)
# Gives
# duckdb.duckdb.ParserException: Parser Error: syntax error at or near "*"
What I try to achieve
I'm trying to flatten the above records into a structure shown below, and I need to use DuckDB because of the the actual case I'm working on:
desired_structure = [
{
"parent_id": 123,
"top_level_hello": "World",
"top_level_foo": "bar",
"top_level_baz": "qux",
"item_id": 123,
"list_hello": "World",
"list_foo": "bar",
"list_baz": "qux"
}
]
Environment/versions
- DuckDB:0.9.2
- Python:3.10.12
- Ubuntu 22.04
The desired structure looks like the result of a Recursive Unnest?