DuckDB: Star expression with nested objects

102 Views Asked by At

With DuckDB, it's possible to unwrap structs like this:

    SELECT
        parent_id,
        top_level_struct.*
    FROM 
        arrow_table AS root

I'm having an issue when trying to unwrap nested structs (see below for example code):

    SELECT
        parent_id,
        my_list_unnested.my_list.* EXCLUDE (list_struct),  -- not working
        my_list_unnested.my_list.list_struct.* -- not working
    FROM 
        arrow_table AS root,
        UNNEST(root.my_list) AS my_list_unnested        

Code to reproduce the issue

import duckdb
import pyarrow as pa

test_records = [
    {
        "parent_id": 123,
        "top_level_struct": {
            "top_level_hello": "World",
            "top_level_foo": "bar",
            "top_level_baz": "qux"
        },
        "my_list": [
            {
                "item_id": 123,
                "list_struct": {
                    "list_hello": "World",
                    "list_foo": "bar",
                    "list_baz": "qux"
                }
            }
        ]
    }
]


arrow_table = pa.Table.from_pylist(test_records)

# this is working
WORKING_SQL = """
    SELECT
        parent_id,
        top_level_struct.*
    FROM 
        arrow_table AS root
"""
df = duckdb.sql(WORKING_SQL)

# this is not working
NOT_WORKING_SQL = """
    SELECT
        parent_id,
        my_list_unnested.my_list.* EXCLUDE (list_struct),  -- not working
        my_list_unnested.my_list.list_struct.* -- not working
    FROM 
        arrow_table AS root,
        UNNEST(root.my_list) AS my_list_unnested        
"""

df = duckdb.sql(NOT_WORKING_SQL)

# Gives
# duckdb.duckdb.ParserException: Parser Error: syntax error at or near "*"

What I try to achieve

I'm trying to flatten the above records into a structure shown below, and I need to use DuckDB because of the the actual case I'm working on:

desired_structure = [
    {
        "parent_id": 123,
        "top_level_hello": "World",
        "top_level_foo": "bar",
        "top_level_baz": "qux",
        "item_id": 123,
        "list_hello": "World",
        "list_foo": "bar",
        "list_baz": "qux"
    }
]

Environment/versions

  • DuckDB:0.9.2
  • Python:3.10.12
  • Ubuntu 22.04
1

There are 1 best solutions below

0
jqurious On BEST ANSWER

The desired structure looks like the result of a Recursive Unnest?

duckdb.sql("""
from arrow_table
select 
   parent_id,
   unnest(top_level_struct),
   unnest(my_list, recursive := true)
""")
┌───────────┬───────────────┬───────────────┬─────────────────┬─────────┬──────────┬──────────┬────────────┐
│ parent_id │ top_level_baz │ top_level_foo │ top_level_hello │ item_id │ list_baz │ list_foo │ list_hello │
│   int64   │    varchar    │    varchar    │     varchar     │  int64  │ varchar  │ varchar  │  varchar   │
├───────────┼───────────────┼───────────────┼─────────────────┼─────────┼──────────┼──────────┼────────────┤
│       123 │ qux           │ bar           │ World           │     123 │ qux      │ bar      │ World      │
└───────────┴───────────────┴───────────────┴─────────────────┴─────────┴──────────┴──────────┴────────────┘