Glue crawler only doing top level of DynamoDb Export

428 Views Asked by At

I'm trying to query a dynamodb export using AWS Glue and Athena. I set up a glue crawler to create tables from the exported file, but the output table of interest "data" has only one column "item". Item is a struct which has an assortment of nested files such that the table definition looks like this:

root
|-- Item: struct
|    |-- pk: struct
|    |    |-- S: string
|    |-- sk: struct
|    |    |-- S: string
|    |-- createdAt: struct
|    |    |-- N: string
|    |-- event: struct
|    |    |-- S: string
|    |-- data: struct
|    |    |-- S: string
|    |-- resultId: struct
|    |    |-- NULL: boolean
|    |    |-- S: string
|    |-- updatedAt: struct
|    |    |-- N: string
|    |-- answers: struct
|    |    |-- S: string
|    |-- result: struct
|    |    |-- S: string
|    |-- interactions: struct
|    |    |-- S: string

What do I need to do to be able to use Athena to query the table so that I get back a response with columns: pk, sk, createdAt etc. rather than just a row with everything packed into one struct?

I've tried applying a transform in a Glue ETL script,using function that looks like:

def transform(rec):
    rec["pk"] = rec["Item"]["pk"]["S"]
    del rec["Item"]["pk"]
    ...

This appears to work for the first few keys (which I think are in every row, but when I hit columns that are not in every row, it begins to fail...

1

There are 1 best solutions below

7
Leeroy Hannigan On

For that you need to use a JSON classifier.

https://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html#custom-classifier-json

The classifier should look something similar to this: $[*]