How do I pull the first item in an array that's inside a stuct column in a Redshift nested table

383 Views Asked by At

I am working on some github data and would like to pull the first commit email ([email protected]) in the data below. All of the below data is stored as a struct column.

"struct<action:string,push_id:bigint,size:int,distinct_size:int,ref:string,head:string,before:string,commits:array<struct<sha:str"

{
    "action": null,
    "repository_id": null,
    "ref": "refs/heads/repos",
    "commits": [
       {
          "sha": "7b6bbe03715bb65",
          "author": {
             "email": "[email protected]",
             "name": "Victor AAA"
          },
          "message": "xcompmgr : update to 1.1.9",
          "distinct": true,
          "url": "https://api.github.com"
       },
       {
          "sha": "cb81117197346df2385511d5",
          "author": {
             "email": "noreply.github.com",
             "name": "Luis"
          },
          "message": "Merge pull request",
          "distinct": true,
          "url": "https://api.github.com"
       }
    ]  }

I figured something like the below query would at least get me started but I keep getting an Invalid access to elements of ARRAY error.

select payload.action, p.email from ghdata g, g.payload.commits.author p

Any help is appreciated

1

There are 1 best solutions below

0
Bill Weiner On

First off I'm assuming that by "struct" you mean "super" in Redshift terms and that the super column is named "payload".

What you have done in your FROM clause is to join (unnesting) the table with itself using all the indexes of the array "commits". This removes the information you desire - what information is first in the array. You can simply index the first element of the array:

select payload.action, payload.commits[0].author.email from ghdata;