I am working on some github data and would like to pull the first commit email ([email protected]) in the data below. All of the below data is stored as a struct column.
"struct<action:string,push_id:bigint,size:int,distinct_size:int,ref:string,head:string,before:string,commits:array<struct<sha:str"
{
"action": null,
"repository_id": null,
"ref": "refs/heads/repos",
"commits": [
{
"sha": "7b6bbe03715bb65",
"author": {
"email": "[email protected]",
"name": "Victor AAA"
},
"message": "xcompmgr : update to 1.1.9",
"distinct": true,
"url": "https://api.github.com"
},
{
"sha": "cb81117197346df2385511d5",
"author": {
"email": "noreply.github.com",
"name": "Luis"
},
"message": "Merge pull request",
"distinct": true,
"url": "https://api.github.com"
}
] }
I figured something like the below query would at least get me started but I keep getting an Invalid access to elements of ARRAY error.
select payload.action, p.email from ghdata g, g.payload.commits.author p
Any help is appreciated
First off I'm assuming that by "struct" you mean "super" in Redshift terms and that the super column is named "payload".
What you have done in your FROM clause is to join (unnesting) the table with itself using all the indexes of the array "commits". This removes the information you desire - what information is first in the array. You can simply index the first element of the array: