I'm querying a gzipped JSON file in S3 that has several columns:
SELECT a, b, c FROM json.`s3://my-bucket/file.json.gz`
And the file looks like this:
{"a": {}, "b": 0, "c": 1}
{"a": {}, "b": 1, "c": 2}
{"a": {}, "b": 2, "c": 3}
The query above fails with:
[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `a` cannot be resolved. Did you mean one of the following? [`b`, `c`]
And when I run
SELECT * FROM json.`s3://my-bucket/file.json.gz`
I get only the columns b and c.
Is there a way to also get column a, and to see that it contains an empty JSON object?
Can you use Python or Scala syntax?
You need to impose a schema on the JSON file when reading it, and as far as I know that isn't possible with SQL queries alone.
The solution using Python syntax would look like this: