Include filename and path in external location when creating a table in Trino

355 Views Asked by At

I have a S3 bucket named test-bucket with the following structure

- test-bucket 
  - surveys
    - abc.json
    - def.json
    [...]

Now I create a table in Trino like this

create table hive.qual2.surveys 
(
    id VARCHAR,
    name VARCHAR
)
with (
    format = 'JSON',
    external_location = 's3://test-bucket/surveys'
);

select data from this table works fine. However, now I also need to know where the information is coming from. So I need to enter a column with e.g a value abc.json if the data is coming from this file.

Any clue how I can achieve this?

1

There are 1 best solutions below

0
jan-seins On BEST ANSWER

As described in the docs for the hive connector provided by trino, I could just use the $path column to see the file where the information is coming from.

to extract the file name without an extension i simply use this:

split(element_at(split("$path", '/'),-1),'.')[1] as survey_id

so the whole query looks like this after i created the table as described above.

select
  *,
  split(element_at(split("$path", '/'),-1),'.')[1] as survey_id
from hive.qual2.surveys;

including $path in the create table definition does not make sense, since it already exists as a hidden column.