Load Hudi formatted data into a table created using Athena

124 Views Asked by ketankk At 03 January 2024 at 06:25

I have created a dataset in S3 using Spark in Hudi format.

I want to create a table using Athena and load all the partitions of that dataset in this new table.

I created an external table with the Hudi input format:

STORED AS INPUTFORMAT 
  'org.apache.hudi.hadoop.HoodieParquetInputFormat'

However, MSCK REPAIR TABLE is not supported for loading the partitions of this table.

Istvan On 05 January 2024 at 10:33

You can create a table like this:

CREATE EXTERNAL TABLE `partition_mor`(
  `_hoodie_commit_time` string, 
  `_hoodie_commit_seqno` string, 
  `_hoodie_record_key` string, 
  `_hoodie_partition_path` string, 
  `_hoodie_file_name` string, 
  `event_id` string, 
  `event_time` string, 
  `event_name` string, 
  `event_guests` int)
PARTITIONED BY ( 
  `event_type` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hudi.hadoop.HoodieParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3://bucket/folder/partition_mor/'

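Because MSCK REPAIR TABLE is not supported for Hudi tables in Athena, add each partition explicitly in a separate statement:

ALTER TABLE partition_mor ADD
  PARTITION (event_type = 'one') LOCATION 's3://bucket/folder/partition_mor/one/'
  PARTITION (event_type = 'two') LOCATION 's3://bucket/folder/partition_mor/two/'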
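If the dataset has many partitions, writing the ALTER TABLE statement above by hand gets tedious. The following is a minimal sketch of a helper that generates it from a list of partition values. The function name `build_add_partitions_sql` is hypothetical, and it assumes each partition's folder is named after its partition value, as in the example locations above. It only builds the SQL text; running it in Athena is left to you.

```python
def build_add_partitions_sql(table, partition_column, base_location, values):
    """Build a single ALTER TABLE ... ADD PARTITION statement for all values.

    Assumption: each partition lives at <base_location>/<value>/.
    """
    base = base_location.rstrip("/")
    clauses = []
    for value in values:
        # Keep the sketch simple: reject values that would break the quoting.
        if "'" in value:
            raise ValueError(f"unsupported partition value: {value!r}")
        clauses.append(
            f"  PARTITION ({partition_column} = '{value}') "
            f"LOCATION '{base}/{value}/'"
        )
    # IF NOT EXISTS makes the statement safe to re-run.
    return f"ALTER TABLE {table} ADD IF NOT EXISTS\n" + "\n".join(clauses)


if __name__ == "__main__":
    print(
        build_add_partitions_sql(
            "partition_mor",
            "event_type",
            "s3://bucket/folder/partition_mor/",
            ["one", "two"],
        )
    )
```

The partition values could come from listing the S3 prefixes under the table location or from the Spark job that wrote the dataset. Afterwards, SHOW PARTITIONS partition_mor confirms what was registered.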

You can find more details in the Athena documentation: https://docs.aws.amazon.com/athena/latest/ug/querying-hudi.html
