Load Hudi formatted data into a table created using Athena

124 Views Asked by At

I have created a dataset in S3 using Spark in Hudi format.

I want to create a table using Athena and load all the partitions of that dataset in this new table.

Though I created a external table with input format as HUDI

STORED AS INPUTFORMAT 
  'org.apache.hudi.hadoop.HoodieParquetInputFormat' 

But MSCK REAPAIR is not supported to load the data.

1

There are 1 best solutions below

0
Istvan On

You can create a table like this:

CREATE EXTERNAL TABLE `partition_mor`(
  `_hoodie_commit_time` string, 
  `_hoodie_commit_seqno` string, 
  `_hoodie_record_key` string, 
  `_hoodie_partition_path` string, 
  `_hoodie_file_name` string, 
  `event_id` string, 
  `event_time` string, 
  `event_name` string, 
  `event_guests` int)
PARTITIONED BY ( 
  `event_type` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hudi.hadoop.HoodieParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3://bucket/folder/partition_mor/'
ALTER TABLE partition_mor ADD
  PARTITION (event_type = 'one') LOCATION 's3://bucket/folder/partition_mor/one/'
  PARTITION (event_type = 'two') LOCATION 's3://bucket/folder/partition_mor/two/'

You can find more details: https://docs.aws.amazon.com/athena/latest/ug/querying-hudi.html