I need to stream data from a Postgres DB to AWS S3 buckets via Kafka, for about 60 tables.
I have three S3 Buckets in my Data Lake:
- RawBucket - raw data. Tables and columns are the same as in Postgres. This data is deleted after e.g. 10 days (retention policy)
- ArchiveBucket - data from RawBucket (not transformed, no retention policy)
- CuratedBucket - data from RawBucket in Parquet format. 80% of the data keeps the same structure as in RawBucket, and 20% is transformed
Is it possible to read data from Kafka with AWS Glue and:
- move all data to ArchiveBucket
- move 80% of the data into CuratedBucket in its source structure, in Parquet format
- move the remaining 20% into CuratedBucket in a new, transformed structure, in Parquet format
- remove processed data from the Kafka topics
- run the AWS Glue job once a day and consume Kafka incrementally
- every record in Kafka has an "uploadedAt" property that holds the date the record was uploaded to Kafka. Based on that date, the record should be saved into a specific subfolder (Hive-style partitioning). For example, data from the "order" table should land in the curated bucket under the "order" folder, partitioned by date, so the final target is s3://curatedBucket/order/year=2024/month=1/day=15/ (see the sketch after this list)
- I would like to skip RawBucket if possible
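
For context, here is roughly what I imagine the daily Glue job could look like for one table. This is just a sketch with placeholder bucket, broker, topic, and schema names; it assumes Glue 4.0 (Spark 3.3) and that the Spark Kafka connector is available to the job (e.g. via an extra jar or a Glue Kafka connection):

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F
from pyspark.sql.types import StringType, StructField, StructType

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Placeholder schema for the "order" topic payload; the real job would use
# the actual table columns (and loop over all ~60 topics).
order_schema = StructType([
    StructField("order_id", StringType()),
    StructField("uploadedAt", StringType()),
])

# Read the topic for one table from Kafka (broker and topic are placeholders).
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "postgres.public.order")
    .option("startingOffsets", "earliest")
    .load()
)

# Parse the JSON payload and derive partition columns from "uploadedAt".
parsed = (
    raw.selectExpr("CAST(value AS STRING) AS json")
    .select(F.from_json("json", order_schema).alias("r"))
    .select("r.*")
    .withColumn("uploadedAt", F.to_timestamp("uploadedAt"))
    .withColumn("year", F.year("uploadedAt"))
    .withColumn("month", F.month("uploadedAt"))
    .withColumn("day", F.dayofmonth("uploadedAt"))
)

# "Run once a day, consume only new records": the checkpoint stores the
# committed Kafka offsets, so each daily run resumes where the last one
# stopped and shuts down once all currently available data is processed.
query = (
    parsed.writeStream.format("parquet")
    .option("path", "s3://curatedBucket/order/")
    .option("checkpointLocation", "s3://curatedBucket/_checkpoints/order/")
    .partitionBy("year", "month", "day")
    .trigger(availableNow=True)  # trigger(once=True) on older Spark versions
    .start()
)
query.awaitTermination()
job.commit()
```

The write to ArchiveBucket and the transformations for the ~20% of tables would be extra branches on top of this.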
I'm not sure about the quality of this solution. Is it feasible, and is it a good design? If not, could someone kindly give some advice (with as much detail as possible) on the design and how to achieve it?
Alternatively, what about streaming the data from Kafka to RawBucket first and then processing it incrementally with AWS Glue using job bookmarks (rough sketch below)?
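
If that is the better approach, I imagine the daily incremental part would look roughly like this. Again just a sketch with placeholder names; it assumes RawBucket has been crawled into the Glue Data Catalog and that job bookmarks are enabled on the job (--job-bookmark-option job-bookmark-enable):

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Job bookmarks track which files in RawBucket were already processed, keyed
# by transformation_ctx, so each daily run only reads objects added since the
# previous run. "raw_db" / "order" are placeholder catalog names.
raw_dyf = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db",
    table_name="order",
    transformation_ctx="raw_order",
)

df = raw_dyf.toDF()

# Derive Hive-style partition columns from "uploadedAt".
df = (
    df.withColumn("uploadedAt", F.to_timestamp("uploadedAt"))
    .withColumn("year", F.year("uploadedAt"))
    .withColumn("month", F.month("uploadedAt"))
    .withColumn("day", F.dayofmonth("uploadedAt"))
)

# The ~20% of tables that need a new structure would get their custom
# transformations here; the rest would be written through unchanged.

(
    df.write.mode("append")
    .partitionBy("year", "month", "day")
    .parquet("s3://curatedBucket/order/")
)

job.commit()  # commits the bookmark so the next run skips these files
```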