Can Kinesis Firehose add a partition to an Athena/Glue table after uploading a file to S3?


My goal is to load Kinesis data (logs and events) into S3 or Redshift for querying. Our Redshift cluster is private, so loading directly into Redshift is not possible. As I see it, there are a few options:

  1. Use Redshift streaming ingestion to represent the stream in Redshift and build ETL around it to load the data into the final table.
  2. Use a Glue connector/Spark Streaming/Flink cluster to load the data into an Iceberg table.
  3. Load files into S3 via Firehose, add the partition to the Athena/Glue table, then query the data via Redshift Spectrum.

Ideally I would want something I don't need to manage, upgrade, write a lot of code for, or touch after the initial setup.

Option (1) is easy to set up, but I would need to write ETL for each stream I have.
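To make that concrete, here is a minimal sketch of what the per-stream setup would look like using the Redshift Data API. Every identifier in it (cluster, database, role ARN, stream, and view names) is a placeholder I made up for illustration:

```python
import boto3

# Sketch of option (1): Redshift streaming ingestion set up through the
# Data API. All names below are assumed placeholders.
client = boto3.client("redshift-data")

client.batch_execute_statement(
    ClusterIdentifier="my-private-cluster",  # assumed cluster name
    Database="dev",
    DbUser="admin",
    Sqls=[
        # One-time: map the account's Kinesis streams as an external schema.
        """CREATE EXTERNAL SCHEMA kds
           FROM KINESIS
           IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-streaming-role'""",
        # Per stream: a materialized view over the raw records. This is the
        # part I'd have to repeat (plus the ETL into the final table) for
        # every stream.
        """CREATE MATERIALIZED VIEW events_staging AUTO REFRESH YES AS
           SELECT approximate_arrival_timestamp,
                  JSON_PARSE(kinesis_data) AS payload
           FROM kds."my-event-stream" """,
    ],
)
```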

With option (2) I wouldn't have to manage partitions, which is great, but I think the initial setup, and possibly a cluster I'd need to manage, would add a lot of overhead.
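For reference, this is roughly the shape of the streaming job I have in mind. I'm assuming the option names of the open-source kinesis-sql connector for the source (Glue's native API differs slightly between versions), and the catalog, table, stream, and bucket names are all placeholders:

```python
from pyspark.sql import SparkSession

# Sketch of option (2): a Spark Structured Streaming job (e.g. on Glue or
# EMR) reading from Kinesis and appending to an Iceberg table.
spark = SparkSession.builder.appName("kinesis-to-iceberg").getOrCreate()

raw = (
    spark.readStream.format("kinesis")
    .option("streamName", "my-event-stream")  # assumed stream name
    .option("endpointUrl", "https://kinesis.us-east-1.amazonaws.com")
    .option("startingPosition", "LATEST")
    .load()
)

# Iceberg handles hidden partitioning, so once the table exists with a
# partition spec there is no partition management on my side.
query = (
    raw.writeStream.format("iceberg")
    .outputMode("append")
    .option("checkpointLocation", "s3://my-bucket/checkpoints/events/")
    .toTable("glue_catalog.analytics.events")  # assumed table
)
query.awaitTermination()
```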

Option (3) sounds the simplest, with the least amount of work and the fewest things to manage.

What I'm not sure about is: how do I add partitions to the Athena/Glue table after a file is uploaded to S3? I would like to avoid having to manage a separate process like a Glue crawler.
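The fallback I'm aware of (and hoping to avoid writing) is a small Lambda on the S3 `ObjectCreated` event that registers the partition directly in the Glue catalog, something like the sketch below. The key layout, database, and table names are my assumptions:

```python
import boto3

glue = boto3.client("glue")

# Hypothetical fallback: an S3 ObjectCreated-triggered Lambda that registers
# a partition in Glue, assuming Firehose writes keys like
#   logs/dt=2024-01-15/part-0000.gz
DATABASE, TABLE = "analytics", "events"  # assumed names

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]   # e.g. logs/dt=2024-01-15/part-0000.gz
        prefix = key.rsplit("/", 1)[0]        # logs/dt=2024-01-15
        dt = prefix.split("dt=")[1]           # 2024-01-15

        # Reuse the table's storage descriptor, pointed at the new prefix.
        table = glue.get_table(DatabaseName=DATABASE, Name=TABLE)["Table"]
        sd = table["StorageDescriptor"]
        sd["Location"] = f"s3://{bucket}/{prefix}/"

        try:
            glue.create_partition(
                DatabaseName=DATABASE,
                TableName=TABLE,
                PartitionInput={"Values": [dt], "StorageDescriptor": sd},
            )
        except glue.exceptions.AlreadyExistsException:
            pass  # later files land in an already-registered partition
```

But this is exactly the sort of extra moving part I'm hoping Firehose itself can make unnecessary.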

Also, if you have another approach to achieve my goal, I would be happy to hear it.
