I am trying to set up a data pipeline in AWS, ideally using serverless and hosted services.
However, one of the steps requires a large amount of RAM (120 GB), and it cannot be broken down into smaller chunks.
Ideally I would also run the steps as containers, since the package requirements are a bit exotic.
So far it seems that neither AWS Glue nor MWAA handles more than 32 GB of RAM.
The one that does handle it is AWS Data Pipeline, which is being deprecated.
Am I missing some (hosted) options? Otherwise I know that I can do things like running Flyte on managed k8s.
Regards, Niklas
For a use case like this, where you want a containerized approach and prefer it to be serverless, you can check out EMR Serverless.
You can also build custom images for it that contain your specific package requirements.
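
As a rough sketch of what that could look like with boto3, assuming the step can be expressed as a Spark job: the application name, release label, memory and core values below are illustrative placeholders, not recommendations, and the image URI, role ARN and script location are hypothetical. Check the EMR Serverless documentation for the worker sizes and release labels that are actually supported in your region before relying on these numbers.

```python
def build_emr_serverless_requests(image_uri, role_arn, script_uri,
                                  executor_memory="100g", executor_cores=16):
    """Build keyword arguments for the boto3 'emr-serverless' client.

    Nothing is sent to AWS here; the dicts are only assembled.
    """
    create_app = {
        "name": "big-memory-step",       # hypothetical application name
        "releaseLabel": "emr-6.9.0",     # assumption: a release that supports custom images
        "type": "SPARK",
        "imageConfiguration": {"imageUri": image_uri},  # your custom image in ECR
    }
    # Placeholder sizing; executor memory plus overhead must fit in one worker.
    spark_params = (
        f"--conf spark.executor.memory={executor_memory} "
        f"--conf spark.executor.cores={executor_cores}"
    )
    start_job = {
        "executionRoleArn": role_arn,
        "jobDriver": {
            "sparkSubmit": {
                "entryPoint": script_uri,
                "sparkSubmitParameters": spark_params,
            }
        },
    }
    return create_app, start_job


def submit(create_app, start_job):
    """Create the application and start one job run (requires AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("emr-serverless")
    app_id = client.create_application(**create_app)["applicationId"]
    run = client.start_job_run(applicationId=app_id, **start_job)
    return run["jobRunId"]
```

Keeping the request building separate from the submission makes it easy to inspect or unit-test the configuration before anything is created in the account.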
One more note: Glue can process this too. The G.2X worker type has 32 GB of memory, but it also has 128 GB of disk space, which a worker uses when it needs the extra space (and during shuffle operations). You can also add your custom packages per job.
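
For the Glue route, a minimal sketch of a job definition with G.2X workers and per-job packages might look like the following. The job name, role, script location, Glue version, worker count and module list are all hypothetical placeholders; `--additional-python-modules` is the Glue job parameter for installing extra Python packages.

```python
def build_glue_job_request(job_name, role_arn, script_uri, extra_modules, workers=2):
    """Build keyword arguments for boto3's Glue create_job (nothing is sent)."""
    return {
        "Name": job_name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",            # Spark ETL job type
            "ScriptLocation": script_uri,
            "PythonVersion": "3",
        },
        "GlueVersion": "4.0",             # assumption: adjust to the version you use
        "WorkerType": "G.2X",
        "NumberOfWorkers": workers,       # placeholder worker count
        "DefaultArguments": {
            # Comma-separated list of extra packages installed for this job.
            "--additional-python-modules": ",".join(extra_modules),
        },
    }


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("glue").create_job(**build_glue_job_request(...))
```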