How to manage a huge number of records using Lambda


I have 20K records in an Excel file on S3. My design for processing these records looks like this: SQSWriterLambda -> SQS -> SQSReaderLambda -> Server. SQSWriterLambda reads the Excel file and sends one SQS message per record, so there will be 20K messages. SQSReaderLambda is triggered by messages arriving in SQS and forwards each message's content to the final server for processing. But my server can only process 5K messages per 24 hours, so I am looking for a way to handle the remaining 15K records. I will upload the Excel sheet to S3 once and want Lambda to process the records (5K per 24 hours) over however many days it takes.
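For reference, my writer Lambda would look roughly like this. This is only a sketch: the bucket, key, and queue URL are placeholders, I assume the sheet has a header row, and openpyxl would have to be bundled with the deployment package.

```python
import io
import json

import boto3
from openpyxl import load_workbook  # assumed bundled with the deployment package

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

# Placeholder names -- substitute your real bucket, key, and queue URL.
BUCKET = "my-records-bucket"
KEY = "records.xlsx"
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/records-queue"

def handler(event, context):
    # Pull the workbook from S3 and iterate over its rows, skipping the header.
    obj = s3.get_object(Bucket=BUCKET, Key=KEY)
    workbook = load_workbook(io.BytesIO(obj["Body"].read()), read_only=True)
    rows = workbook.active.iter_rows(min_row=2, values_only=True)

    # SQS accepts at most 10 messages per batch call.
    batch = []
    for row in rows:
        batch.append({"Id": str(len(batch)),
                      "MessageBody": json.dumps(list(row), default=str)})
        if len(batch) == 10:
            sqs.send_message_batch(QueueUrl=QUEUE_URL, Entries=batch)
            batch = []
    if batch:
        sqs.send_message_batch(QueueUrl=QUEUE_URL, Entries=batch)
```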

The maximum visibility timeout is 12 hours. My first Lambda can put all 20K messages on SQS, but the second Lambda would fail after processing 5K records.


1 Answer

Marcin:

If you hook the second Lambda up to SQS as an event source, SQS will try to deliver all 20K messages to it for processing. Decoupling the second Lambda from SQS should work.

You could consider the following approach:

  1. Replace the second Lambda function with a "free-standing" one (not connected to SQS as an event source). Instead, the function itself would poll SQS iteratively, pulling just 5K messages per run (see the sketch after this list).

  2. Set up a CloudWatch Events rule to trigger the function automatically once a day, e.g. with a rate(1 day) schedule expression. That way you process exactly 5K messages per day.
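A minimal sketch of such a free-standing reader: the queue URL is a placeholder, and send_to_server stands in for whatever call your final server expects.

```python
import boto3

sqs = boto3.client("sqs")

# Placeholder queue URL; the daily cap matches the server's capacity.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/records-queue"
DAILY_LIMIT = 5000

def send_to_server(body):
    """Placeholder for the call to the final server."""

def handler(event, context):
    processed = 0
    while processed < DAILY_LIMIT:
        # receive_message returns at most 10 messages per call.
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=1,
        )
        messages = resp.get("Messages", [])
        if not messages:
            break  # queue is drained for now
        for msg in messages:
            send_to_server(msg["Body"])
            # Delete only after the server accepted the record.
            sqs.delete_message(QueueUrl=QUEUE_URL,
                               ReceiptHandle=msg["ReceiptHandle"])
            processed += 1
            if processed >= DAILY_LIMIT:
                break
```

One caveat: a single Lambda invocation is capped at 15 minutes, so if each server call is slow, 5K messages may not fit in one run and you may need to split the daily quota across several scheduled runs.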

Note that the default message retention period in SQS is 4 days, which is just enough for 20K messages (5K x 4 days), but you can increase it to up to 14 days if needed.
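If you need the longer retention, a one-off call like the following would set it (queue URL is again a placeholder; the attribute is given in seconds, and 14 days = 1,209,600 s):

```python
import boto3

sqs = boto3.client("sqs")

# Raise the queue's message retention to the 14-day maximum.
sqs.set_queue_attributes(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/records-queue",
    Attributes={"MessageRetentionPeriod": "1209600"},
)
```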