AWS Step Function that waits for specific data to arrive in SQS


I'd like to create a Step Function in AWS that is triggered by SQS events.

Events in SQS are of 2 types: typeA and typeB. A typeA message has an ID and a field with the count of how many typeB messages are expected to arrive. A typeB message has its own ID and the ID of its typeA parent.

In other words, the types have a parent-child relationship: typeA is the parent and typeB is the child.

When a typeA message arrives, I'd like to start waiting for all of its children to arrive and, if that does not happen within 1 hour, return a specific status.

When a typeB message arrives, I'd like to start counting its siblings and, only when all of them have arrived, call a Lambda function to calculate the final status based on additional info in the typeB messages.

How do I implement this kind of wait for very specific data to arrive/to be fetched from SQS without using a database? Maybe Step Functions is not the right choice for this architecture?

I tried to create a flow, but I can't see how to implement the main step of waiting for (or running a transaction on) specific data.

Answer by John Rotenstein

I don't think that Amazon SQS is appropriate for your architecture.

When retrieving messages from an Amazon SQS queue, it is not possible to 'selectively' request types of messages. Instead, SQS simply gives you random messages from the queue (well, not totally random, but it is best to think of them this way). Therefore, there is no way of knowing how many typeB messages are in the queue compared to typeA messages.

You could create two separate SQS queues -- one for typeA and one for typeB. However, you still will not be able to know how many typeB messages exist in the queue for a particular typeA parent ID. So, if your system is waiting for messages from multiple 'parents' then you won't know when the queue has the required number of messages for a specific parent.

Instead, your system will need to retrieve the messages from the SQS queue and store them somewhere while awaiting the conditions to finally process them. The 'somewhere' could be a database (e.g. DynamoDB) or even Amazon S3 (which can be considered to be a NoSQL database too). For example, the messages could be stored in S3 under a prefix ('directory') based on the 'parent' ID. Each time a message arrives and is stored in S3, a Lambda function could count the number of messages stored under that prefix. If the count meets the total expected by the parent, it could trigger the final processing.
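
As a rough illustration of the store-and-count idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the message field names (`type`, `id`, `parentId`, `expectedCount`), the key layout, and the `InMemoryStore` class, which only stands in for S3 so that the logic can be run locally. In a real Lambda function the `put` and `count` methods would presumably map to S3 `put_object` and `list_objects_v2` calls with a `Prefix`.

```python
import json


class InMemoryStore:
    """Hypothetical stand-in for an S3 bucket: maps object keys to bodies."""

    def __init__(self):
        self.objects = {}

    def put(self, key, body):
        # Writing the same key twice overwrites, as it would in S3.
        self.objects[key] = body

    def get(self, key):
        return self.objects.get(key)

    def count(self, prefix):
        # Number of stored objects whose key starts with the prefix.
        return sum(1 for key in self.objects if key.startswith(prefix))


def handle_message(store, message):
    """Store one message and report whether its parent's group is complete.

    Returns True only when the typeA parent and at least `expectedCount`
    typeB children are stored under the parent's prefix.
    """
    if message["type"] == "typeA":
        parent_id = message["id"]
        store.put(f"{parent_id}/parent.json", json.dumps(message))
    else:
        parent_id = message["parentId"]
        # Keying by the child's own ID makes a redelivered message
        # overwrite its earlier copy instead of being counted twice.
        store.put(f"{parent_id}/children/{message['id']}.json", json.dumps(message))

    parent_body = store.get(f"{parent_id}/parent.json")
    if parent_body is None:
        # Children can arrive before their parent; keep waiting.
        return False

    expected = json.loads(parent_body)["expectedCount"]
    return store.count(f"{parent_id}/children/") >= expected
```

When `handle_message` returns True, the caller would trigger the final processing. Note that this sketch does not cover the 1-hour timeout, and that two messages handled at the same moment could both see a complete group, so the final step should tolerate being invoked more than once.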