I have a service that accepts millions of customer requests and processes them asynchronously. All requests are added to a DynamoDB (DDB) table for processing and removed from the table on completion. The system can accumulate a few billion requests. Periodically, I need to monitor the total number of requests in the table, how many of them are older than 2 hours, and the age of the oldest request. Other than periodically scanning the whole DDB table, which is very expensive and does not scale, what other technologies or solutions can I use to answer these questions?
Also, is there a name for the above async processing pattern?
I have tried scanning the DDB table, but as the number of requests increases, the system does not scale well.
I think if the service accumulates billions of requests (per day?), it can be very computationally expensive to keep adding and removing records from a table. If your only goal is to keep track of the number of asynchronous requests that are not yet completed, I would recommend using a distributed message queue such as Amazon SQS or Apache Kafka. Requests are placed on the queue and then received as messages by the server nodes that handle them. Amazon SQS can be used in conjunction with Amazon CloudWatch, which lets you get relevant metrics from the queue. For example:
- The ApproximateNumberOfMessagesVisible metric.
- The ApproximateAgeOfOldestMessage metric.

To see how many of the requests are older than 2 hours, I think you can have the server nodes that forward requests to the queue publish a CloudWatch log entry containing the request id and the receiving timestamp, and have the server nodes that handle requests from the queue publish another CloudWatch log entry containing the request id. That way, you can get the number of open requests older than 2 hours by taking the request ids received more than 2 hours ago and subtracting the request ids logged by the request handlers.
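As a rough illustration of the two ideas above, here is a minimal Python sketch. It assumes the `cloudwatch` argument is a boto3 CloudWatch client (or anything exposing the same `get_metric_statistics` call), and that the two sets of log entries have already been loaded into memory. The function names, the 15-minute lookback window, and the in-memory dict are my own assumptions for illustration. At billions of requests the set difference would have to run in a log query or analytics tool rather than in a Python process.

```python
from datetime import datetime, timedelta, timezone


def latest_queue_metric(cloudwatch, queue_name, metric_name, now=None):
    """Return the most recent 5-minute maximum of an SQS metric, or None.

    cloudwatch is assumed to be a boto3 CloudWatch client.
    metric_name is e.g. "ApproximateNumberOfMessagesVisible" or
    "ApproximateAgeOfOldestMessage".
    """
    now = now or datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/SQS",
        MetricName=metric_name,
        Dimensions=[{"Name": "QueueName", "Value": queue_name}],
        StartTime=now - timedelta(minutes=15),  # assumed lookback window
        EndTime=now,
        Period=300,
        Statistics=["Maximum"],
    )
    datapoints = response.get("Datapoints", [])
    if not datapoints:
        return None
    # Datapoints are not guaranteed to be ordered, so pick the newest one.
    newest = max(datapoints, key=lambda point: point["Timestamp"])
    return newest["Maximum"]


def count_open_older_than(received, completed_ids, now, max_age=timedelta(hours=2)):
    """Count requests received more than max_age ago that never completed.

    received: dict mapping request id -> datetime the request was enqueued
              (from the log written by the forwarding nodes)
    completed_ids: set of request ids logged by the request handlers
    """
    cutoff = now - max_age
    return sum(
        1
        for request_id, received_at in received.items()
        if received_at < cutoff and request_id not in completed_ids
    )
```

With boto3 installed and credentials configured, the first function could be called as `latest_queue_metric(boto3.client("cloudwatch"), "my-queue", "ApproximateAgeOfOldestMessage")`, where `my-queue` is a placeholder queue name.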
I don't think there's a specific name for this async processing pattern, but when talking about asynchronous processing, the message queue pattern or the publish–subscribe pattern usually comes to mind.