I have a table that stores messages that failed to process, and a scheduler retries them every 5 minutes.
When a message is processed successfully, its row is deleted from the table so that the same message is not processed again.
The rows are fetched with SELECT * FROM <table_name>, so we run into tombstone issues when a large number of rows have been deleted.
The table has a timestamp as the partition key and message_name (TEXT) as the clustering key, a TTL of 7 days, and gc_grace_seconds of 2 days.
I need to delete the records; otherwise duplicate records will be processed. Is there any way to avoid the tombstone issues?
Unfortunately, there isn't a quick fix to your problem.
The challenge is that you're using Cassandra as a queue, which isn't a good idea because you run straight into exactly this tombstone hell. I'm sure you've seen this blog post by now that talks about queues and queue-like datasets being an anti-pattern for Cassandra.
It is possible to avoid generating lots of tombstones if you model your data differently, in buckets, with each bucket mapping to its own table. When you're done processing all the items in a bucket, TRUNCATE the table. This idea comes from Ryan Svihla's blog post Understanding Deletes, where he goes through the idea of "partitioning tables". Cheers!
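
To make the bucket-per-table idea a bit more concrete, here is a rough sketch in Python of how a scheduler might pick the table for a given message and build the statements. Everything named here is an assumption for illustration only: the keyspace `retry_ks`, the table prefix `failed_messages`, the column names, the choice of time-based buckets, and the bucket size and count. It only builds CQL strings, so you would still run them through whatever driver you use.

```python
from datetime import datetime, timezone

# All names and sizes below are hypothetical; adjust to your own schema.
KEYSPACE = "retry_ks"
TABLE_PREFIX = "failed_messages"
BUCKET_SECONDS = 6 * 3600   # each bucket covers 6 hours (assumption)
BUCKET_COUNT = 4            # tables failed_messages_0 .. failed_messages_3


def bucket_index(ts: datetime) -> int:
    """Map a timestamp to one of the rotating bucket tables."""
    return (int(ts.timestamp()) // BUCKET_SECONDS) % BUCKET_COUNT


def bucket_table(ts: datetime) -> str:
    """Fully qualified name of the table that holds messages for ts."""
    return f"{KEYSPACE}.{TABLE_PREFIX}_{bucket_index(ts)}"


def insert_cql(ts: datetime) -> str:
    """Parameterised INSERT for a failed message received at ts."""
    return (
        f"INSERT INTO {bucket_table(ts)} "
        "(received_at, message_name, payload) VALUES (?, ?, ?)"
    )


def select_cql(ts: datetime) -> str:
    """Read every pending message in the bucket that ts falls into."""
    return f"SELECT * FROM {bucket_table(ts)}"


def truncate_cql(ts: datetime) -> str:
    """Clear a fully processed bucket without row-level deletes."""
    return f"TRUNCATE {bucket_table(ts)}"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(insert_cql(now))
    print(select_cql(now))
    print(truncate_cql(now))
```

The point of the design is that the scheduler never issues a row-level DELETE: it only truncates a bucket once every message in it has been handled. Messages that still fail would have to be written into the current bucket before the old one is truncated, and a bucket must not be truncated while writers can still land in it.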