What is the right approach for batch deletions with the Cassandra C# driver?


I am quite new to Cassandra and have a question about how to use it.

The table structure looks like this:

Table name: Product Details

ProductFamily text,
AccessGroup text,
ProductDetails map,
PRIMARY KEY ((ProductFamily), AccessGroup)

Data relation:

Each product family has multiple access groups, and each access group stores its product details in a map. A given product may be present in all of the access groups or in only some of them.

Scenario 1:

  1. We receive a delete event that contains only the ProductId and the product family.

Our implementation:

  1. Fetch all access groups of the product family from the database.

  2. For each access group, query the database for its map and check whether the map contains the specific ProductId as a key.

  3. If it does, hold that (access group, ProductId) pair in memory.

  4. At the end, prepare a single batch statement that deletes the ProductId from each of those access groups. This works as one batch because every statement shares the same partition key.

Note: we have at most 15-20 items in a map and 8-10 access groups per product family.


Questions:

  1. Am I following the right approach for batch deletion?

  2. If we receive thousands of such events per day, will this approach perform well?

Thanks in advance.

João Reis

In general, we don't recommend using batches when the goal is to improve performance. However, some users have reported performance improvements when all statements in a batch target the same partition key (compared with sending individual asynchronous requests), so your approach could actually be the one that offers the best performance.
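As a rough illustration of the single-partition batch described in the question, here is a minimal sketch. It assumes the DataStax C# driver (the `CassandraCSharpDriver` NuGet package), a hypothetical table named `product_details` with lowercase column names, and a `map<text, text>` column, since the question does not give the map's key and value types. For brevity it reads the access groups and their maps with one partition query instead of one query per access group.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Cassandra;

public static class ProductBatchDelete
{
    // Hypothetical table and column names; the map is assumed to be map<text, text>.
    private const string SelectCql =
        "SELECT accessgroup, productdetails FROM product_details WHERE productfamily = ?";
    private const string DeleteCql =
        "DELETE productdetails[?] FROM product_details WHERE productfamily = ? AND accessgroup = ?";

    public static async Task DeleteProductAsync(
        ISession session, string productFamily, string productId)
    {
        // Prepared here only to keep the sketch self-contained.
        PreparedStatement select = await session.PrepareAsync(SelectCql);
        PreparedStatement delete = await session.PrepareAsync(DeleteCql);

        // Read every access group (clustering row) of this product family.
        RowSet rows = await session.ExecuteAsync(select.Bind(productFamily));

        var batch = new BatchStatement();
        var hasDeletes = false;
        foreach (Row row in rows)
        {
            var details = row.GetValue<IDictionary<string, string>>("productdetails");
            if (details != null && details.ContainsKey(productId))
            {
                // Every statement in the batch targets the same partition key.
                string accessGroup = row.GetValue<string>("accessgroup");
                batch.Add(delete.Bind(productId, productFamily, accessGroup));
                hasDeletes = true;
            }
        }

        // Skip the round trip when no access group contained the product.
        if (hasDeletes)
        {
            await session.ExecuteAsync(batch);
        }
    }
}
```

In a real application the two statements would normally be prepared once at startup and reused across events, rather than prepared on every call.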

One thing that could hurt performance is the "spiky" nature of that approach. It would probably be better for the Cassandra nodes to do something like this:

  1. Fetch all access groups of the product family from the database.

  2. For each access group, query the database for its map and check whether the map contains the specific ProductId as a key.

  3. If it does, send a DELETE request asynchronously and hold the returned Task in memory (without awaiting it right away).

  4. At the end, await all the tasks that were held in memory with await Task.WhenAll(tasks).
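The steps above could be sketched roughly as follows. This assumes the DataStax C# driver and two statements prepared elsewhere by the caller; the table name, column names, and the `map<text, text>` type shown in the comments are hypothetical, since the question does not spell them out. As a simplification, the sketch reads the access groups and their maps in a single partition query.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Cassandra;

public static class ProductAsyncDelete
{
    // Assumed to be prepared once by the caller (hypothetical schema):
    //   select: SELECT accessgroup, productdetails FROM product_details
    //           WHERE productfamily = ?
    //   delete: DELETE productdetails[?] FROM product_details
    //           WHERE productfamily = ? AND accessgroup = ?
    public static async Task DeleteProductAsync(
        ISession session,
        PreparedStatement select,
        PreparedStatement delete,
        string productFamily,
        string productId)
    {
        RowSet rows = await session.ExecuteAsync(select.Bind(productFamily));

        var tasks = new List<Task>();
        foreach (Row row in rows)
        {
            var details = row.GetValue<IDictionary<string, string>>("productdetails");
            if (details != null && details.ContainsKey(productId))
            {
                string accessGroup = row.GetValue<string>("accessgroup");
                // Start the DELETE and keep the Task without awaiting it yet.
                tasks.Add(session.ExecuteAsync(
                    delete.Bind(productId, productFamily, accessGroup)));
            }
        }

        // Wait for all in-flight DELETE requests to complete.
        await Task.WhenAll(tasks);
    }
}
```

With at most 8-10 access groups per product family, this keeps only a handful of requests in flight per event.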

There is no guarantee that this approach will be better, though. Performance tests and benchmarks are the only way to determine that.