Estimation of data volume


I have a 3-node Cassandra cluster that holds data from 3 applications. We are planning to add 3 new applications, which will increase the workload on the cluster. What steps should I follow to project future capacity, for example to decide whether we need to add another node? Is it possible to use cassandra-stress for that? If so, which metrics should I look at?

Thank you for your advice.


There are 2 best solutions below


If you are adding 3 more applications to a 3-node cluster that already serves 3, first make sure the cluster can take the extra load. You need to know each application's read and write volume at peak time. Using those numbers, benchmark the cluster with the cassandra-stress tool. I would recommend using a separate cluster for the new applications.
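As a rough illustration of the "know your peak reads and writes" step, here is a minimal Python sketch that adds up per-application peak rates to get a target load for the benchmark. The application names and all the numbers are made up, and it pessimistically assumes every application peaks at the same moment; replace them with figures measured from your own applications.

```python
from dataclasses import dataclass


@dataclass
class AppLoad:
    """Peak-time request rates for one application (hypothetical figures)."""
    name: str
    peak_reads_per_s: int
    peak_writes_per_s: int


def combined_peak(apps):
    """Return (reads/s, writes/s), assuming all applications peak together."""
    reads = sum(a.peak_reads_per_s for a in apps)
    writes = sum(a.peak_writes_per_s for a in apps)
    return reads, writes


# Placeholder values: three existing and three planned applications.
apps = [
    AppLoad("existing-1", 800, 200),
    AppLoad("existing-2", 500, 300),
    AppLoad("existing-3", 1200, 100),
    AppLoad("new-1", 1000, 400),
    AppLoad("new-2", 600, 600),
    AppLoad("new-3", 900, 150),
]

reads, writes = combined_peak(apps)
print(f"Benchmark target: {reads} reads/s and {writes} writes/s")
```

The totals give you an operation rate and a read/write mix to aim for when you set up the stress run on a test cluster.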


The cassandra-stress tool can indeed be used to model your expected applications, so that you can write data and see how your cluster scales. You should, for what should be obvious reasons, run it against a test cluster of similar size and similar hardware, not against your live production cluster (cassandra-stress WILL increase throughput until the cluster fails; that is the point of a stress utility). You could also write a test that slowly inserts data matching your applications into the database, run nodetool flush to force that data into the sstables, and then measure the change in load to determine how many bytes per application you should expect. You can then use that figure in traditional capacity estimation calculations.
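The flush-and-measure approach can be turned into a back-of-the-envelope calculation like the one below. This is only a sketch under stated assumptions: "load" here means the Load value that nodetool reports, summed over all nodes (so replicas are already counted), the 50% fill limit is an arbitrary headroom for compaction and growth that I picked, and every number is a placeholder rather than a measurement.

```python
import math


def bytes_per_row(load_before, load_after, rows_inserted):
    """On-disk bytes per inserted row, measured cluster-wide.

    load_before / load_after: total load in bytes summed across all nodes,
    taken before the test inserts and after `nodetool flush`.
    """
    if rows_inserted <= 0:
        raise ValueError("rows_inserted must be positive")
    return (load_after - load_before) / rows_inserted


def nodes_needed(projected_bytes, usable_bytes_per_node, max_fill=0.5):
    """Nodes required so that no node exceeds max_fill of its usable disk.

    max_fill is an assumed safety margin, not a Cassandra setting.
    """
    return math.ceil(projected_bytes / (usable_bytes_per_node * max_fill))


# Placeholder measurements for one new application.
before = 120_000_000_000   # cluster load before the test inserts (bytes)
after = 120_450_000_000    # cluster load after the inserts and flush (bytes)
row_bytes = bytes_per_row(before, after, rows_inserted=1_000_000)

# Projection: existing data plus the rows the new application is expected to store.
expected_rows = 2_000_000_000
projected = before + expected_rows * row_bytes

print(f"about {row_bytes:.0f} bytes per row (replicas included)")
print(f"nodes needed: {nodes_needed(projected, usable_bytes_per_node=500_000_000_000)}")
```

Compression and compaction change the on-disk size over time, so treat the per-row figure as an estimate and repeat the measurement with a realistic amount of data. Disk is also only one dimension: the throughput results from the stress runs may call for more nodes than the storage calculation does.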