I'm trying to run a test to understand how Cassandra handles network issues, specifically a fixed delay on every packet. The setup is a Java application with the DataStax Java driver, a very simple multi-DC Cassandra cluster, and the tc tool.

There are two datacenters, DCA and DCB, each containing a single Cassandra node. I start the test with only 200 requests per second (INSERT ... IF NOT EXISTS) against Cassandra. Each request is first attempted with SERIAL consistency; if that fails, a second attempt is made with LOCAL_SERIAL. The timeouts for SERIAL and LOCAL_SERIAL are 400 ms and 200 ms respectively.
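A simplified sketch of the retry logic, in case it helps (driver 4.x API; the keyspace, table, and column names are placeholders, not my real schema):

    import java.time.Duration;

    import com.datastax.oss.driver.api.core.ConsistencyLevel;
    import com.datastax.oss.driver.api.core.CqlSession;
    import com.datastax.oss.driver.api.core.DriverException;
    import com.datastax.oss.driver.api.core.cql.SimpleStatement;

    public final class LwtInsert {

        // Placeholder schema; the real statement is also an INSERT ... IF NOT EXISTS.
        private static final String CQL =
            "INSERT INTO ks.locks (id, owner) VALUES (?, ?) IF NOT EXISTS";

        static boolean tryInsert(CqlSession session, String id, String owner) {
            try {
                // First attempt: SERIAL (cross-DC Paxos), 400 ms timeout.
                return execute(session, id, owner, ConsistencyLevel.SERIAL, Duration.ofMillis(400));
            } catch (DriverException first) {
                // Second attempt: LOCAL_SERIAL (Paxos confined to the local DC), 200 ms timeout.
                return execute(session, id, owner, ConsistencyLevel.LOCAL_SERIAL, Duration.ofMillis(200));
            }
        }

        private static boolean execute(CqlSession session, String id, String owner,
                                       ConsistencyLevel serialCl, Duration timeout) {
            SimpleStatement stmt = SimpleStatement.builder(CQL)
                .addPositionalValues(id, owner)
                .setSerialConsistencyLevel(serialCl)
                .setTimeout(timeout)
                .build();
            // wasApplied() is true only if the row did not already exist.
            return session.execute(stmt).wasApplied();
        }
    }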

I use the tc tool to model the network issue. As mentioned above, I add a fixed 50 ms delay to all packets going from the node in DCA to the node in DCB.
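Roughly what I run on the DCA node (eth0 is an assumption for the interface facing DCB; a root netem qdisc delays all egress on that interface):

    # Add a fixed 50 ms delay to all egress packets on eth0
    tc qdisc add dev eth0 root netem delay 50ms

    # Remove it again after the test
    tc qdisc del dev eth0 root netem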

With the delay in place, single manual requests still work. But when I run the load test at 200 rps against DCA, I get a lot of NoNodeAvailableException errors in my logs, even on the LOCAL_SERIAL attempts.

I read about the defaults of the DataStax Java driver, and they seem fine for this test. I also checked that the application's contact points in DCA consist of only the single DCA node.
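For reference, the session is built roughly like this (the address is a placeholder; the local datacenter is pinned to DCA):

    import java.net.InetSocketAddress;

    import com.datastax.oss.driver.api.core.CqlSession;

    // Placeholder address for the single DCA node.
    CqlSession session = CqlSession.builder()
        .addContactPoint(new InetSocketAddress("10.0.0.1", 9042))
        .withLocalDatacenter("DCA") // must match the DC name reported by nodetool status
        .build();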

Why do I get this behavior under such simple conditions? Any thoughts?

My colleagues also saw some of these exceptions at only 80 rps with no network issues at all, but I don't know the details of their setup.

Answer (Aaron):

Not sure how big the write payloads are, but that can absolutely affect this. Otherwise, the tricky part with lightweight transactions in Cassandra is that each one does (I think) something like four round trips between the coordinator and the target nodes, which greatly limits throughput all by itself. With a 50 ms delay injected on the DCA-to-DCB path, those round trips alone add on the order of 200 ms per cross-DC LWT, so at 200 rps requests will start to queue up and time out.

Also, multi-DC communications are never easy. I would recommend writing to a single data center (using LOCAL_QUORUM), and relying on replication to sync the replicas.
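If you stay with the LWT, one way to keep everything within one DC looks roughly like this (illustrative sketch; statement and names are assumed):

    import com.datastax.oss.driver.api.core.ConsistencyLevel;
    import com.datastax.oss.driver.api.core.cql.SimpleStatement;

    // Illustrative: confine both LWT phases to the local DC.
    SimpleStatement stmt = SimpleStatement.builder(
            "INSERT INTO ks.locks (id, owner) VALUES (?, ?) IF NOT EXISTS")
        .addPositionalValues(id, owner)
        .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM)       // commit phase stays local
        .setSerialConsistencyLevel(ConsistencyLevel.LOCAL_SERIAL) // Paxos phase stays local
        .build();
    session.execute(stmt);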

Are the nodes being marked as "down" at all? If so, one thing you could try would be to have a look at the phi_convict_threshold on each node. Phi Convict is essentially a non-linear representation of how long node communication can wait before it reports a failure. I think it defaults to 8, but for most multi-DC cloud deployments that I've done, we've had to bump that up to 12.
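For example, in cassandra.yaml on each node:

    # cassandra.yaml (default is 8; bumped for multi-DC/cloud deployments)
    phi_convict_threshold: 12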