Cassandra dynamic snitch and dynamic_snitch_reset_interval_in_ms parameter


I was looking into the Cassandra dynamic snitch. Suppose there are 3 replica nodes (N1, N2 and N3) for a particular keyspace, and that, for now, the snitch considers only latency when calculating scores. At a particular instant, let the latencies be:

N1 = 1ms,

N2 = 2ms,

N3 = 3ms (due to some network error).

For a session with a read consistency of 2, the coordinator will choose N1 for the full data request and N2 for the digest request.

  1. Will the coordinator completely ignore N3 for this session, or will it also send a digest request to N3 and perform read repair on it in the background (asynchronously)?
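To make the scenario concrete, here is a minimal Python sketch of the selection as described in the question. It is only a model of the asker's assumption (latency is the sole input to the score), not Cassandra's actual read path, and it deliberately ignores anything else that may cause extra replicas to be contacted. The function name `plan_read` and the `replicas_needed` parameter are made up for illustration.

```python
def plan_read(latency_ms, replicas_needed):
    """Rank replicas by latency (lower is better) and split the first
    `replicas_needed` into one data request plus digest requests."""
    ranked = sorted(latency_ms, key=latency_ms.get)
    contacted = ranked[:replicas_needed]
    return {
        "data": contacted[0],        # full data request goes to the best replica
        "digest": contacted[1:],     # remaining contacted replicas get digest requests
        "not_contacted": ranked[replicas_needed:],
    }


plan = plan_read({"N1": 1.0, "N2": 2.0, "N3": 3.0}, replicas_needed=2)
print(plan)
```

In this model N3 ends up in `not_contacted`, which is exactly the case the question asks about.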

I also have trouble understanding why dynamic_snitch_reset_interval_in_ms is needed in the first place, since the latency of every node must already be collected and its score recalculated every dynamic_snitch_update_interval_in_ms. If the network error is resolved and node N3’s latency drops to 0.5ms, the snitch should change the preferred node by itself. Is the reset there only to cater for that 10% threshold (since the dynamic snitch changes the preferred node only if the current node performs 10% worse than the best-performing node)?

  2. Why is dynamic_snitch_reset_interval_in_ms required?
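The 10% rule mentioned above can be sketched roughly as follows. This is a simplified illustration under my own assumptions: scores are plain latencies, `static_order` is the order the underlying snitch would prefer, and the 0.1 default simply mirrors the 10% figure from the question. The function name is hypothetical and this is not the actual DynamicEndpointSnitch code.

```python
def prefers_dynamic_order(static_order, scores, badness_threshold=0.1):
    """Return True if some replica in the static order is worse than the
    best achievable score at its position by more than the threshold."""
    best_scores = sorted(scores[node] for node in static_order)
    for node, best in zip(static_order, best_scores):
        if scores[node] > best * (1 + badness_threshold):
            return True  # static order is "bad enough": reorder by score
    return False  # within the threshold: keep the static order


scores = {"N1": 1.0, "N2": 2.0, "N3": 3.0}
print(prefers_dynamic_order(["N1", "N2", "N3"], scores))
print(prefers_dynamic_order(["N3", "N1", "N2"], scores))
```

The point of the threshold in this sketch is that small differences between replicas do not reshuffle the preferred order; only a replica that is clearly worse than the alternative triggers a change.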

I am assuming that the score of every node is calculated even if the coordinator ignores N3 for the current session (which I am not sure about), because another session might use a read consistency of 3, in which case the coordinator must consider node N3 too.

Am I missing something? This is really confusing to me.

Reference: https://www.datastax.com/blog/dynamic-snitching-cassandra-past-present-and-future#:~:text=First%20off%2C%20if%20we%20don%27t%20read%20from%20a%20host%20because%20we%20determine%20it%27s%20no%20longer%20performing%20sufficiently%2C%20how%20can%20we%20know%20that%20it%20has%20recovered%3F
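The concern quoted in the reference link (if we stop reading from a host, how do we learn that it has recovered?) can be illustrated with a toy model. Everything below is hypothetical: `ToySnitch`, `run` and the choice to score unsampled nodes as 0.0 are assumptions made for the sketch, and real Cassandra scoring involves more than the last observed latency.

```python
class ToySnitch:
    """Toy latency tracker; NOT Cassandra's DynamicEndpointSnitch."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.samples = {}  # node -> last observed latency in ms

    def record(self, node, latency_ms):
        self.samples[node] = latency_ms

    def pick(self, count):
        # Assumption: a node with no sample scores 0.0, so it gets tried.
        ranked = sorted(self.nodes, key=lambda n: self.samples.get(n, 0.0))
        return ranked[:count]

    def reset(self):
        # Stand-in for what dynamic_snitch_reset_interval_in_ms triggers.
        self.samples.clear()


def run(rounds, reset_every=None):
    true_latency = {"N1": 1.0, "N2": 2.0, "N3": 3.0}
    snitch = ToySnitch(true_latency)
    for node, ms in true_latency.items():
        snitch.record(node, ms)      # initial measurements, N3 is slow
    true_latency["N3"] = 0.5         # N3 recovers, but nobody reads from it
    for i in range(1, rounds + 1):
        if reset_every and i % reset_every == 0:
            snitch.reset()
        for node in snitch.pick(2):  # only contacted nodes yield new samples
            snitch.record(node, true_latency[node])
    return snitch.pick(2)


print(run(8))                 # no reset: N3's stale 3.0ms sample never changes
print(run(8, reset_every=5))  # with reset: N3 gets re-measured and preferred
```

In this model, without a reset N3 is never contacted again, so its old sample is never replaced even though the node is now the fastest. Whether real reads to N3 from other sessions would refresh its score is precisely the open point in the question.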

Madhavan On

I'm not very sure about the above, but these days I almost always use GossipingPropertyFileSnitch over any other option. It integrates with the gossip protocol and supports topology updates without cluster restarts, which makes it a preferred choice for production Cassandra deployments, especially those spanning multiple datacenters.

Points to remember:

  • By design, the GossipingPropertyFileSnitch falls back on the PropertyFileSnitch's cassandra-topology.properties file, as a means to allow clusters to be migrated to GossipingPropertyFileSnitch.
  • If the cluster is already on GossipingPropertyFileSnitch, make sure cassandra-topology.properties has been removed (or does not exist), even if the nodes currently show no issues, so that the cluster does not run into problems in the future. More at CASSANDRA-11508.
  • FYI, CASSANDRA-10745 is a proposal to deprecate and fully remove the PropertyFileSnitch in future releases.
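As a small aid for the second point, here is a sketch of a check for a leftover topology file. The helper name `leftover_topology_file` is made up, and the configuration directory you pass in depends on how Cassandra was installed, so treat any path as an assumption about your environment.

```python
from pathlib import Path


def leftover_topology_file(conf_dir):
    """Return the path of a leftover cassandra-topology.properties in
    `conf_dir`, or None if the file is not there."""
    candidate = Path(conf_dir) / "cassandra-topology.properties"
    return candidate if candidate.is_file() else None
```

For example, `leftover_topology_file("/etc/cassandra")` (path assumed) returns `None` on a node where the file has already been removed.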