Why does Cassandra repair --partitioner-range need to run on each node of each datacenter in a cluster?

Question


70 Views Asked by slalomoon At 17 May 2023 at 09:14

Why does nodetool repair -pr need to be run on each node of each DC, when this is not needed for repair without -pr? As I understand it, the only difference is the number of token ranges: with -pr only the node's "primary" ranges are repaired, while without -pr the ranges that belong to other nodes but are replicated on this node are repaired as well. How does this affect repair propagation to other DCs? All DCs share the same token space (token ring), so if we run repair on all nodes of one DC, the entire token ring should be covered.

What I expect is that running nodetool repair -pr on every node of a single datacenter is enough. The Apache documentation does not require running nodetool repair -pr on each node of each datacenter: https://cassandra.apache.org/doc/3.11/cassandra/operating/repair.html "The -pr flag will only repair the "primary" ranges on a node, so you can repair your entire cluster by running nodetool repair -pr on each node in a single datacenter."
According to the following articles, nodetool repair without -pr only needs to be run in one datacenter of the cluster, but with -pr it must be run on each node of each datacenter.

https://www.datastax.com/blog/repair-cassandra "This is very important, so I’m going to say it again, if you are using “nodetool repair -pr” you must run it on EVERY node in EVERY data center, no skipping allowed...."

"If you have multiple data centers, by default when running repair all nodes in all data centers will sync with each other on the range being repaired. So for an RF of {DC1:3, DC2:3} for a given token range there will be 6 nodes all comparing data with each other and streaming any differences back and forth. If you have 4 data centers {DC1:3, DC2:3, DC3:3, DC4:3} you will have 12 nodes all comparing with each other and streaming data to each other at the same time for each token range [2]. This makes using “-pr” even more important, as if you don’t use it you repair a given token range 3+3+3+3+=12 times for the 4 DC case if you ran without using “-pr” on every node in the cluster."

and

https://www.datastax.com/blog/repair-cassandra "Note: If you use this option, you must run nodetool repair -pr on every node in the cluster to repair all data. Otherwise, some ranges of data will not be repaired..."

"Consider carefully before using nodetool repair across datacenters, instead of within a local datacenter. When you run repair locally on a node using -local or --in-local-dc, the command runs only on nodes within the same datacenter as the node that runs it. Otherwise, the command runs cluster-wide repair processes on all nodes that contain replicas, even those in different datacenters. For example, if you start nodetool repair over two datacenters, DC1 and DC2, each with a replication factor of 3, repair must build Merkle tables for 6 nodes..."

The documentation is even more inconsistent here: "The nodetool repair tool does not support the use of -local with the -pr option unless the datacenter's nodes have all the data for all ranges." This implies that repair with -pr also runs cluster-wide, just like repair without -pr.

Original Q&A


**stevenlacerda** · Answer 1 · 2023-05-17T12:58:12.107000

The current behavior, when -pr is specified, is to treat a multi-DC setup as a single ring, because TokenMetadata.getPredecessor(Token) doesn't take the DC of a token into account and simply searches for a predecessor across all tokens from all DCs.

So let's say, for simplicity, that the token range runs from 0 to 100.

DC  Node    Token   Owned
DC1 Node1   0       33%
DC1 Node2   33      33%
DC1 Node3   66      33%
DC2 Node1   25      25%
DC2 Node2   50      40%
DC2 Node3   90      35%

You would expect "nodetool repair -pr" on DC1 Node1 to be the same as nodetool repair -st 0 -et 33, but it is actually -st 0 -et 25.

Repair -pr on DC1 Node2 would be the same as -st 33 -et 50.

On DC1 Node3 it would be -st 66 -et 90.

So by running -pr only on the DC1 nodes we skipped 25-33, 50-66, and 90-0, which are the ranges that start at the DC2 tokens.

So -pr isn't really "primary range"; it behaves more like "partial range".
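The example above can be checked with a small sketch. This is not Cassandra code: it only models the simplified convention used in this answer, where a node's -pr range runs from its own token to the next token on the combined ring of all DCs. The names RING, pr_range, covered_by_dc and skipped_by_dc are made up for illustration. Cassandra itself, as far as I know, defines a primary range as running from the predecessor token up to the node's own token, which shifts which intervals end up skipped but not the fact that one DC alone leaves gaps.

```python
# Tokens from the example table (ring simplified to 0..100).
# Keys are (datacenter, node); all names here are illustrative only.
RING = {
    ("DC1", "Node1"): 0,
    ("DC1", "Node2"): 33,
    ("DC1", "Node3"): 66,
    ("DC2", "Node1"): 25,
    ("DC2", "Node2"): 50,
    ("DC2", "Node3"): 90,
}


def pr_range(node, ring=RING):
    """Range covered by -pr for `node` in the simplified model:
    from the node's own token to the next token on the combined ring,
    ignoring which DC that next token belongs to."""
    tokens = sorted(ring.values())
    start = ring[node]
    # Wrap around the ring after the highest token.
    end = tokens[(tokens.index(start) + 1) % len(tokens)]
    return (start, end)


def covered_by_dc(dc, ring=RING):
    """Ranges repaired when -pr is run on every node of `dc` only."""
    return [pr_range(node, ring) for node in sorted(ring) if node[0] == dc]


def skipped_by_dc(dc, ring=RING):
    """Ranges left unrepaired when -pr is run only on the nodes of `dc`."""
    covered = set(covered_by_dc(dc, ring))
    every_range = [pr_range(node, ring) for node in ring]
    return sorted(r for r in every_range if r not in covered)


print(covered_by_dc("DC1"))
print(skipped_by_dc("DC1"))
```

Running -pr on the DC2 nodes as well covers exactly the ranges reported as skipped, which is why the whole ring is only repaired once every node in every DC has run it.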
