Could you please explain what I'm doing wrong? I'm running 7 Cassandra 4.0.5 nodes in a single DC. To maintain those nodes, I run the following on every node on a daily basis:
nodetool repair --full -pr <keyspace>
As I understand it, the "Bytes unrepaired" value should be decreasing, not increasing, but that's not what happens for me. Instead of decreasing, I get this:
nodetool tablestats <keyspace> | grep -E "Table:|Percent repaired:|Bytes repaired:|Bytes unrepaired:"
Table: <table1>
Percent repaired: 0.0
Bytes repaired: 0.000KiB
Bytes unrepaired: 31.304GiB
Table: <table2>
Percent repaired: 0.0
Bytes repaired: 0.000KiB
Bytes unrepaired: 1009.825GiB
Table: <table3>
Percent repaired: 100.0
Bytes repaired: 0.000KiB
Bytes unrepaired: 0.000KiB
Table: <table4>
Percent repaired: 0.0
Bytes repaired: 0.000KiB
Bytes unrepaired: 2.227GiB
Table: <table5>
Percent repaired: 0.0
Bytes repaired: 0.000KiB
Bytes unrepaired: 250.537GiB
Percent repaired: 0.0
Bytes repaired: 0.000KiB
Bytes unrepaired: 4.364MiB
Table: <table6>
Percent repaired: 0.0
Bytes repaired: 0.000KiB
Bytes unrepaired: 1.227MiB
Table: <table7>
Percent repaired: 0.0
Bytes repaired: 0.000KiB
Bytes unrepaired: 2.031MiB
The term "unrepaired" may seem ambiguous given what repairs are meant to do, but what you see is expected.
The distinction between repaired and unrepaired bytes lies in the metadata upon which incremental repairs depend.
FULL REPAIR
When you run the repair command you issued (in 4.0 and above) with --full, Cassandra doesn't mark the data covered by the scope of the repair as "repaired" at the sstable metadata level, keeping the default "unrepaired" state - this makes the distinction between full and incremental repairs.
INCREMENTAL REPAIR
When you run an incremental repair instead, Cassandra marks all the repaired data as "repaired" at the sstable level as well. The goal of this feature is to skip the "repaired" sstables in future repair sessions, minimizing repair overhead.
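For completeness, an incremental repair in 4.0+ is simply the same command without --full, since incremental is the default mode; this is what actually populates the "Bytes repaired" / "Percent repaired" metrics you are watching. A minimal sketch, reusing your <keyspace> placeholder (note that -pr is generally discouraged together with incremental repairs, so it's dropped here):

# Incremental repair - the default in 4.0+ when --full is omitted.
# Only sstables still marked "unrepaired" are processed, and they are
# marked "repaired" once the session completes.
nodetool repair <keyspace>

# Check the effect on the metrics afterwards:
nodetool tablestats <keyspace> | grep -E "Table:|Percent repaired:|Bytes repaired:|Bytes unrepaired:"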
This effectively creates two pools of sstables behind the scenes: repaired sstables can only be compacted with other repaired sstables, and the same applies to their unrepaired counterparts, so that the repair status metadata stays consistent.
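If you want to see this state for yourself, the sstablemetadata tool that ships with Cassandra (under tools/bin) prints the repair marker of an individual sstable; the data path below is only an illustrative example:

# "Repaired at: 0" means the sstable sits in the unrepaired pool;
# a non-zero timestamp means an incremental repair has marked it repaired.
tools/bin/sstablemetadata /var/lib/cassandra/data/<keyspace>/<table>-*/nb-1-big-Data.db | grep "Repaired at"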
So that you're not locked into only full or only incremental repairs, Cassandra always keeps the repaired metadata up to date according to whichever repair strategy you opt for.
Reportedly, previous bugs with incremental repairs were addressed in Cassandra 4.0, and in most cases it's worth running incremental repairs in Cassandra 4 and above, with some periodic full repairs in between. This is especially true if repair runs impact your services/applications - incremental repairs can save you a ton of repair time.
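As a rough illustration of that cadence (the timings here are arbitrary assumptions - tune them to your gc_grace_seconds, data volume, and load), a per-node crontab could look like:

# Daily incremental repair: cheap, skips sstables already marked repaired
0 2 * * * nodetool repair <keyspace>
# Periodic full repair (e.g. monthly) as a safety net
0 4 1 * * nodetool repair --full -pr <keyspace>

In practice you would also stagger these across nodes rather than firing them all at the same minute, so that only a limited number of repair sessions run concurrently.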