We have a 3-node Cassandra cluster.
An application uses a keyspace that puts a heavy read load on the disks. The problem is cumulative: the more days we interact with the keyspace, the higher the disk reads grow.
Reads climb to over 700 MB/s, at which point the storage (SAN) begins to degrade, and then the Cassandra cluster degrades as well.
UPD 25.10.2021: "I stated this slightly wrong: the SAN space is allocated to a virtual machine and presented to it as a normal drive."
The only thing that helps is clearing the keyspace.
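For context, "clearing the keyspace" amounts to something like the following sketch (the table name box_messages.messages is taken from the cfstats output below; the snapshot cleanup step is an assumption based on the default auto_snapshot: true setting):

# hypothetical sketch of the clearing step, not the exact commands from the post
[cassandra-01 ~]$ cqlsh -e "TRUNCATE box_messages.messages"
# TRUNCATE takes an automatic snapshot by default, so disk space is only
# actually released once the snapshots are dropped as well:
[cassandra-01 ~]$ nodetool clearsnapshot box_messages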
Output of the nodetool tpstats and cfstats commands:
[cassandra-01 ~]$ nodetool tpstats
Pool Name                    Active  Pending   Completed  Blocked  All time blocked
ReadStage                         1        1  1837888055        0                 0
MiscStage                         0        0           0        0                 0
CompactionExecutor                0        0     6789640        0                 0
MutationStage                     0        0   870873552        0                 0
MemtableReclaimMemory             0        0        7402        0                 0
PendingRangeCalculator            0        0           9        0                 0
GossipStage                       0        0    18939072        0                 0
SecondaryIndexManagement          0        0           0        0                 0
HintsDispatcher                   0        0           3        0                 0
RequestResponseStage              0        0  1307861786        0                 0
Native-Transport-Requests         0        0  2981687196        0                 0
ReadRepairStage                   0        0      346448        0                 0
CounterMutationStage              0        0           0        0                 0
MigrationStage                    0        0         168        0                 0
MemtablePostFlush                 0        0        8193        0                 0
PerDiskMemtableFlushWriter_0      0        0        7402        0                 0
ValidationExecutor                0        0          21        0                 0
Sampler                           0        0       10988        0                 0
MemtableFlushWriter               0        0        7402        0                 0
InternalResponseStage             0        0        3404        0                 0
ViewMutationStage                 0        0           0        0                 0
AntiEntropyStage                  0        0          71        0                 0
CacheCleanupExecutor              0        0           0        0                 0
Message type        Dropped
READ                      7
RANGE_SLICE               0
_TRACE                    0
HINT                      0
MUTATION                  5
COUNTER_MUTATION          0
BATCH_STORE               0
BATCH_REMOVE              0
REQUEST_RESPONSE          0
PAGED_RANGE               0
READ_REPAIR               0
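The dropped-message counts above (7 READ, 5 MUTATION against billions completed) look negligible, so the thread pools themselves do not appear saturated. One way to confirm that drops stay flat while the disk reads climb (a monitoring sketch, not from the original post):

[cassandra-01 ~]$ watch -n 60 "nodetool tpstats | grep -A 11 'Message type'"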
[cassandra-01 ~]$ nodetool cfstats box_messages -H
Total number of tables: 73
----------------
Keyspace : box_messages
    Read Count: 48847567
    Read Latency: 0.055540737801741485 ms
    Write Count: 69461300
    Write Latency: 0.010656743870327794 ms
    Pending Flushes: 0
        Table: messages
        SSTable count: 6
        Space used (live): 3.84 GiB
        Space used (total): 3.84 GiB
        Space used by snapshots (total): 0 bytes
        Off heap memory used (total): 10.3 MiB
        SSTable Compression Ratio: 0.23265712113582082
        Number of partitions (estimate): 4156030
        Memtable cell count: 929912
        Memtable data size: 245.04 MiB
        Memtable off heap memory used: 0 bytes
        Memtable switch count: 92
        Local read count: 20511450
        Local read latency: 0.106 ms
        Local write count: 52111294
        Local write latency: 0.013 ms
        Pending flushes: 0
        Percent repaired: 0.0
        Bloom filter false positives: 57318
        Bloom filter false ratio: 0.00841
        Bloom filter space used: 6.56 MiB
        Bloom filter off heap memory used: 6.56 MiB
        Index summary off heap memory used: 1.78 MiB
        Compression metadata off heap memory used: 1.95 MiB
        Compacted partition minimum bytes: 73
        Compacted partition maximum bytes: 17084
        Compacted partition mean bytes: 3287
        Average live cells per slice (last five minutes): 2.0796939751354797
        Maximum live cells per slice (last five minutes): 10
        Average tombstones per slice (last five minutes): 1.1939751354797576
        Maximum tombstones per slice (last five minutes): 2
        Dropped Mutations: 5 bytes
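Worth noting from the output above: the live data is only 3.84 GiB, yet disk reads reach 700 MB/s, so the same data must be getting read from disk over and over. A quick way to see whether individual reads touch many SSTables, and how partition sizes are distributed, is nodetool tablehistograms (a diagnostic suggestion, not output we have collected):

[cassandra-01 ~]$ nodetool tablehistograms box_messages messages
# prints percentiles for SSTables-per-read, read/write latency,
# partition size and cell count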
(I'm unable to comment and hence posting it as an answer)
As folks have mentioned, a SAN is not going to be the best fit here; one could read through the list of anti-patterns documented here, which could also apply to OSS C*.
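Two quick OS-level checks that line up with that anti-pattern list (a sketch; the device name /dev/sdX is a placeholder for the SAN-backed disk): confirm the actual read throughput and await on the device, and check the readahead setting, since an oversized readahead inflates the MB/s read per request:

[cassandra-01 ~]$ iostat -xm 5
[cassandra-01 ~]$ blockdev --getra /dev/sdX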