We just added a second DC to our Cassandra cluster (7 nodes, each with 5 JBOD SSDs). After replication to the new DC finished, we started seeing periodically stuck compactions of the OpsCenter.rollup_state table. When this happens, the node is reported as down by the other nodes even though it is still alive itself; nodetool drain also hangs on that node, and only a reboot of the node gets it out of this state. The log below was captured after the node had already been restarted; both nodes shown below are stuck in this state.
DEBUG [CompactionExecutor:14] 2019-09-03 17:03:44,456 CompactionTask.java:154 - Compacting (a43c8d71-ce53-11e9-972d-59ed5390f0df) [/cass-db1/data/OpsCenter/rollup_state-43e776914d2911e79ab41dbbeab1d831/mc-581-big-Data.db:level=0, /cass-db1/data/OpsCenter/rollup_state-43e776914d2911e79ab41dbbeab1d831/mc-579-big-Data.db:level=0, ]
Other node:
DEBUG [CompactionExecutor:14] 2019-09-03 20:38:22,272 CompactionTask.java:154 - Compacting (a00354f0-ce71-11e9-91a4-3731a2137ea5) [/cass-db2/data/OpsCenter/rollup_state-43e776914d2911e79ab41dbbeab1d831/mc-610-big-Data.db:level=0, /cass-db2/data/OpsCenter/rollup_state-43e776914d2911e79ab41dbbeab1d831/mc-606-big-Data.db:level=0, ]
WARN [CompactionExecutor:14] 2019-09-03 20:38:22,273 LeveledCompactionStrategy.java:273 - Live sstable /cass-db2/data/OpsCenter/rollup_state-43e776914d2911e79ab41dbbeab1d831/mc-606-big-Data.db from level 0 is not on corresponding level in the leveled manifest. This is not a problem per se, but may indicate an orphaned sstable due to a failed compaction not cleaned up properly.
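For reference, this is roughly what I plan to try the next time a node hangs, before resorting to a reboot. This is only a sketch: the compaction id is taken from the log above, nodetool stop -id requires a Cassandra version that supports it, and the pid placeholder is hypothetical. I have not yet confirmed that any of this actually unblocks the node.

# check which compactions are running and how far they have progressed
nodetool compactionstats
# try to cancel the specific stuck compaction by the id from the log
nodetool stop -id a00354f0-ce71-11e9-91a4-3731a2137ea5
# capture a thread dump of the Cassandra JVM for later analysis (<cassandra_pid> is a placeholder)
jstack <cassandra_pid> > /tmp/cassandra_threads.txt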
What is the right way to solve this problem?