Getting rpc_timeout when counting data in Cassandra

50 Views Asked by At

i have 3 node cassandra 2.0.9 on production and face the issue when count or any specific query with clqsh is always get rpc_timeout the weird is this happen only on cassandra 1, other node with same configuration is fine

[cqlsh 4.1.1 | Cassandra 2.0.9 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> use xdata;
cqlsh:xdata> select count(*) from blobstore limit 100;
Request did not complete within rpc_timeout.

here the log from system.log when execute the query

INFO [MemoryMeter:1] 2022-08-03 10:40:10,910 Memtable.java (line 481) CFS(Keyspace='system', ColumnFamily='sstable_activity') liveRatio is 14.607407883739976 (just-counted was 14.607407407407408).  calculation took 2ms for 54 cells
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,061 MessagingService.java (line 857) 1 REQUEST_RESPONSE messages dropped in last 5000ms
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,061 StatusLogger.java (line 55) Pool Name                    Active   Pending      Completed   Blocked  All Time Blocked
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,062 StatusLogger.java (line 70) MutationStage                     0         0           8726         0                 0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,062 StatusLogger.java (line 70) RequestResponseStage              0         0         193404         0                 0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,062 StatusLogger.java (line 70) ReadRepairStage                   0         0              0         0                 0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,062 StatusLogger.java (line 70) ReadStage                         0         0         295316         0                 0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,062 StatusLogger.java (line 70) ReplicateOnWriteStage             0         0              0         0                 0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,063 StatusLogger.java (line 70) MiscStage                         0         0           2582         0                 0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,063 StatusLogger.java (line 70) AntiEntropySessions               0         0           1028         0                 0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,063 StatusLogger.java (line 70) HintedHandoff                     0         0            112         0                 0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,063 StatusLogger.java (line 70) FlushWriter                       0         0             39         0                 0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,063 StatusLogger.java (line 70) MemoryMeter                       0         0             50         0                 0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,063 StatusLogger.java (line 70) GossipStage                       0         0         150208         0                 0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,063 StatusLogger.java (line 70) CacheCleanupExecutor              0         0              0         0                 0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,064 StatusLogger.java (line 70) InternalResponseStage             0         0           4112         0                 0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,064 StatusLogger.java (line 70) CompactionExecutor                0         0            271         0                 0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,064 StatusLogger.java (line 70) ValidationExecutor                0         0           2582         0                 0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,064 StatusLogger.java (line 70) MigrationStage                    0         0              2         0                 0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,064 StatusLogger.java (line 70) commitlog_archiver                0         0              0         0                 0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,064 StatusLogger.java (line 70) AntiEntropyStage                  0         0          11332         0                 0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,064 StatusLogger.java (line 70) PendingRangeCalculator            0         0              3         0                 0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,065 StatusLogger.java (line 70) MemtablePostFlusher               0         0           6062         0                 0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,065 StatusLogger.java (line 79) CompactionManager                 0         0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,065 StatusLogger.java (line 81) Commitlog                       n/a         0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,065 StatusLogger.java (line 93) MessagingService                n/a       0/0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,065 StatusLogger.java (line 103) Cache Type                     Size                 Capacity               KeysToSave
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,065 StatusLogger.java (line 105) KeyCache                      13808                104857600                      all
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,065 StatusLogger.java (line 111) RowCache                          0                        0                      all
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,065 StatusLogger.java (line 118) ColumnFamily                Memtable ops,data
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,065 StatusLogger.java (line 121) system.compaction_history              9,3184
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,065 StatusLogger.java (line 121) system.hints                              0,0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,065 StatusLogger.java (line 121) system.IndexInfo                          0,0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,066 StatusLogger.java (line 121) system.schema_columnfamilies                 0,0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,066 StatusLogger.java (line 121) system.schema_triggers                    0,0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,066 StatusLogger.java (line 121) system.NodeIdInfo                         0,0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,066 StatusLogger.java (line 121) system.paxos                              0,0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,066 StatusLogger.java (line 121) system.peer_events                        0,0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,066 StatusLogger.java (line 121) system.range_xfers                        0,0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,066 StatusLogger.java (line 121) system.compactions_in_progress                 0,0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,066 StatusLogger.java (line 121) system.peers                              0,0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,066 StatusLogger.java (line 121) system.schema_keyspaces                   0,0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,066 StatusLogger.java (line 121) system.local                              0,0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,066 StatusLogger.java (line 121) system.sstable_activity             639,23664
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,066 StatusLogger.java (line 121) system.schema_columns                     0,0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,066 StatusLogger.java (line 121) system.batchlog                           0,0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,067 StatusLogger.java (line 121) xdata.blobstore                  127,54400
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,067 StatusLogger.java (line 121) xdata.document                         0,0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,067 StatusLogger.java (line 121) xdata.blobstoremeta             252,121960
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,067 StatusLogger.java (line 121) system_traces.sessions                    0,0
     INFO [ScheduledTasks:1] 2022-08-03 10:40:21,067 StatusLogger.java (line 121) system_traces.events                      0,0
     INFO [ScheduledTasks:1] 2022-08-03 10:42:13,051 GCInspector.java (line 116) GC for ParNew: 261 ms for 1 collections, 2073077016 used; max is 8482586624
2

There are 2 best solutions below

1
Erick Ramirez On

A CQL SELECT COUNT() is not a good thing to do in a distributed architecture. When you do an unbounded COUNT(), Cassandra has to do a full table scan to read every single record on every single node.

Counting rows in tables is very expensive as I've explained in detail in Why COUNT() is bad in Cassandra.

You should consider using other software such as Spark, Solr, or tools such as the DataStax Bulk Loader (DSBulk) which are able to count data in Cassandra tables efficiently since they use algorithms optimised for the task. Details in the DBA Stack Exchange post I linked above. Cheers!

0
Aaron On

the question is why this happen only in 1 node with same configuration

Same config, sure. But same compute resources? I assume yes, but maybe it's not the same?

Also, what does the keyspace definition look like? If you're running with number of nodes > replication factor, it's possible that failing node is being used as a "coordinator node." In that case, the other node might not need the extra network hop, because it owns the data.

You could verify this by running TRACING ON before the CQL statement. The report would show you the network hops through the cluster.