I'm occasionally seeing the error below, which says the message size is oversized.
The allowed limit of 134217728 bytes (did some simple math) is 128 MiB, and I cannot think of what could produce such big data.
Will this impact the integrity of the data? And is there something I can do to avoid the error, e.g. resize some parameter in cassandra.yaml?
ERROR [ReadStage-1] 2024-03-29 05:36:26,158 JVMStabilityInspector.java:68 - Exception in thread Thread[ReadStage-1,5,SharedPool]
org.apache.cassandra.net.Message$OversizedMessageException: Message of size 142675369 bytes exceeds allowed maximum of 134217728 bytes
at org.apache.cassandra.net.OutboundConnection.enqueue(OutboundConnection.java:331)
at org.apache.cassandra.net.OutboundConnections.enqueue(OutboundConnections.java:92)
at org.apache.cassandra.net.MessagingService.doSend(MessagingService.java:417)
at org.apache.cassandra.net.OutboundSink.accept(OutboundSink.java:70)
at org.apache.cassandra.net.MessagingService.send(MessagingService.java:406)
at org.apache.cassandra.net.MessagingService.send(MessagingService.java:376)
at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:91)
at org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:78)
at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:97)
at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430)
at org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:124)
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:120)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Unknown Source)
No. This exception was triggered in a ReadStage thread. This type of thread is responsible for local reads, which don't modify the dataset in any way.

Yes. I would start by finding the root cause and addressing it, rather than changing configuration. I can think of two likely scenarios where this exception would be triggered:
1. A query reading a whole oversized partition. Check the maximum compacted partition size per table.

Cassandra 4.1.x and above:

nodetool tablestats -s compacted_partition_maximum_bytes -t 1

Previous versions:

nodetool tablestats | grep "Compacted partition maximum bytes" | awk '{print $5}' | sort -n | tail -1

If you see a partition over 128 MiB, check whether there is a query reading whole partitions from the corresponding table. If there is one, rethink the data model in order to control partition size. One common solution to this problem is to bucket partitions by time or other arbitrary fields that split the partitions in a balanced way, as in the sketch below.
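As a minimal sketch of that bucketing idea (the sensor_readings table and its columns are hypothetical, not from your schema), adding a time bucket to the partition key caps partition growth:

-- Hypothetical original table: one unbounded partition per sensor
CREATE TABLE IF NOT EXISTS sensor_readings (
    sensor_id    uuid,
    reading_time timestamp,
    value        double,
    PRIMARY KEY (sensor_id, reading_time)
);

-- Bucketed version: the partition key is (sensor, day), so each partition
-- only holds one day of readings and stays well under the 128 MiB limit
CREATE TABLE IF NOT EXISTS sensor_readings_by_day (
    sensor_id    uuid,
    day          date,
    reading_time timestamp,
    value        double,
    PRIMARY KEY ((sensor_id, day), reading_time)
);

-- Reads then target one bucket at a time:
-- SELECT * FROM sensor_readings_by_day WHERE sensor_id = ? AND day = '2024-03-29';

Choose the bucket granularity (hour, day, month) so that a single bucket stays comfortably below the limit you are hitting.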
2. Queries that use ALLOW FILTERING and don't filter by partition key. These are usually very expensive in Cassandra, and you'll generally be able to catch them in debug.log through the slow query logs. If this is the case, I strongly recommend modeling a table for each of those queries so that all reads are single-partition reads and database performance scales well with the workload.
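As an illustrative sketch of that remodeling (the orders table, its columns and the PENDING status are made up for the example), a filtering read like the first query scans across partitions, while the second table turns it into a single-partition read:

-- Expensive: no partition key restriction, scans many partitions
-- SELECT * FROM orders WHERE status = 'PENDING' ALLOW FILTERING;

-- Query-specific table keyed by the filtered column
CREATE TABLE IF NOT EXISTS orders_by_status (
    status      text,
    order_id    uuid,
    customer_id uuid,
    created_at  timestamp,
    PRIMARY KEY (status, order_id)
);

-- Same question, now a single-partition read
-- SELECT * FROM orders_by_status WHERE status = 'PENDING';

If the filtered column has low cardinality, you would usually combine it with a bucket (as in the previous sketch) so these partitions stay bounded as well.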
Finally, the quick configuration fix (in Cassandra 4.x) is to edit the following parameters in cassandra.yaml and restart the nodes to apply the changes:

internode_application_send_queue_reserve_endpoint_capacity_in_bytes - defaults to 134217728
internode_application_receive_queue_reserve_endpoint_capacity_in_bytes - defaults to 134217728

Feel free to check the official documentation on internode messaging here.
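For illustration only, a cassandra.yaml sketch of that change (the parameter names are the ones above; the 268435456 value, i.e. 256 MiB, is just an example, not a recommendation):

# cassandra.yaml (Cassandra 4.x) - example values, doubled from the 128 MiB default
internode_application_send_queue_reserve_endpoint_capacity_in_bytes: 268435456
internode_application_receive_queue_reserve_endpoint_capacity_in_bytes: 268435456

Keep in mind this only raises the ceiling; the oversized read responses remain, so fixing the data model or the query is the durable solution.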