Read timeout error after Cassandra upgraded from 2.2.19 to 3.11.13


I have a two-DC cluster in which one DC was upgraded from Cassandra 2.2.19 to 3.11.13. Since the upgrade, I see CQL read timeouts reported on the Cassandra 3.x DC. The other DC, which is still on 2.x, does not have this issue. Does Cassandra 3.11.13 need higher timeout values than the previous settings? The current timeout values are listed below. What must be done to resolve this issue?

Error:

ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 3 responses." info={'received_responses': 3, 'required_responses': 4, 'consistency': 'QUORUM'}

Timeout values in the cassandra.yaml configuration file:

read_request_timeout_in_ms: 5000
range_request_timeout_in_ms: 10000
write_request_timeout_in_ms: 10000
counter_write_request_timeout_in_ms: 5000
cas_contention_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 10000
cross_node_timeout: false

Aaron On

The likely issue here is that Cassandra 2.2 and 3.11 use different default versions of the CQL binary protocol. I suspect one of two things is happening:

  1. The application is specifying a protocol version that is too low for 3.11, so it cannot reach the upgraded nodes.
  2. The application is not specifying a "local" data center, so QUORUM is attempted across all nodes regardless of DC. If the application negotiated a binary protocol version that is too low for 3.11, it cannot reach those nodes.

Looking at the messages above, I suspect it's #2. Make sure the app specifies a local DC, and run the query at LOCAL_QUORUM instead of QUORUM. You can also force a protocol version that works with both Cassandra 2.2 and Cassandra 3.11, as shown on this page.
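The error reports 'required_responses': 4, which is consistent with QUORUM being counted across both data centers. A minimal sketch of the arithmetic, assuming a replication factor of 3 in each DC (the post does not state the keyspace's replication settings, and the DC names below are placeholders):

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


# Assumed replication settings (hypothetical, not taken from the post).
rf_per_dc = {"dc_2x": 3, "dc_3x": 3}

# QUORUM counts replicas in every DC, so the RFs are summed first.
total_rf = sum(rf_per_dc.values())
print("QUORUM requires", quorum(total_rf), "responses")

# LOCAL_QUORUM only counts replicas in the coordinator's own DC.
for dc, rf in rf_per_dc.items():
    print("LOCAL_QUORUM in", dc, "requires", quorum(rf), "responses")
```

Under that assumption, QUORUM needs 4 of 6 replicas, so at least one response must come from the other DC, while LOCAL_QUORUM needs only 2 of the 3 local replicas.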

It also depends on which version of the driver you're using. My guess is that protocol v2 is being negotiated, which Cassandra 3.11 doesn't support (3.x dropped protocol versions 1 and 2).
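Both suggestions (a local DC with LOCAL_QUORUM, and a pinned protocol version) could be applied roughly as follows. This is only a sketch: it assumes the DataStax Python driver (`cassandra-driver`), which the post does not confirm, and the function name, contact points, and DC name are placeholders. Protocol v4 is chosen because both 2.2 and 3.11 support it.

```python
def build_session(contact_points, local_dc):
    """Connect with a pinned protocol version and DC-local routing.

    Needs the third-party `cassandra-driver` package. The imports sit
    inside the function so the sketch can be loaded without it installed.
    """
    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
    from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

    # Route requests to the named DC and count quorum there only.
    profile = ExecutionProfile(
        load_balancing_policy=TokenAwarePolicy(
            DCAwareRoundRobinPolicy(local_dc=local_dc)
        ),
        consistency_level=ConsistencyLevel.LOCAL_QUORUM,
    )

    # Pin protocol v4 instead of relying on version negotiation.
    cluster = Cluster(
        contact_points=contact_points,
        execution_profiles={EXEC_PROFILE_DEFAULT: profile},
        protocol_version=4,
    )
    return cluster.connect()
```

A call would look like `build_session(["10.0.0.1"], "DC1")`, with the address and DC name replaced by the values `nodetool status` reports for your cluster.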

The other potential issue is that your application is still using Thrift. In that case, remember that Thrift is disabled by default in Cassandra 3.11, so you'd need to enable it (start_rpc: true in cassandra.yaml).