Why does Bigtable have high latency spikes?


We have two Bigtable clusters in one instance, with bidirectional replication between them. As I understand it, replication is always bidirectional and cannot be made one-directional. There are two profiles:

  • first - for production. It reads and writes and requires low latency. This profile is routed to one cluster.
  • second - for analytics. This profile reads large amounts of data and is routed to the other cluster.

The analytics cluster also has an autoscaling policy. By default it has 1 node, but under heavy load it scales up to 10 nodes.

Most of the time the production cluster provides more or less consistent latency, but occasionally there are high latency spikes. They seem to correlate with high read activity on the second cluster (during these periods the cluster load may reach 100% and new nodes are added by autoscaling).

The question: is it possible that one cluster may influence the latency of the other? Are there any approaches to deal with this?


There are 2 best solutions below

0
Vaidehi Jamankar On

Performance with replication depends on your use case and on the workload you run. Enabling replication does affect the performance of a Bigtable instance, and performance can degrade when CPU utilization stays consistently above the recommended levels.

I would also recommend checking the official documentation on performance and traffic routing to understand how performance can be maintained when using replication.

In your use case, with multiple Bigtable clusters in one instance, each cluster serves its own workload and also has to process the replicated changes from the other cluster, which is what keeps the instance highly available. You can try provisioning additional CPU resources for the cluster so that it can pull in the replication changes without affecting performance during these periods. Also check whether there are reconnections from batch reads/writes, or replication threads that are active during the spikes, to understand the cause.

1
Bora On

What routing policy are you using? If you're using multi-cluster routing, some of your analytical reads could be hitting your serving/production cluster.

If you want analytical reads to target only the second cluster, you should use a single-cluster routing policy.
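
As a rough illustration of the single-cluster routing suggestion, here is a minimal Python sketch that only assembles a `gcloud bigtable app-profiles create` command pinning a profile to one cluster with `--route-to`. The instance, profile, and cluster IDs (`my-instance`, `analytics`, `analytics-cluster`) and the helper name `single_cluster_profile_cmd` are hypothetical placeholders, not taken from the question. The script prints the command rather than running it, so you can review it first.

```python
import shlex


def single_cluster_profile_cmd(instance_id, profile_id, cluster_id,
                               transactional_writes=False):
    """Build (but do not run) a gcloud command for a single-cluster app profile."""
    cmd = [
        "gcloud", "bigtable", "app-profiles", "create", profile_id,
        f"--instance={instance_id}",
        # --route-to selects single-cluster routing: all requests made with
        # this profile go to the named cluster (as opposed to --route-any).
        f"--route-to={cluster_id}",
        f"--description=Single-cluster routing to {cluster_id}",
    ]
    if transactional_writes:
        # Only valid together with --route-to; allows single-row transactions.
        cmd.append("--transactional-writes")
    return cmd


# Hypothetical IDs; replace them with your own instance, profile, and cluster.
analytics_cmd = single_cluster_profile_cmd(
    "my-instance", "analytics", "analytics-cluster"
)
print(shlex.join(analytics_cmd))
```

If the analytics profile already exists with multi-cluster routing, the same `--route-to` flag should apply to `gcloud bigtable app-profiles update`; check the current profile settings before changing them.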