Cassandra Bloom Filter - False Positive

In our production cluster we are seeing higher p99 latencies. Repairs have been running for 3 weeks and the data is still only 85% repaired. This is caused by multiple factors. One of them is that Cassandra's LIMIT 1 is not optimized.

But today I want to discuss access patterns. In the last 12 hours:

HTTP Response Status    No. of Requests
200                     61,041,189
404                     7,971,055

Roughly 12% of reads are for partitions which don't exist yet, due to weird legacy logic that is very difficult to change immediately.

Current Cluster Settings

Compaction Strategy: SizeTieredCompactionStrategy (STCS)
bloom_filter_fp_chance=0.01

nodetool cfstats
Bloom filter false positives: 204164614
Bloom filter false ratio: 0.00844
Bloom filter space used: 471339624

Question

Does it make sense to change bloom_filter_fp_chance to 0.001?

Answer

Mário Tavares:

Your false positive ratio looks fine, or at least it's on par with the configuration: the table is configured to allow false positives around 1.00% of the time, whereas the actual ratio is 0.84%.

The total of 204,164,614 false positives that you see in cfstats may look like a large number, but it represents the false positives out of the roughly 24 billion bloom filter checks implied by the reported ratio, and should only be analyzed in relation to that total, not by itself.
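As a quick sanity check, here is a small sketch that derives that total from the two cfstats numbers quoted in the question:

```python
# Back-of-the-envelope check using the cfstats numbers from the question.
false_positives = 204_164_614
false_ratio = 0.00844

# Implied total number of bloom filter checks behind that ratio.
total_checks = false_positives / false_ratio
print(f"~{total_checks:,.0f} bloom filter checks")  # roughly 24 billion
```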

You can still decrease the false positive chance, but it may not be worth it. If the SSTables for the table are small enough (say, 10 GB at most), then even when a read for a missing partition passes the bloom filter check, the overhead of that false-positive read should be negligible. If you have SSTables in the order of hundreds of GB or TB, then the overhead may justify retuning the bloom filter false positive chance.

If you do decrease the false positive chance, it comes at a twofold cost:

  1. Bloom filters are stored on the data disk alongside SSTable files. At the moment yours take around 450 MB, and that footprint would grow, using disk space that may be needed for live data.
  2. Even though bloom filters are persisted to disk, at runtime they live in off-heap memory, so off-heap memory usage also increases as the false-positive chance decreases (a rough sizing sketch follows this list).
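To get a rough idea of how much the footprint would grow, here is a back-of-the-envelope sketch using the textbook bloom filter sizing formula m/n = -ln(p) / (ln 2)^2; Cassandra's real allocation rounds to whole buckets per key, so treat the result as an approximation:

```python
import math

def bits_per_key(fp_chance: float) -> float:
    """Textbook bloom filter sizing: m/n = -ln(p) / (ln 2)^2 bits per element."""
    return -math.log(fp_chance) / (math.log(2) ** 2)

# "Bloom filter space used" from the cfstats output above, in MB.
current_size_mb = 471_339_624 / 1024 / 1024

scale = bits_per_key(0.001) / bits_per_key(0.01)  # roughly 1.5x
print(f"{bits_per_key(0.01):.1f} -> {bits_per_key(0.001):.1f} bits per key (~{scale:.2f}x)")
print(f"~{current_size_mb:.0f} MB -> ~{current_size_mb * scale:.0f} MB of filter space")
```

In other words, going from 0.01 to 0.001 would increase the bloom filters by roughly 50%, i.e. from about 450 MB to roughly 675 MB of disk and off-heap memory.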

The short answer is that you can decrease the false positive chance if you can afford the storage and memory costs.
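If you do decide to lower it, here is a minimal sketch using the DataStax Python driver; the contact point, keyspace, and table names are placeholders:

```python
from cassandra.cluster import Cluster

# Placeholder contact point - substitute a node from your cluster.
cluster = Cluster(["cassandra-node1"])
session = cluster.connect()

# Lower the false-positive chance for one table.
session.execute(
    "ALTER TABLE my_keyspace.my_table WITH bloom_filter_fp_chance = 0.001"
)

cluster.shutdown()
```

Keep in mind that the new setting typically only applies to SSTables written after the change; existing SSTables keep their old bloom filters until they are rewritten, for example by normal compaction or by running nodetool upgradesstables -a on the table.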

Nonetheless, if the goal is to improve read performance, the bloom filter is typically not the main culprit. I would also look into other factors such as:

  • JVM garbage collection - look for time spent on STW per minute
  • Slow query logs - look for the slowest queries
  • Query anti-patterns - Secondary indexes/range queries can be impactful cluster-wide
  • Table chunk size - The default is typically too large and inefficient for small reads (see the sketch after this list)
  • Read ahead - The default is typically too large and inefficient for small reads
  • Tombstones - Check system logs and nodetool cfstats, as large volumes of tombstones on a single scan often cause timeouts or high latencies.
  • Load balance/hotspots - Some nodes may be doing disproportionately more work than others.
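For the chunk size point, a hedged sketch of how the per-table compression chunk length could be lowered (placeholder names again, and note that the exact compression option keys can differ between Cassandra versions):

```python
from cassandra.cluster import Cluster

cluster = Cluster(["cassandra-node1"])  # placeholder contact point
session = cluster.connect()

# Smaller compression chunks mean less data read and decompressed per small read.
session.execute(
    "ALTER TABLE my_keyspace.my_table WITH compression = "
    "{'class': 'LZ4Compressor', 'chunk_length_in_kb': 4}"
)

cluster.shutdown()
```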