Is there an optimized version of counting bloom filters for the case when counts are very large?


I became curious about this after seeing this paper on bloom clocks. The basic idea is that a counting bloom filter, which supports both the sum (counterwise sum) and union (counterwise max) of multisets, can be used as a compromise between scalar Lamport clocks and vector clocks, where each node increments a hash-determined subset of the counters by 1 on an event.

Taking an example with 16 counters, where each node increments 3 of them per event, you still get a clock that is fairly good at finding runs of a thousand causally unrelated events in the high-concurrency limit, where many events can have similar timestamps.
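The mechanics described above can be sketched as follows (a minimal illustration, not from the paper; deriving the counter indices from a SHA-256 of a node/sequence pair is an assumption, and any hash family would do):

```python
import hashlib

M = 16  # total counters (from the example above)
K = 3   # counters incremented per event (from the example above)

def indices(node_id: str, seq: int) -> set:
    """Pick up to K counter indices from a hash of the event identity.
    Hash collisions can select fewer than K distinct counters."""
    h = hashlib.sha256(f"{node_id}:{seq}".encode()).digest()
    return {h[i] % M for i in range(K)}

def tick(clock: list, node_id: str, seq: int) -> list:
    """Increment the hash-determined subset of counters by 1."""
    for i in indices(node_id, seq):
        clock[i] += 1
    return clock

def multiset_sum(a: list, b: list) -> list:
    """Sum of multisets: counterwise sum."""
    return [x + y for x, y in zip(a, b)]

def union(a: list, b: list) -> list:
    """Union of multisets: counterwise max."""
    return [max(x, y) for x, y in zip(a, b)]

def maybe_happened_before(a: list, b: list) -> bool:
    """a <= b counterwise is necessary (not sufficient) for a -> b."""
    return all(x <= y for x, y in zip(a, b))
```

As with any bloom-filter scheme, `maybe_happened_before` can give false positives but never false negatives, which is what makes it a probabilistic compromise between a scalar clock and a full vector clock.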

However, for a practical implementation of things like causally ordered message topics where you can end up with more than 4 billion events in a strict sequence, you still need the counting bloom filter to use 64-bit counters.

This made me wonder: are there specialized data structures that minimize space usage for the very specific case of a probabilistic multiset with very large counts, which needs to support multiset sum and union?


1 Answer

Answer by Matt Timmermans:

You can use floating-point counters.

For example, let's say you store a counter as a pair of 8-bit numbers (e, m), representing the count m << e.

When incrementing a counter:

  • If e > 0, then the counter is missing e low-order bits. To compensate for that, you randomly decide whether or not to increment m, with probability 2^-e.
  • If incrementing m would overflow its 8 bits, then increment e and shift m one bit to the right.

Of course, this makes the counters a little inaccurate, but each probabilistic increment adds 1 to the represented value in expectation (2^e with probability 2^-e), so the counter is unbiased and the expected error is small compared to the count.