I became curious about this after seeing this paper on bloom clocks. The basic idea is that a counting bloom filter, which supports both the sum (counter-wise sum) and union (counter-wise max) of multisets, can be used as a compromise between scalar Lamport clocks and vector clocks: each node increments a hash-determined subset of the counters by 1 on each event.
Taking an example like 16 counters, with each node incrementing 3 of them, you still get a clock that is fairly good at finding runs of a thousand causally unrelated events in the high-concurrency limit, where many events can have similar timestamps. A minimal sketch of the mechanics is below.
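This is my own illustration, not the paper's exact construction; the `BloomClock` name, the SHA-256 hashing scheme, and the comparison rule are assumptions chosen to match the 16-counter/3-increment example above:

```python
import hashlib

class BloomClock:
    """Sketch of a bloom clock: m counters, k of them incremented per event."""

    def __init__(self, m=16, k=3):
        self.m = m
        self.k = k
        self.counters = [0] * m

    def _indices(self, event_id):
        # Derive k counter indices from a hash of the event identifier.
        digest = hashlib.sha256(event_id.encode()).digest()
        return [digest[i] % self.m for i in range(self.k)]

    def tick(self, event_id):
        # On an event, increment a hash-determined subset of the counters.
        for i in self._indices(event_id):
            self.counters[i] += 1

    def merge(self, other):
        # Multiset union: counter-wise max (e.g. on message receive).
        self.counters = [max(a, b) for a, b in zip(self.counters, other.counters)]

    def dominates(self, other):
        # If other's counters are all <= ours, other plausibly happened-before
        # us; like a Bloom filter, this can give false positives.
        return all(a >= b for a, b in zip(self.counters, other.counters))
```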
However, for a practical implementation of things like causally ordered message topics, where you can end up with more than 4 billion events in a strict sequence, you still need the counting bloom filter to use 64-bit counters.
This made me wonder: are there specialized data structures that minimize space usage for this very specific case of a probabilistic multiset with very large counts that needs to support multiset sum and union?
You can use floating point counters.
For example, let's say you store a counter as a pair of 8-bit numbers `(e, m)`, representing the count `m << e`. When incrementing a counter:

- If `e > 0`, then the counter is missing `e` low-order bits. To compensate for that, you randomly decide whether or not to increment `m`, with probability `2^-e`.
- If `m` would overflow its 8 bits, then increment `e` and shift `m` to the right.

Of course, this makes the counters a little bit inaccurate, but the expected inaccuracy is small compared to the count.
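A minimal sketch of that increment rule (my own code, assuming the 8-bit widths above; a real implementation might use a cheaper source of random bits than `random.random()`):

```python
import random

MANTISSA_BITS = 8
MANTISSA_MAX = (1 << MANTISSA_BITS) - 1  # m is stored in 8 bits

def increment(e, m):
    """One probabilistic increment of a counter (e, m) representing m << e."""
    # With e > 0 the counter is missing e low-order bits, so the
    # increment only registers with probability 2^-e.
    if e > 0 and random.random() >= 2.0 ** -e:
        return e, m
    m += 1
    # If m would overflow its 8 bits, bump the exponent and
    # shift the mantissa right (the represented value is unchanged).
    if m > MANTISSA_MAX:
        e += 1
        m >>= 1
    return e, m

def value(e, m):
    """The (approximate) count represented by the pair."""
    return m << e
```

Each increment adds 1 to the represented value in expectation (with probability `2^-e` the value grows by `2^e`), so after a million calls `value(e, m)` lands close to a million on average:

```python
e, m = 0, 0
for _ in range(1_000_000):
    e, m = increment(e, m)
print(value(e, m))
```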