Which compaction strategy is recommended for a table with minimal updates?


I am looking for a compaction strategy for data with the following characteristics:

  1. We don't need the data after 60-90 days, or in extreme cases maybe 180 days.
  2. Ideally only inserts happen and updates never do, but it is realistic to expect duplicate events, which cause updates.
  3. It is indirectly time-series data if you think about it: events that arrive first are stored first, and once an event is stored it is almost never modified unless duplicate events are published.

Which strategy would be best for this case?

Erick Ramirez

TimeWindowCompactionStrategy (TWCS) is suitable only for time-series use cases, and that is the only reason you'd choose it.

LeveledCompactionStrategy (LCS) suits only a very limited set of edge cases, and the time I spend helping users troubleshoot LCS because it doesn't suit their needs is hardly worth the supposed benefits.

Unless you have some very specific requirements, SizeTieredCompactionStrategy (STCS) is almost always the right choice, which is why it is the default compaction strategy. Cheers!
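
For reference, here is a minimal CQL sketch of how either option could be set on a table. The keyspace and table name (`my_ks.events`) are hypothetical. The 90-day TTL and the one-day compaction window are assumptions based on the retention period in the question, not tuned recommendations.

```cql
-- Hypothetical table name; substitute your own keyspace and table.
-- Option 1: stay on the default strategy and let a table-level TTL expire old rows.
-- 90 days * 86400 seconds/day = 7776000 seconds (assumed retention period).
ALTER TABLE my_ks.events
  WITH compaction = { 'class': 'SizeTieredCompactionStrategy' }
  AND default_time_to_live = 7776000;

-- Option 2: only if the workload really is time series.
-- The one-day window below is an assumption, not a tuned value.
ALTER TABLE my_ks.events
  WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': 1
  }
  AND default_time_to_live = 7776000;
```

Note that `default_time_to_live` applies only to data written after the change. Rows already in the table keep whatever TTL they were written with.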