RocksDB: notification when all compaction jobs are done

I use RocksDB's bulk-loading mechanism to ingest a batch of SST files generated by offline Spark jobs. To keep the heavy disk I/O of loading and compaction from affecting online read requests, I want to finish the load offline, wait for compaction to complete, and bring the node online only once there is no more disk write I/O. Is there a notification for this, or some other way to achieve it? So far I have tried the following:

  • Implementing EventListener::OnCompactionCompleted. It fires once per compaction job, not once for the whole compaction process, which will probably consist of multiple jobs.
  • Calling DB::CompactRange(CompactRangeOptions(), nullptr, nullptr) after DB::IngestExternalFile returns, to run a manual compaction. However, some compaction jobs are still running after CompactRange returns.

Answer by Peter Dillinger

There's not a great interface for this at the moment. Probably the best you can do is periodically poll DB::GetIntProperty() on Properties::kCompactionPending and Properties::kNumRunningCompactions until both are zero.

It might suffice to poll these only on OnCompactionCompleted but I wouldn't completely trust that to work reliably for all future versions.
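The polling approach could be sketched roughly as below. This is only an illustration, not an official RocksDB helper: the function name WaitForCompactionsIdle, the timeout, and the required_idle_polls debounce are assumptions added here. The two strings are the values behind Properties::kCompactionPending and Properties::kNumRunningCompactions. The function is a template so that it compiles without the RocksDB headers; any type with a GetIntProperty(name, uint64_t*) method fits, which includes rocksdb::DB.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Polls two integer properties until both read zero, or until `timeout` expires.
// DBLike is any type with: bool GetIntProperty(<string-like>, uint64_t*).
// rocksdb::DB has such a method; its one-argument-name overload reads the
// default column family, so other column families would need their own check.
//
// required_idle_polls is a hypothetical safety margin (not from the answer):
// the DB must look idle on that many consecutive polls before we return true.
template <class DBLike>
bool WaitForCompactionsIdle(DBLike* db,
                            std::chrono::milliseconds poll_interval,
                            std::chrono::milliseconds timeout,
                            int required_idle_polls = 2) {
  assert(db != nullptr);
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  int idle_polls = 0;
  while (true) {
    std::uint64_t pending = 0;
    std::uint64_t running = 0;
    // A failed property read is treated as "not idle" rather than as success.
    const bool ok =
        db->GetIntProperty("rocksdb.compaction-pending", &pending) &&
        db->GetIntProperty("rocksdb.num-running-compactions", &running);
    if (ok && pending == 0 && running == 0) {
      if (++idle_polls >= required_idle_polls) return true;
    } else {
      idle_polls = 0;  // Any sign of activity resets the idle streak.
    }
    if (std::chrono::steady_clock::now() >= deadline) return false;
    std::this_thread::sleep_for(poll_interval);
  }
}
```

With a real rocksdb::DB* named db, a call might look like WaitForCompactionsIdle(db, std::chrono::seconds(1), std::chrono::minutes(30)), bringing the node online only if it returns true. The interval and timeout are placeholders to tune for your ingest size.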

Regarding a manual full DB::CompactRange(): with no ongoing writes, there should be no compaction left to do after a full compaction completes. However, if you intend to trigger a full compaction, unnecessary automatic compactions might be triggered before the manual compaction unless you open the DB with disable_auto_compactions=true (for each applicable column family). So opening with disable_auto_compactions=true and waiting for a full CompactRange to finish is another good path to success.