I use RocksDB's bulk loading mechanism to load a batch of SST files generated by offline Spark tasks. I don't want the heavy disk IO from loading and compaction to affect online read requests. So I want to finish the offline load, wait until the DB's compaction has completed, and only bring the node online once there is no more disk write IO. Is there a notification for this, or some other way to do it? So far I have tried the following:
- Implementing `EventListener::OnCompactionCompleted`: it fires for each individual compaction job, not for the whole compaction process, which may consist of several compaction jobs.
- Calling `DB::CompactRange(CompactRangeOptions(), nullptr, nullptr)` to run a manual compaction after `DB::IngestExternalFile` returns: some compaction jobs are still running after `CompactRange` returns.
There's not a great interface for this at the moment. Probably the best you can do is periodically poll `DB::GetIntProperty()` on `Properties::kCompactionPending` and `Properties::kNumRunningCompactions` until both are zero. It might suffice to poll these only on `OnCompactionCompleted`, but I wouldn't completely trust that to work reliably in all future versions.

Regarding a manual full `DB::CompactRange()`: with no ongoing writes, there should not be any compaction left to do after a full compaction completes. However, if you intend to trigger a full compaction, unnecessary automatic compactions might run before the manual one unless you open with `disable_auto_compactions=true` (for each applicable column family). So opening with `disable_auto_compactions=true` and waiting for a full `CompactRange` to finish is another good path to success.
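A minimal sketch of the polling approach, in C++. `WaitForCompactionsToFinish` is a hypothetical helper, not part of RocksDB. The same goes for its default poll interval and timeout. It is templated on the DB type, and it spells the two property names as string literals: the values of `rocksdb::DB::Properties::kCompactionPending` and `kNumRunningCompactions`. That way it compiles without RocksDB headers and can be exercised with a stub. Against a real `rocksdb::DB*` it uses the two-argument `GetIntProperty()` overload, which reads the default column family. The compaction-pending property appears to be tracked per column family, so a multi-column-family DB would need one query per handle.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>

// Property names as plain strings so this compiles without RocksDB headers.
// They equal rocksdb::DB::Properties::kCompactionPending and
// rocksdb::DB::Properties::kNumRunningCompactions.
constexpr const char* kCompactionPending = "rocksdb.compaction-pending";
constexpr const char* kNumRunningCompactions = "rocksdb.num-running-compactions";

// Hypothetical helper: polls until no compaction is pending or running, or
// until `timeout` elapses. Returns true if the DB went quiet, false on timeout.
// DBType is rocksdb::DB in production; any type with a compatible
// GetIntProperty(name, std::uint64_t*) works (e.g. a test stub).
// NOTE: the two-argument GetIntProperty() reads the default column family
// only; with several column families, query compaction-pending per handle.
template <typename DBType>
bool WaitForCompactionsToFinish(
    DBType* db,
    std::chrono::milliseconds poll_interval = std::chrono::milliseconds(500),
    std::chrono::milliseconds timeout = std::chrono::minutes(30)) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  for (;;) {
    std::uint64_t pending = 0;
    std::uint64_t running = 0;
    // GetIntProperty returns false if the property cannot be read.
    const bool read_ok =
        db->GetIntProperty(kCompactionPending, &pending) &&
        db->GetIntProperty(kNumRunningCompactions, &running);
    if (read_ok && pending == 0 && running == 0) {
      return true;  // nothing queued and nothing running
    }
    if (std::chrono::steady_clock::now() >= deadline) {
      return false;  // still busy (or unreadable) when time ran out
    }
    std::this_thread::sleep_for(poll_interval);
  }
}
```

In the load path from the question, one would follow these steps:

1. Open the DB with `disable_auto_compactions=true`.
2. Call `IngestExternalFile()`.
3. Run the full `CompactRange(CompactRangeOptions(), nullptr, nullptr)`.
4. Mark the node online only once `WaitForCompactionsToFinish(db)` returns true.

This sketch polls on a timer rather than only from `OnCompactionCompleted`, in line with the caution above about relying on that callback.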