Problem
I'm seeing very low throughput when running a Dataflow job built from the following template: https://cloud.google.com/dataflow/docs/guides/templates/provided/firestore-bulk-delete.
When running against our production dataset (~2.2 billion entities to delete), throughput ramps up to only ~500 entities per second within the first 2 hours. After 24 hours the number of workers still hasn't increased, and throughput has held steady at ~500 entities per second. At that rate the job will take ~51 days to complete.
When I run the same job on a smaller dataset (~90 million entities to delete), it automatically scales out to ~11 workers and ramps up to a throughput of ~7,500 entities per second.
How can I improve the performance of this Dataflow job when purging a large amount of data from Datastore?
What I've tried (without success)
- I've confirmed that the Dataflow job and its workers are running in the same region as Datastore.
- I've tried setting numWorkers to 20, but the job quickly scales back down to 1 worker (see the sketch below).