I am copying a large number of files from a source bucket to a destination bucket, where the source bucket is encrypted with AES256.
gcloud storage cp is the fastest option to achieve this, and we can pass encryption keys.
However, I want to skip files that have already been copied, and there is a way to pass a manifest file so that already-copied files are skipped.
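For reference, the copy command I'm describing looks roughly like the sketch below; the bucket names and key variable are placeholders, and the flag names follow my reading of the gcloud storage cp reference, so verify them with gcloud storage cp --help:

```
# SOURCE_KEY: base64-encoded AES256 (customer-supplied) key for the source objects.
# manifest.csv: log of every copied/skipped object; re-running the same command
# with the same manifest skips objects already recorded as successfully copied.
gcloud storage cp -r \
  --decryption-keys="$SOURCE_KEY" \
  --manifest-path=manifest.csv \
  "gs://source-bucket/*" "gs://destination-bucket/"
```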
My concern is what happens when this manifest file grows much larger.
For example, transferring 3.5 GiB of data spread across 837,136 files produced a manifest file of ~278 MB (roughly 350 bytes per entry, so the manifest scales with the number of files rather than the data size).
Currently the Storage Transfer Service doesn't support transfers where the source bucket is encrypted with AES256.
Question
So for transferring terabytes of data this file will become even bigger, which raises the question: how does gcloud storage cp handle and read this file? Will the size of the manifest file become a bottleneck and cause memory pressure or throttling issues? Is there any documentation on how gcloud storage handles this?
Based on this Google blog post on Faster Cloud Storage transfers using the gcloud command-line:

gcloud storage cp takes advantage of parallel composite uploads, wherein a file is divided into 32 chunks that are uploaded in parallel to temporary objects; the final object is then recreated from those temporary objects, and the temporary objects are deleted.

Also, with regard to bottlenecks, it is suggested to avoid the sequential-naming bottleneck, which can cause an upload speed issue because the majority of your connections end up directed at the same shard when the filenames are very similar. A simple solution is to rename your folder or file structure so that the names are no longer sequential.
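As a rough illustration (the property name below is my assumption from the gcloud storage settings, and the rename loop is only a hypothetical sketch, so verify both against gcloud config list and your own layout before using them), you can enable parallel composite uploads explicitly and prepend a short hash so object names stop being sequential:

```
# Turn on parallel composite uploads for large files (assumed property name;
# check `gcloud config list` / the gcloud storage docs for the exact setting).
gcloud config set storage/parallel_composite_upload_enabled True

# Hypothetical rename pass: prefix each file with 4 hex chars of its name's
# SHA-1 so uploads spread across shards instead of hitting the same one.
for f in data/part-*; do
  prefix=$(basename "$f" | sha1sum | cut -c1-4)
  mv "$f" "data/${prefix}-$(basename "$f")"
done
```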
Here is some documentation that you may find useful and can test on your projects:
It is also recommended to use resumable uploads, which matter when there is a network or connection interruption and you don't want to start uploading data all over again.
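On the manifest side, since it is just a flat log that grows by one row per object, you can sanity-check progress after an interruption before re-running the copy. This assumes the CSV-style, one-row-per-object layout described in the question, so adjust it to whatever your manifest actually contains:

```
# Inspect the manifest after an interruption, before resuming the copy.
# (Assumes a plain CSV-style log with one row per object, as described
# in the question; adjust to your manifest's actual layout.)
head -1 manifest.csv   # column header, if present
wc -l manifest.csv     # rows written so far (≈ objects attempted)
wc -c manifest.csv     # current manifest size in bytes
```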