I would like to move files from Backblaze B2 to Amazon S3. The instructions here say that I should download them to a local directory. However, I am trying to transfer about 180 TB of data so I would prefer to not have to download them locally.
I found this post with a similar question, but I was wondering if there was a way to do this using the command line instead of ForkLift.
Thank you
Yes, you can do this using the AWS CLI. The `aws s3 cp` command can read `stdin` or write to `stdout` by using `-` instead of a filename, so you can pipe two `aws s3 cp` commands together to read a file from Backblaze B2 and write it to Amazon S3 without it hitting the local disk.

First, configure two AWS profiles from the command line - one for B2 and the other for AWS.
`aws configure` will prompt you for the credentials for each account:
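A minimal sketch of that step, assuming the two profiles are named `b2` and `aws` (the names are arbitrary, as long as they match the config and commands below):

```bash
# Enter the B2 application key ID and application key at the prompts
aws configure --profile b2

# Enter the AWS access key ID and secret access key at the prompts
aws configure --profile aws
```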
After you run `aws configure`, edit the AWS config file (`~/.aws/config` on Mac and Linux, `C:\Users\USERNAME\.aws\config` on Windows) and add a value for `endpoint_url` to the `b2` profile. This saves you from having to specify the `--endpoint-url` option every time you run `aws s3` with the `b2` profile.

For example, if your B2 region was
`us-west-004` and your AWS region was `us-west-1`, you would edit your config file to look like this:
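Something like the following (the B2 endpoint URL here follows the usual `https://s3.<region>.backblazeb2.com` pattern, so double-check it against your bucket details; the access keys themselves live in the separate credentials file that `aws configure` writes):

```ini
[profile b2]
region = us-west-004
endpoint_url = https://s3.us-west-004.backblazeb2.com

[profile aws]
region = us-west-1
```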
Now you can specify the profiles in the two `aws s3 cp` commands. It's easy to run a quick test on a single file:
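For instance, with hypothetical bucket names `my-b2-bucket` and `my-s3-bucket` (substitute your own):

```bash
# Read one object from B2 to stdout, and pipe it straight into an upload to S3
aws --profile b2 s3 cp s3://my-b2-bucket/hello.txt - |
    aws --profile aws s3 cp - s3://my-s3-bucket/hello.txt
```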
One wrinkle is that, if the file is more than 50 GB, you will need to use the `--expected-size` argument to specify the file size so that the `cp` command can split the stream into parts for a large file upload (see the `--expected-size` entry in the AWS CLI docs for `aws s3 cp`).

Here's a one-liner that copies the contents of a bucket on B2 to a bucket on S3, outputting the filename (object key) and size of each file. It assumes you've set up the profiles as above.
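A sketch of that loop, again with hypothetical bucket names, using `aws s3api list-objects-v2` to fetch each object key and its size so the size can be passed to `--expected-size`:

```bash
# List every object in the B2 bucket as "<key><TAB><size>", then stream each one to S3
aws --profile b2 s3api list-objects-v2 --bucket my-b2-bucket \
    --query 'Contents[].[Key, Size]' --output text |
while IFS=$'\t' read -r key size; do
    echo "${key} (${size} bytes)"
    aws --profile b2 s3 cp "s3://my-b2-bucket/${key}" - |
        aws --profile aws s3 cp --expected-size "${size}" - "s3://my-s3-bucket/${key}"
done
```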
Although this technique does not hit the local disk, the data still has to flow from B2 to wherever this script is running, then to S3. As @Mark B mentioned in his answer, run the script on an EC2 instance for best performance.