I would like to move files from Backblaze B2 to Amazon S3. The instructions here say that I should download them to a local directory. However, I am trying to transfer about 180 TB of data, so I would prefer not to download them locally.

I found this post with a similar question, but I was wondering if there was a way to do this using the command line instead of ForkLift.

Thank you

metadaddy (Best Answer)

Yes, you can do this using the AWS CLI. The aws s3 cp command can read from stdin or write to stdout when you pass - in place of a filename, so you can pipe two aws s3 cp commands together to read a file from Backblaze B2 and write it to Amazon S3 without the data ever touching the local disk.

First, configure two AWS profiles from the command line - one for B2 and the other for AWS. aws configure will prompt you for the credentials for each account:

% aws configure --profile b2
% aws configure --profile aws
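
When aws configure prompts you for the b2 profile, enter your Backblaze application key ID as the Access Key ID and the application key itself as the Secret Access Key. A sketch of the interaction (the prompt wording may vary slightly between CLI versions, and the region is just an example):

% aws configure --profile b2
AWS Access Key ID [None]: <Your B2 application key ID>
AWS Secret Access Key [None]: <Your B2 application key>
Default region name [None]: us-west-004
Default output format [None]: json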

After you run aws configure, edit the AWS config file (~/.aws/config on Mac and Linux, C:\Users\USERNAME\.aws\config on Windows) and add a value for endpoint_url to the b2 profile. This saves you from having to specify the --endpoint-url option every time you run aws s3 with the b2 profile.

For example, if your B2 region were us-west-004 and your AWS region were us-west-1, you would edit your config file to look like this:

[profile b2]
region = us-west-004
endpoint_url = https://s3.us-west-004.backblazeb2.com

[profile aws]
region = us-west-1
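
If you'd rather not edit the config file, the same thing can be done by passing the endpoint explicitly on each command. A quick sketch, assuming the same us-west-004 region and a placeholder bucket name:

% aws --profile b2 --endpoint-url https://s3.us-west-004.backblazeb2.com \
    s3 ls s3://<Your Backblaze bucket name>/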

Now you can specify the profiles in the two aws s3 cp commands:

aws --profile b2 s3 cp s3://<Your Backblaze bucket name>/filename.ext - \
| aws --profile aws s3 cp - s3://<Your AWS bucket name>/filename.ext

It's easy to run a quick test on a single file:

# Write a file to Backblaze B2
% echo 'Hello world!' | \
aws --profile b2 s3 cp - s3://metadaddy-b2/hello.txt

# Copy file from Backblaze B2 to Amazon S3
% aws --profile b2 s3 cp s3://metadaddy-b2/hello.txt - \
| aws --profile aws s3 cp - s3://metadaddy-s3/hello.txt

# Read the file from Amazon S3
% aws --profile aws s3 cp s3://metadaddy-s3/hello.txt -
Hello world!

One wrinkle is that, if the file is more than 50 GB, you will need to use the --expected-size argument to specify the file size so that the cp command can split the stream into parts for a large file upload. From the AWS CLI docs:

--expected-size (string) This argument specifies the expected size of a stream in terms of bytes. Note that this argument is needed only when a stream is being uploaded to s3 and the size is larger than 50GB. Failure to include this argument under these conditions may result in a failed upload due to too many parts in upload.
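
If you don't already know a file's size, you can look it up on B2 with s3api head-object and feed the result to --expected-size. A sketch using the same example buckets and a hypothetical bigfile.bin object (the size shown is made up):

# Look up the object's size in bytes on Backblaze B2 (example output)
% aws --profile b2 s3api head-object \
    --bucket metadaddy-b2 --key bigfile.bin \
    --query ContentLength --output text
107374182400

# Stream it to Amazon S3, telling cp how large the stream will be
% aws --profile b2 s3 cp s3://metadaddy-b2/bigfile.bin - \
    | aws --profile aws s3 cp - s3://metadaddy-s3/bigfile.bin --expected-size 107374182400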

Here's a one-liner that copies the contents of a bucket on B2 to a bucket on S3, outputting the filename (object key) and size of each file. It assumes you've set up the profiles as above.

aws --profile b2 s3api list-objects-v2 --bucket metadaddy-b2 \
| jq '.Contents[] | .Key, .Size' \
| xargs -n2 sh -c 'echo "Copying \"$1\" ($2 bytes)"; \
    aws --profile b2 s3 cp "s3://metadaddy-b2/$1" - \
    | aws --profile aws s3 cp - "s3://metadaddy-s3/$1" --expected-size $2' sh
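
Once the bulk copy finishes, one quick sanity check is to compare the two buckets; the Total Objects and Total Size lines at the end of each listing should match (again using the example bucket names):

% aws --profile b2 s3 ls s3://metadaddy-b2 --recursive --summarize
% aws --profile aws s3 ls s3://metadaddy-s3 --recursive --summarize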

Although this technique does not hit the local disk, the data still has to flow from B2 to wherever this script is running, then to S3. As @Mark B mentioned in his answer, run the script on an EC2 instance for best performance.