Copy list of objects across buckets while keeping object prefixes intact


ℹ︎ This question is about the MinIO Client, not the MinIO Server object storage!

I have two S3-compatible buckets A and B, and I need to copy a list of objects from A to B. The catch is that

  1. I do not want to copy all objects from A, nor do the objects I need share a common prefix; instead, I have a list of object keys (a text file with one key per line) that I need to copy. So, as far as I can see, I cannot simply use mc cp --recursive or mc mirror.
  2. I need to preserve the full object keys in B, including their complete prefixes.

For example, say I have the following list (in a file objects.txt):

a/b/1.txt
c/2.txt
d/e/3.txt
f/1.txt

After the copy operation, I expect all of these objects to reside in B under their original keys.
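To check that expectation afterwards with a single mc call, rather than one call per key, one option is to list everything under B and compare it with objects.txt. Below is a minimal POSIX shell sketch; the function name missing_keys is hypothetical, and it assumes that mc find prints every object as a full alias/bucket/key path, one per line (worth confirming against your mc version).

```shell
# Hypothetical helper: print the keys listed in a file that are NOT present
# under an alias/bucket such as s3/B, using a single `mc find` call.
# Assumption: `mc find s3/B` prints one full path per object, e.g.
# s3/B/a/b/1.txt, which sed turns back into the bare key a/b/1.txt.
missing_keys() {
  target=$1   # alias/bucket, e.g. s3/B
  list=$2     # text file with one object key per line
  workdir=$(mktemp -d) || return 1
  mc find "$target" | sed "s,^$target/,," | LC_ALL=C sort -u > "$workdir/present"
  grep -v '^$' "$list" | LC_ALL=C sort -u > "$workdir/wanted"
  # comm -23 keeps lines that appear only in the first (sorted) file,
  # i.e. keys we wanted but did not find in the target.
  LC_ALL=C comm -23 "$workdir/wanted" "$workdir/present"
  rm -rf "$workdir"
}
```

For example, missing_keys s3/B objects.txt should print nothing once every key has been copied with its prefix intact.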

My initial attempt to achieve this was the following command:

mc cp $(sed 's,^,s3/A/,' objects.txt) s3/B

Unfortunately, the resulting bucket B had the following listing:

B/1.txt
B/2.txt
B/3.txt

That is, mc treated the object keys like file paths and stripped the “directory” part, keeping only the base name. As a result, a/b/1.txt and f/1.txt both ended up as B/1.txt, so one of the two objects was lost.
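Because mc kept only the last path component of each key, the objects that would overwrite each other can be predicted before copying by grouping the keys in objects.txt by that component. A small awk-based sketch (the function name find_basename_collisions is made up for illustration):

```shell
# Hypothetical helper: report every base name (last "/"-separated component)
# that more than one key in the list shares, followed by the clashing keys.
# These are exactly the keys that collide when only base names are kept.
find_basename_collisions() {
  # $1: text file with one object key per line; blank lines are skipped (NF)
  awk -F/ 'NF { n[$NF]++; keys[$NF] = keys[$NF] " " $0 }
           END { for (b in n) if (n[b] > 1) print b ":" keys[b] }' "$1"
}
```

For the example list, find_basename_collisions objects.txt reports the single clash as 1.txt: a/b/1.txt f/1.txt.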

My current workaround is an explicit loop:

while read -r object; do
  mc cp s3/A/"$object" s3/B/"$object"
done <objects.txt

However, this is a lot slower. Part of the issue is that my mc is containerised, so every invocation spawns a new Singularity container.

Furthermore, the copy does not seem to take advantage of the fact that both buckets are in the same region: I get an average throughput of 10 MiB/s, with wide variation. It seems that mc cp routes all the data through my local compute node rather than copying it directly between the buckets. Ideally, a direct copy would be performed (for comparison, mc mirror achieves more than 1 GiB/s).
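One way to attack the container overhead is to start a single container and run the whole copy inside it, with xargs -P overlapping several mc cp calls. This is a sketch under assumptions not stated above: the function name and image path (mc.sif) are hypothetical, and the image is assumed to ship /bin/sh and an xargs that supports -P. It only cuts container start-ups and hides per-object latency; it does not by itself change how mc moves the data, so it will not necessarily approach mc mirror throughput.

```shell
# Hypothetical sketch: launch ONE Singularity container and, inside it, run
# up to $njobs `mc cp` processes at a time via xargs -P.
# Assumptions: the image path is illustrative, and the image provides /bin/sh
# and an xargs that supports -P (GNU and BSD xargs do).
copy_list_parallel() {
  image=$1          # e.g. mc.sif (hypothetical path to the mc image)
  list=$2           # text file with one object key per line
  njobs=${3:-8}     # number of concurrent mc cp processes
  # Drop blank lines, then feed the keys to the container on stdin.
  # With -I, each input line becomes one argument and is used as both the
  # source and the destination key, so prefixes such as a/b/ are kept.
  # Caveat: xargs still interprets quotes and backslashes in the keys.
  grep -v '^$' "$list" |
    singularity exec "$image" sh -c \
      'xargs -P "$1" -I {} mc cp "s3/A/{}" "s3/B/{}"' sh "$njobs"
}
```

For example, copy_list_parallel mc.sif objects.txt 8. Keys containing quotes or backslashes would be mangled by xargs' quote handling, so the explicit while read loop remains the safer fallback for such names.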

