How to combine several ActiveStorage attachments into a single file in an S3 bucket, without a tempfile or holding it in memory

I'm working on a project where each child object has an attachment. I need to combine all of these attachments into a single zip file at the parent level.

The following code works, but it is not efficient for large numbers of child rows (thousands):

require 'zip'
namespace :marketing do
  task :zip_images => :environment do
    mj = ParentJob.find(ENV["JOB_ID"])
    @bucket = Aws::S3::Resource.new.bucket("marketing-images-#{Rails.env}")
    Zip::OutputStream.open("#{Rails.root}/tmp/temp_file_#{mj.id}.zip") do |zipfile|
      while Childrow.where(image_job_id: mj.id, status: 'Processed').count > 0
        row = Childrow.where(image_job_id: mj.id, status: 'Processed').first
        zipfile.put_next_entry("files/#{row.folder}/#{row.school_id}.png")
        # write, not puts: puts appends "\n" to data that doesn't end in one, corrupting the PNG
        zipfile.write @bucket.object(row.images_s3.key).get.body.read
        row.update(status: 'Zipped')
      end
    end
    mj.full_zip.attach(io: File.open("#{Rails.root}/tmp/temp_file_#{mj.id}.zip"), filename: 'files.zip', content_type: 'application/zip', identify: false)
    mj.update(status: 'Complete')
    File.delete("#{Rails.root}/tmp/temp_file_#{mj.id}.zip")
  end
end
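A side note on the first task: rubyzip's `puts` behaves like `IO#puts` and appends a newline to any data that does not already end with one, so each PNG gets a stray trailing byte, while `write` copies the bytes unchanged. The stdlib-only sketch below uses a `StringIO` as a stand-in for the zip entry stream (an assumption so it runs without rubyzip or S3) and fake PNG bytes:

```ruby
require 'stringio'

# Fake PNG bytes (real PNG signature, then filler) that do not end in "\n".
png_bytes = "\x89PNG\r\n\x1A\n fake image data".b

# puts appends "\n" because the data does not already end with one.
via_puts = StringIO.new
via_puts.binmode
via_puts.puts(png_bytes)

# write copies the bytes exactly.
via_write = StringIO.new
via_write.binmode
via_write.write(png_bytes)

puts via_puts.string.bytesize - png_bytes.bytesize # => 1
puts via_write.string.bytes == png_bytes.bytes     # => true
```

The second task already uses `zio.write`, so it is not affected by this.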

I'm running on Heroku, so I must keep memory usage below 1 GB. Even though I'm writing to a temp file, memory still fills up and the dyno starts swapping past 1 GB.

To work around this, I'm trying to stream everything through S3 directly: reading each file straight from S3 and sending the rubyzip output stream straight to S3.

require 'zip'
namespace :marketing do
  task :zip_images => :environment do
    mj = ParentJob.find(ENV["JOB_ID"])
    @bucket = Aws::S3::Resource.new.bucket("marketing-images-#{Rails.env}")
    obj = @bucket.object('myfile.zip').put({
      content_type: 'application/zip',
      body: Zip::OutputStream.write_buffer do |zio|
        while Childrow.where(image_job_id: mj.id, status: 'Processed').count > 0
          row = Childrow.where(image_job_id: mj.id, status: 'Processed').first
          zio.put_next_entry("files/#{row.folder}/#{row.school_id}.png")
          zio.write row.marketing_images_s3.download
          row.update(status: 'Zipped')
        end
      end
    })
    mj.full_zip.attach(io: @bucket.object('myfile.zip').get.body, filename: 'files.zip', content_type: 'application/zip', identify: false)
    mj.update(status: 'Complete')
    @bucket.object('myfile.zip').delete
  end
end

But when I put the contents to S3, I get the following error:

Aws::S3::Errors::BadDigest: The Content-MD5 you specified did not match what we received.

How can I make sure the digest matches, or how should I change the way I'm sending the data?
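One possible cause worth checking (not confirmed for this setup): `Zip::OutputStream.write_buffer` returns its `StringIO` with the position left at the end of the archive. If the SDK computes Content-MD5 by reading the body from its current position and then rewinds it before sending, as its MD5 step appears to do, the header would be the digest of zero bytes while the full archive is uploaded, which matches this error. The sketch below simulates that with a hypothetical `md5_then_rewind` helper standing in for the SDK's checksum step, using only `stringio` and `digest`:

```ruby
require 'stringio'
require 'digest/md5'

# Hypothetical stand-in for an SDK checksum step: hash the IO from its
# *current* position to EOF, then rewind it so the upload sends everything.
def md5_then_rewind(io, chunk_size = 16 * 1024)
  digest = Digest::MD5.new
  while (chunk = io.read(chunk_size))
    digest.update(chunk)
  end
  io.rewind
  digest.base64digest
end

# Build a buffer the way write_buffer leaves it: written, position at the end.
def written_buffer(bytes)
  io = StringIO.new
  io.binmode
  io.write(bytes)
  io
end

payload = "PK\x03\x04 stand-in for zip archive bytes".b

# Case 1: handed over as-is, so the digest covers zero bytes.
stale = written_buffer(payload)
stale_header_md5 = md5_then_rewind(stale)
stale_body_md5 = Digest::MD5.base64digest(stale.read) # what actually gets sent
puts stale_header_md5 == stale_body_md5 # => false (BadDigest-style mismatch)

# Case 2: rewound first, so the digest covers the whole payload.
fixed = written_buffer(payload)
fixed.rewind
header_md5 = md5_then_rewind(fixed)
body_md5 = Digest::MD5.base64digest(fixed.read)
puts header_md5 == body_md5 # => true
```

If this is the cause, calling `rewind` on the buffer before `put`, or passing `buffer.string` as the body, should make the digest match. Note that `write_buffer` with no IO argument builds the entire archive in that `StringIO`, so this approach still holds the whole zip in memory.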
