I'm working on a project where child objects each have an attachment. I need to combine all of the attachments into a single zip file at the parent level.
The following code works, but it is not efficient for large numbers of child rows (thousands):
    require 'zip'
    namespace :marketing do
      task :zip_images => :environment do
        mj = ParentJob.find(ENV["JOB_ID"])
        @bucket = Aws::S3::Resource.new.bucket("marketing-images-#{Rails.env}")
        Zip::OutputStream.open("#{Rails.root}/tmp/temp_file_#{mj.id}.zip") do |zipfile|
          while Childrow.where(image_job_id: mj.id, status: 'Processed').count > 0
            row = Childrow.where(image_job_id: mj.id, status: 'Processed').first
            zipfile.put_next_entry("files/#{row.folder}/#{row.school_id}.png")
            zipfile.puts @bucket.object(row.images_s3.key).get.body.read
            row.update(status: 'Zipped')
          end
        end
        mj.full_zip.attach(io: File.open("#{Rails.root}/tmp/temp_file_#{mj.id}.zip"), filename: 'files.zip', content_type: 'application/zip', identify: false)
        mj.update(status: 'Complete')
        File.delete("#{Rails.root}/tmp/temp_file_#{mj.id}.zip")
      end
    end
I'm working in a Heroku environment, so I need to keep memory usage below 1 GB. Even though I'm writing to a temp file, the task still fills up memory and starts swapping past 1 GB.
To work around this, I'm trying to stream everything to S3 directly: reading each file directly from S3 and sending the rubyzip output stream directly to S3.
    require 'zip'
    namespace :marketing do
      task :zip_images => :environment do
        mj = ParentJob.find(ENV["JOB_ID"])
        @bucket = Aws::S3::Resource.new.bucket("marketing-images-#{Rails.env}")
        obj = @bucket.object('myfile.zip').put({
          content_type: 'application/zip',
          body: Zip::OutputStream.write_buffer do |zio|
            while Childrow.where(image_job_id: mj.id, status: 'Processed').count > 0
              row = Childrow.where(image_job_id: mj.id, status: 'Processed').first
              zio.put_next_entry("files/#{row.folder}/#{row.school_id}.png")
              zio.write row.marketing_images_s3.download
              row.update(status: 'Zipped')
            end
          end
        })
        mj.full_zip.attach(io: @bucket.object('myfile.zip').get.body, filename: 'files.zip', content_type: 'application/zip', identify: false)
        mj.update(status: 'Complete')
        @bucket.object('myfile.zip').delete
      end
    end
But when I put the contents to S3, I get the following error:
    Aws::S3::Errors::BadDigest: The Content-MD5 you specified did not match what we received.
How can I ensure the digest matches, or how should I change the way I'm sending the data?
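One possibility I have not ruled out (an assumption on my part, not something I have confirmed against the SDK): the buffer returned by `Zip::OutputStream.write_buffer` may be handed to `put` with its read position still at the end of the data, so a digest computed from the current position would not match the bytes that are actually sent. The sketch below uses only a plain `StringIO` as a stand-in for that buffer; the payload string is made up for illustration.

```ruby
require 'digest'
require 'stringio'

# Stand-in for the zip buffer (assumption: a StringIO whose position
# is left at the end of the data after writing).
payload = 'pretend zip bytes'
buffer = StringIO.new
buffer.write(payload)

# Reading from the current position yields an empty string,
# so the digest covers none of the payload.
digest_at_end = Digest::MD5.hexdigest(buffer.read)

# After rewinding, the same read covers the full payload.
buffer.rewind
digest_after_rewind = Digest::MD5.hexdigest(buffer.read)

mismatch = digest_at_end != digest_after_rewind
```

If that is what is happening, calling `rewind` on the buffer before passing it as `body:` might be enough, though that is a guess rather than a verified fix.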