What in exifr causes this Tempfile to get closed?


This piece of Ruby code processes an UploadedFile using exifr:

f = uploaded_file.tempfile
p "1 #{f.closed?} #{f.instance_variable_get(:'@unlinked')}"
#1 EXIFR::JPEG.new(StringIO.new(f.read))
#2 EXIFR::JPEG.new(f)
p "2 #{f.closed?} #{f.instance_variable_get(:'@unlinked')}"
GC.start
sleep 0.01
p "3 #{f.closed?} #{f.instance_variable_get(:'@unlinked')}"
p "4 #{f.size}"

N.B. The GC.start/sleep calls are only there to make the problem reproduce reliably.

When uncommenting #1, all is fine:

"1 false false"
"2 false false"
"3 false false"
"4 3822528"

However, uncommenting #2 instead of #1 yields this:

"1 false false"
"2 false false"
"3 true false"
[c4b7ce6b-5492-43db-8c64-726cafaccce0] [Thread: 24800] Errno::ENOENT (No such file or directory @ rb_file_s_size - /var/folders/vx/v0rn818s0257_3l491_v48bm0000gn/T/RackMultipart20240221-71765-acbi7v.JPG):

Now all that exifr is doing is this:

    def initialize(file, load_thumbnails: true)
...
        examine(file.dup, load_thumbnails: load_thumbnails)
...
      end
    end

    class Reader < SimpleDelegator
      def readbyte; readchar; end unless File.method_defined?(:readbyte)
      def readint; (readbyte << 8) + readbyte; end
      def readframe; read(readint - 2); end
      def readsof; [readint, readbyte, readint, readint, readbyte]; end
      def next
        c = readbyte while c != 0xFF
        c = readbyte while c == 0xFF
        c
      end
    end

    def examine(io, load_thumbnails: true)
      io = Reader.new(io)
...

and a bit of reading from io, so I don't understand what would cause the file to get closed.

This happens in a Rails app running on puma.

#2 would be preferable, as it does not require the file to be loaded into memory completely (in my case, we are talking up to 50 MB).


There are 2 best solutions below

Evgeniy Berezovsky On BEST ANSWER

Thanks to @Casper, I understood that I got duped by f.dup. I wouldn't have thought that part of the standard Ruby library would behave this way: deleting a Tempfile when a (dup'ed) reference is still around.

The way I chose to fix this is different from Casper's solutions, though, because I already have an open Tempfile, and I want to use it, instead of concurrently re-opening the file. (Who knows what implications that would have, on different OSes?)

So this is how I fixed it:

EXIFR::JPEG.new(SelfDuper.new(f))

And this is the helper class I wrote for it:

require 'delegate'

class SelfDuper < SimpleDelegator
  def dup
    self
  end
end
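
As a minimal sketch of what the wrapper does, the snippet below uses a StringIO as a stand-in for the Tempfile (an assumption made only to keep the example self-contained). It shows that dup hands back the wrapper itself, so no second object is created that could later be garbage collected, while reads are still delegated to the wrapped IO:

```ruby
require 'delegate'
require 'stringio'

# Same helper as above, repeated so the snippet runs on its own.
class SelfDuper < SimpleDelegator
  def dup
    self
  end
end

io = StringIO.new('abcdef')          # stand-in for the Tempfile
wrapped = SelfDuper.new(io)

copy = wrapped.dup
same_object = copy.equal?(wrapped)   # dup returns the wrapper itself
first = copy.read(3)                 # read is delegated to the wrapped IO
rest = io.read                       # the wrapped IO's position has advanced

puts same_object  # true
puts first        # abc
puts rest         # def
```

One consequence of this design is that the reader and the caller share a single read position, so the caller may need to rewind the file afterwards.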
Casper On

This most likely happens because of the following behavior, described in the Tempfile documentation:

When a Tempfile object is garbage collected, or when the Ruby interpreter exits, its associated temporary file is automatically deleted.

Now EXIFR does file.dup on the file object when it calls examine. This duplicated object will be garbage collected once EXIFR is done reading the file, and as a side effect of this garbage collection, the temporary file will be deleted.

Let's call this problem a Tempfile race condition. Tempfile is most likely not "dup-safe", and EXIFR is probably programmed expecting a File object, in which case this problem would not happen.

Therefore, one solution is to avoid EXIFR::JPEG.new(f) if you want to retain the temporary file for later processing.

The other solution is to open the temporary file yourself using a File object, and pass that object to EXIFR::JPEG.new instead.

This way you will not have to read the file into memory, nor will garbage collection delete the temporary file as long as you maintain a reference to f somewhere.
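
A rough sketch of that approach follows. To stay runnable without the exifr gem, it uses a hypothetical read_header method as a stand-in for a parser that dups its argument before reading; the Tempfile created here stands in for uploaded_file.tempfile. In the real application the block body would be EXIFR::JPEG.new(io) instead.

```ruby
require 'tempfile'

# Hypothetical stand-in for a parser that, like exifr, dups the IO it is
# given before reading from it.
def read_header(io)
  copy = io.dup
  copy.read(2)
ensure
  copy&.close
end

tempfile = Tempfile.new(['upload', '.jpg'])   # stand-in for uploaded_file.tempfile
tempfile.binmode
tempfile.write("\xFF\xD8rest of the image".b)
tempfile.flush

# Open a separate plain File handle on the same path. Only this handle is
# dup'ed by the parser; the Tempfile object itself is never duplicated.
header = File.open(tempfile.path, 'rb') { |io| read_header(io) }

puts header.bytes.inspect         # [255, 216]
puts tempfile.closed?             # false
puts File.exist?(tempfile.path)   # true
```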

The final and probably easiest solution relies on the fact that EXIFR's initialize method also accepts a file path string. Therefore this will probably be the simplest fix:

EXIFR::JPEG.new(f.path)