How to add multiple files to ZipArchive in O(n) time


I am offering a means of downloading multiple photos as a single archive file.

TAR format works fine, but most people have nothing on their device that can open TAR files, and they then complain that it doesn't work.

ZIP compression is unnecessary (since photos are already compressed), but ZIP is a format that most people will be able to open.

The built-in PHP class ZipArchive appears to only have a method addFile to add one file at a time. It seems that this involves decompression and recompression of the entire archive, with the effect that the more files you add, the slower it gets, i.e. it runs in O(n²) time to add n files. The effect of that becomes catastrophic beyond about 30-40 hi-res photos.

Have I missed something about ZipArchive? Or is this a shortcoming in the class that should be put forward as a feature request?

Are there alternatives to achieve the goal of a quick-to-produce archive format that most people will be able to open without installing additional software?


There are 2 best solutions below

Chris

According to the comments in the ZipArchive::addFile manual, the file is not actually added when the function is called. The files are added when the zip object is closed.
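A minimal sketch of that deferred behaviour, assuming the zip extension is loaded. The temporary file names and the photo.bin entry name are made up for the demo. It records whether the archive exists on disk before and after close():

```php
<?php
// Sketch: addFile() only registers the file; close() writes the archive.
// All paths and names here are hypothetical and exist only for this demo.
$source = tempnam(sys_get_temp_dir(), 'src');
file_put_contents($source, str_repeat('x', 1_000_000));

$archive = sys_get_temp_dir() . '/deferred-demo-' . uniqid() . '.zip';

$zip = new ZipArchive();
if ($zip->open($archive, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
    exit("Cannot create $archive\n");
}
$zip->addFile($source, 'photo.bin');

// The entry is queued at this point; check whether anything is on disk yet.
$existsBeforeClose = file_exists($archive);

$zip->close();
clearstatcache();
$existsAfterClose = file_exists($archive);

echo 'Exists before close(): ' . var_export($existsBeforeClose, true) . "\n";
echo 'Exists after close():  ' . var_export($existsAfterClose, true) . "\n";

// Clean up the temporary files.
unlink($source);
unlink($archive);
```

If the manual comments are right, the first check should report that the archive is not there yet, which also means the source files must still exist when close() runs.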

Álvaro González

It would be a pretty flawed design if that was the case, but you never know, so let's test it.

First, let's prepare some 50 MB files to use during the rest of the benchmarks:

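define('FILE_COUNT', 100);
// Create the data directory only if it does not exist yet
if (!is_dir(__DIR__ . '/data')) {
    mkdir(__DIR__ . '/data');
}
// 52,428,800 bytes = 50 MiB of filler per file
$contents = str_repeat('x', 52_428_800);
for ($i = 0; $i < FILE_COUNT; $i++) {
    file_put_contents(__DIR__ . "/data/file-$i.txt", $contents);
}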

Now, let's create some ZIP archives with increasing numbers of files:

define('FILE_COUNT', 1); // <-------- We'll be increasing this

$t0 = microtime(true);

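$zip = new ZipArchive();
// OVERWRITE makes every run start from an empty archive
if ($zip->open(__DIR__ . '/zip-test.zip', ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
    exit("Could not create the archive\n");
}
for ($i = 0; $i < FILE_COUNT; $i++) {
    $zip->addFile(__DIR__ . "/data/file-$i.txt");
}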
echo "Files added: {$zip->numFiles}\n";
echo "Status: {$zip->status}\n";
$zip->close();

$time = microtime(true) - $t0;
echo "Total time: " . number_format($time, 3) . " seconds\n";
echo "Average time: " . number_format($time / FILE_COUNT, 3) . " seconds/file\n";
echo "Max RAM used: " . number_format(memory_get_peak_usage(real_usage: true)) . " bytes\n";

On my current computer (Windows 10 desktop, PHP/8.1.15 64-bit):

Files added: 1
Status: 0
Total time: 0.505 seconds
Average time: 0.505 seconds/file
Max RAM used: 2,097,152 bytes

Files added: 10
Status: 0
Total time: 4.854 seconds
Average time: 0.485 seconds/file
Max RAM used: 2,097,152 bytes

Files added: 100
Status: 0
Total time: 48.428 seconds
Average time: 0.484 seconds/file
Max RAM used: 2,097,152 bytes

This is of course not a scientific benchmark, just a quick test to get a sense of the overall situation. The conclusion is that the time appears to grow linearly with the number of files, and ZipArchive definitely does not decompress and recompress the entire archive every time a file is added. In fact, the big delay happens in $zip->close().
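To see where the time goes, the loop and the close() call can be timed separately. This is only a sketch: the file count, file sizes and temporary paths are illustrative and much smaller than in the benchmark above, and the absolute numbers will differ per machine.

```php
<?php
// Sketch: measure the addFile() loop and close() separately.
// File count, sizes and paths are illustrative assumptions.
$dir = sys_get_temp_dir() . '/zip-timing-' . uniqid();
mkdir($dir);
$fileCount = 5;
for ($i = 0; $i < $fileCount; $i++) {
    file_put_contents("$dir/file-$i.txt", str_repeat('x', 5_000_000));
}

$zip = new ZipArchive();
if ($zip->open("$dir/test.zip", ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
    exit("Cannot create archive\n");
}

// Time only the calls that register the files.
$t0 = microtime(true);
for ($i = 0; $i < $fileCount; $i++) {
    $zip->addFile("$dir/file-$i.txt", "file-$i.txt");
}
$addTime = microtime(true) - $t0;

// Time only the call that writes the archive.
$t1 = microtime(true);
$zip->close();
$closeTime = microtime(true) - $t1;

echo 'addFile() loop: ' . number_format($addTime, 3) . " seconds\n";
echo 'close():        ' . number_format($closeTime, 3) . " seconds\n";

// Clean up the temporary files.
array_map('unlink', glob("$dir/*"));
rmdir($dir);
```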

Then I ran some tests with compression disabled entirely:

for ($i = 0; $i < FILE_COUNT; $i++) {
    $zip->addFile(__DIR__ . "/data/file-$i.txt");
    // Store the entry as-is instead of deflating it
    $zip->setCompressionName(__DIR__ . "/data/file-$i.txt", ZipArchive::CM_STORE);
}

Much faster indeed:

Files added: 100
Status: 0
Total time: 19.541 seconds
Average time: 0.195 seconds/file
Max RAM used: 2,097,152 bytes
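For the photo download case, the same idea could be wrapped in a small helper. The function name createStoredZip, the .jpg file names and the choice to use basename() as the entry name are all assumptions for this sketch, not part of the benchmark. It assumes the base names are unique, since a repeated entry name would replace the earlier entry.

```php
<?php
/**
 * Hypothetical helper: bundle already-compressed files into a ZIP
 * without recompressing them. Returns the number of entries queued.
 */
function createStoredZip(string $archivePath, array $files): int
{
    $zip = new ZipArchive();
    if ($zip->open($archivePath, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        throw new RuntimeException("Cannot create $archivePath");
    }
    foreach ($files as $file) {
        // Use the base name so server paths do not end up in the archive.
        $entryName = basename($file);
        if (!$zip->addFile($file, $entryName)) {
            throw new RuntimeException("Cannot add $file");
        }
        // CM_STORE copies the bytes as-is instead of deflating them.
        $zip->setCompressionName($entryName, ZipArchive::CM_STORE);
    }
    $count = $zip->numFiles;
    if (!$zip->close()) {
        throw new RuntimeException("Cannot write $archivePath");
    }
    return $count;
}

// Usage with two made-up files standing in for photos.
$dir = sys_get_temp_dir() . '/zip-store-' . uniqid();
mkdir($dir);
file_put_contents("$dir/a.jpg", random_bytes(100_000));
file_put_contents("$dir/b.jpg", random_bytes(100_000));

$count = createStoredZip("$dir/photos.zip", ["$dir/a.jpg", "$dir/b.jpg"]);

// Read the archive back to check how the first entry was stored.
$check = new ZipArchive();
$check->open("$dir/photos.zip");
$stat = $check->statName('a.jpg');
$check->close();

echo "Entries: $count\n";
echo 'Stored without compression: '
    . var_export($stat['comp_method'] === ZipArchive::CM_STORE, true) . "\n";

// Clean up the temporary files.
array_map('unlink', glob("$dir/*"));
rmdir($dir);
```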

Last but not least, an earlier version of this informal benchmark used ->addFromString() without much thought, and RAM usage increased consistently as I added more files, presumably because every string has to stay in memory until close() writes the archive.
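A rough way to observe this is to compare how much PHP-managed memory is held just before close() with each method. The helper name memoryHeldBeforeClose, the sizes and the paths are assumptions for this sketch, and memory_get_usage() only sees memory allocated through PHP, so treat the numbers as indicative.

```php
<?php
// Sketch: memory held before close() for addFromString() vs addFile().
// Sizes, counts and paths are illustrative assumptions.
$dir = sys_get_temp_dir() . '/zip-mem-' . uniqid();
mkdir($dir);
$fileCount = 5;
for ($i = 0; $i < $fileCount; $i++) {
    file_put_contents("$dir/file-$i.txt", str_repeat('x', 2_000_000));
}

function memoryHeldBeforeClose(string $dir, int $fileCount, bool $fromString): int
{
    $zip = new ZipArchive();
    if ($zip->open("$dir/test.zip", ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        throw new RuntimeException('Cannot create archive');
    }
    $before = memory_get_usage();
    for ($i = 0; $i < $fileCount; $i++) {
        if ($fromString) {
            // The whole file is read into a PHP string and handed over.
            $zip->addFromString("file-$i.txt", file_get_contents("$dir/file-$i.txt"));
        } else {
            // Only the path is registered; the file is read later.
            $zip->addFile("$dir/file-$i.txt", "file-$i.txt");
        }
    }
    $held = memory_get_usage() - $before;
    $zip->close();
    return $held;
}

$viaString = memoryHeldBeforeClose($dir, $fileCount, true);
$viaFile = memoryHeldBeforeClose($dir, $fileCount, false);

echo 'addFromString(): ' . number_format($viaString) . " bytes held\n";
echo 'addFile():       ' . number_format($viaFile) . " bytes held\n";

// Clean up the temporary files.
array_map('unlink', glob("$dir/*"));
rmdir($dir);
```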

Conclusions:

  • Quick testing suggests it's O(n).
  • Disable compression if your files are already compressed.
  • Let ZipArchive load your files from disk whenever possible.