PHP glob() takes up a lot of memory when searching through massive files


I am trying to open a folder containing 9.52 GB of files with glob() and search through it, but it seems glob() can't handle it all, as this error shows:

Fatal error: Allowed memory size of 10484711424 bytes exhausted (tried to allocate 10250972360 bytes) in C:\xampp\htdocs\results.php on line 16 

results.php:

<?php
ini_set('memory_limit', '9999M');
error_reporting(E_ERROR | E_PARSE);
$file = glob('db/*', GLOB_BRACE); // this is line 16.
$searchfor = $_GET['q'];
if (!strlen(trim($searchfor)) || (!$_GET['q'])) {
    echo "<h2>Enter something.</h2>";
}
else {

// get the file contents, assuming the file to be readable (and exist)
$contents = implode(array_map(function ($v) {
    return file_get_contents($v);
}, glob(__DIR__ . "/db/*")));

// escape special characters in the query
$pattern = preg_quote($searchfor, '/');

// finalise the regular expression, matching the whole line
$pattern = "/^.*$pattern.*\$/m";

// search, and store all matching occurences in $matches
if (preg_match_all($pattern, $contents, $matches)) {
    echo '<h2><center>Matches found!</center></h2>';
    echo "<pre>";
    echo implode($matches[0]);
    echo "</pre>";
} else {
    echo "<center><h2>No matches found.</h2></center>";
}
}

I have allocated 9999M through ini_set() and it still doesn't work! I could go higher, but I assume that's not a very good idea for my computer/server to handle.

Is there any way to fix this? I've tried googling everywhere, with little result.

1 Answer

Use the SPL GlobIterator class instead.

The glob() function consumes a lot of memory when the pattern matches many files, because it returns all results at once as one array. A contributed note on the manual page mentions this:

Don't use glob() if you try to list files in a directory where very much files are stored (>100.000). You get an "Allowed memory size of XYZ bytes exhausted ..." error.

Let's compare the two approaches. First, the glob() function:

$startMemory = memory_get_usage();
echo $startMemory . ' (before glob)<br>';
// glob() returns every match at once, as one big array
$result = glob('/**/**/**', GLOB_BRACE);
$endMemory = memory_get_usage();
echo $endMemory . ' (after glob)<br>';
echo 'Total memory usage: ' . ($endMemory - $startMemory) . '<br>';

echo 'Matched ' . count($result) . ' items.<br>';
foreach ($result as $item) {
    echo $item . '<br>';
}

The result is ...

419904 (before glob)
711152 (after glob)
Total memory usage: 291248
Matched 2086 items.

Next, the SPL GlobIterator class:

$startMemory = memory_get_usage();
echo $startMemory . ' (before glob)<br>';
// GlobIterator yields matches lazily instead of building a full array
$iterator = new GlobIterator('/**/**/**', FilesystemIterator::UNIX_PATHS);
$endMemory = memory_get_usage();
echo $endMemory . ' (after glob)<br>';
echo 'Total memory usage: ' . ($endMemory - $startMemory) . '<br>';

if (!$iterator->count()) {
    echo 'No matches';
} else {

    printf("Matched %d items.<br>", $iterator->count());

    foreach ($iterator as $item) {
        echo $item.'<br>';
    }
}

The result is ...

420704 (before glob)
424088 (after glob)
Total memory usage: 3384
Matched 2086 items.

As you can see, GlobIterator uses far less memory, because it yields the matches one at a time through an iterator instead of building the whole result array up front.

Another note: once you fix the glob() call, you may run into the same problem with file_get_contents() next (as mentioned by Chris Haas); in that case, read each huge file line by line instead of loading it whole.
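
As a rough sketch of that line-by-line approach (assuming the db/ folder and the $_GET['q'] search term from the question, untested against the asker's data), GlobIterator can be combined with SplFileObject so that only one line of one file is held in memory at a time:

<?php
// Hypothetical sketch: stream each file instead of concatenating them all.
$searchfor = $_GET['q'] ?? '';
$pattern = '/' . preg_quote($searchfor, '/') . '/';

$iterator = new GlobIterator(__DIR__ . '/db/*', FilesystemIterator::UNIX_PATHS);

$found = false;
foreach ($iterator as $fileInfo) {
    if (!$fileInfo->isFile()) {
        continue; // skip subdirectories
    }

    // SplFileObject reads one line at a time, so peak memory
    // depends on the longest line, not on the 9.52 GB total.
    $file = new SplFileObject($fileInfo->getPathname());
    foreach ($file as $line) {
        if (preg_match($pattern, $line)) {
            echo htmlspecialchars($line) . '<br>';
            $found = true;
        }
    }
}

if (!$found) {
    echo '<h2>No matches found.</h2>';
}

This trades speed for memory: the files are still read in full, but never all at once, so the 9999M memory_limit should no longer be needed.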