I have a simple PHP script that calculates some things about a given string input. It caches the results to a database and we occasionally delete entries that are older than a certain number of days.
Our programmers implemented this database as:
function cachedCalculateThing($input) {
    // The cache key is the SHA-1 of the input; the file holds the JSON-encoded result.
    $cacheFile = 'cache/' . sha1($input) . '.dat';
    if (file_exists($cacheFile)) {
        return json_decode(file_get_contents($cacheFile));
    }
    $retval = ...
    file_put_contents($cacheFile, json_encode($retval));
    return $retval;
}

function cleanCache() {
    // Delete cache files older than 7 days.
    $stale = time() - 7*24*3600;
    foreach (new DirectoryIterator('cache/') as $fileInfo) {
        if ($fileInfo->isFile() && $fileInfo->getCTime() < $stale) {
            unlink($fileInfo->getRealPath());
        }
    }
}
We use Ubuntu LAMP and ext3. At what number of entries does cache lookup become non-constant or violate a hard limit?
While that particular code is not very "scalable"* at all, there are a number of things that can improve it:
Run iostat -x 4 and look at your current disk utilization. If it is already higher than, say, 25%, turning on disk caching will spike it to 100% at random times and slow down all web service, because requests to the disk have to be queued and are generally serviced in order (not always, but don't bank on it).

*Scalability is usually defined in terms of linear request speed or parallel server resources. That code:
find ./cache -type f -mtime +7 -exec rm -f "{}" \;

The main takeaway is that real scalability and good performance from a caching system take a little more consideration than the short block of code shows. There may be more limits than the ones I've enumerated, but even those depend on variables such as entry size, number of entries, requests per second, current disk load, file system type, etc. These are all external to the code, which is to be expected, because a cache persists outside of the code. The code listed can work for a small cache with a low request rate, but may not hold up at the larger sizes where caching really becomes necessary.
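One common way to make a file cache less sensitive to the number of entries in a single directory is to spread the files across subdirectories named after the first characters of the hash. The sketch below is only an illustration: shardedCachePath and its $baseDir parameter are hypothetical names, not part of the original code, and the two-level layout is an assumption rather than a measured optimum.

```php
<?php
// Hypothetical helper (not from the original code): maps an input to a
// cache path inside a two-level directory tree keyed by its SHA-1 hash.
function shardedCachePath(string $input, string $baseDir = 'cache'): string
{
    $hash = sha1($input);   // 40 lowercase hex characters

    // Layout: <base>/<hash[0..1]>/<hash[2..3]>/<hash>.dat
    // Two hex characters per level gives at most 256 * 256 leaf directories.
    return $baseDir
        . '/' . substr($hash, 0, 2)
        . '/' . substr($hash, 2, 2)
        . '/' . $hash . '.dat';
}
```

The leaf directory has to exist before the first write, for example via mkdir(dirname($path), 0775, true), and a cleanup routine would then need to walk the tree recursively instead of reading one flat directory.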
Also, are you running Apache in threaded or prefork mode? That is going to affect how PHP blocks on its reads and writes.
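Whichever mode Apache runs in, two requests can touch the same cache file at the same moment, so a reader may see a half-written file. A minimal sketch of one way around that, assuming the cache lives on a single local filesystem: write to a temporary file in the same directory and rename it into place. atomicCacheWrite is a hypothetical helper name, not something from the original code.

```php
<?php
// Hypothetical helper: writes $data to $file so that concurrent readers
// see either the old contents or the new contents, never a partial file.
function atomicCacheWrite(string $file, string $data): bool
{
    $dir = dirname($file);
    if (!is_dir($dir) && !mkdir($dir, 0775, true) && !is_dir($dir)) {
        return false;   // could not create the cache directory
    }

    // The temporary file must be on the same filesystem as the target;
    // otherwise rename() is no longer a single atomic step.
    $tmp = tempnam($dir, 'tmp');
    if ($tmp === false) {
        return false;
    }

    if (file_put_contents($tmp, $data) === false || !rename($tmp, $file)) {
        @unlink($tmp);  // best-effort removal of the leftover temp file
        return false;
    }
    return true;
}
```

Readers can then keep using file_get_contents() without taking a lock, at the cost of one extra temporary file per write.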
-- Um, I probably should have added that you want to keep track of which thing is the object and which is the key/hash. If $input is already a string, it is in its base form: it has already been computed, retrieved, serialized, etc. If $input is the key, then file_put_contents() needs to store something else (the actual variable/contents). If $input is the object to look up (which could be a long string, or even a short one), then it needs a lookup key; otherwise no computation is being bypassed or saved.
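To make the key/value split concrete, here is a rough sketch in which the hash of $input is only the lookup key and the file stores the computed result. cachedCompute, $compute and $cacheDir are hypothetical names introduced for illustration, and the sketch assumes the result survives a JSON round trip and that $cacheDir already exists.

```php
<?php
// Sketch: the key is derived from $input; the stored value is the result
// of the expensive computation. $compute and $cacheDir are hypothetical
// parameters, not part of the original code.
function cachedCompute(string $input, callable $compute, string $cacheDir)
{
    $cacheFile = $cacheDir . '/' . sha1($input) . '.dat';   // lookup key

    if (is_file($cacheFile)) {
        $raw = file_get_contents($cacheFile);
        if ($raw !== false) {
            return json_decode($raw, true);   // cache hit: computation skipped
        }
    }

    $result = $compute($input);               // cache miss: do the real work
    file_put_contents($cacheFile, json_encode($result), LOCK_EX);
    return $result;
}
```

Calling it twice with the same $input should run $compute only once; the second call is served from the file.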