Separate LRU cache for each argument value?


I need to write a SQL database column getter that takes a column name and a time, and returns the entire column of values corresponding to the input time. The function may be called frequently with the same arguments, so I would like to use an LRU cache. However, I'm not sure the calls are uniformly distributed across column names, so ideally I would have a separate LRU cache for each column name.

I previously had it like below, but I would like a separate LRU cache for each col_name.

@lru_cache(...)
def get_col(self, col_name, time):
    # do stuff to get the column and return it

How can I achieve this? Also, unfortunately, I have to support py2.

3 Answers

VonC

Since functools.lru_cache() was only introduced in Python 3.2, you will need to manage separate caches for each column name yourself, meaning:

  • Create an LRU cache class or use a third-party library that supports Python 2 to implement LRU cache functionality.
  • Inside your function, maintain a dictionary where each key is a column name and the value is an instance of an LRU cache.
  • Before fetching the column data in the get_col function, check if the column name is in the dictionary. If it is, use the associated LRU cache to get the data. If it is not, create a new LRU cache for that column name.

Something like this LRUCache class:

from collections import OrderedDict

class LRUCache(object):
    def __init__(self, maxsize=3): # Adjust maxsize as necessary
        self.cache = OrderedDict()
        self.maxsize = maxsize

    def get(self, key):
        sentinel = object()
        value = self.cache.pop(key, sentinel)
        if value is sentinel:
            return None
        else:
            # If key was in dictionary, reinsert it at the end
            self.cache[key] = value
            return value

    def put(self, key, value):
        if len(self.cache) >= self.maxsize:
            # Remove oldest entry
            self.cache.popitem(last=False)
        self.cache[key] = value

# Dictionary to hold separate LRU caches for each column name
column_caches = {}

def get_col(col_name, time):
    # Get the LRU cache for the specified column name
    # If it does not exist, create a new one
    cache = column_caches.get(col_name)
    if cache is None:
        cache = LRUCache(maxsize=3) # Adjust maxsize as necessary
        column_caches[col_name] = cache

    # Try to get the value from the cache
    result = cache.get(time)
    if result is not None:
        print(cache.cache) # Print the state of the cache
        return result

    # If the value is not in the cache, fetch it
    result = "{} at {}".format(col_name, time) # Dynamic data generation based on input parameters

    # Store the fetched value in the cache and return it
    cache.put(time, result)
    print(cache.cache) # Print the state of the cache
    return result

# Testing
print(get_col('foo', 1))
print(get_col('foo', 2))
print(get_col('foo', 3))
print(get_col('foo', 4))
print(get_col('foo', 3))
print(get_col('foo', 1))

Output (demo on tio.run):

OrderedDict([(1, 'foo at 1')])
foo at 1
OrderedDict([(1, 'foo at 1'), (2, 'foo at 2')])
foo at 2
OrderedDict([(1, 'foo at 1'), (2, 'foo at 2'), (3, 'foo at 3')])
foo at 3
OrderedDict([(2, 'foo at 2'), (3, 'foo at 3'), (4, 'foo at 4')])
foo at 4
OrderedDict([(2, 'foo at 2'), (4, 'foo at 4'), (3, 'foo at 3')])
foo at 3
OrderedDict([(4, 'foo at 4'), (3, 'foo at 3'), (1, 'foo at 1')])
foo at 1
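
Since get_col in the question is a method, here is a minimal sketch of the same idea with the caches held in an instance attribute, reusing the LRUCache class above (ColumnStore and _fetch_column are hypothetical names for illustration):

class ColumnStore(object):
    def __init__(self, maxsize=3):
        self.maxsize = maxsize
        self.column_caches = {}  # one LRUCache per column name

    def get_col(self, col_name, time):
        cache = self.column_caches.get(col_name)
        if cache is None:
            cache = self.column_caches[col_name] = LRUCache(maxsize=self.maxsize)
        result = cache.get(time)
        if result is None:
            # cache miss (note: None results cannot be cached with this scheme)
            result = self._fetch_column(col_name, time)  # hypothetical DB query
            cache.put(time, result)
        return result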
blhsing

An LRU cache is simply a mapping that preserves insertion order, so the most recently used item can be moved to the end and the least recently used item can be evicted from the front when the cache reaches its maximum size. In both Python 2 and Python 3, such a data structure is available in the standard library as collections.OrderedDict, whose popitem(last=False) method removes the oldest entry.

To more easily initialize an LRU cache for each column name, you can also use collections.defaultdict:

from collections import defaultdict, OrderedDict

class Client(object):  # subclass object for a new-style class on Python 2
    def __init__(self, cache_size=3): # a cache size of only 3 for demo purposes
        self.col_cache = defaultdict(OrderedDict)
        self.cache_size = cache_size

    def get_col(self, col_name, time):
        cache = self.col_cache[col_name]
        try:
            cache[time] = cache.pop(time)
        except KeyError:
            if len(cache) >= self.cache_size:
                cache.popitem(last=False)
            cache[time] = '%s at %s' % (col_name, time) # make DB query here
        print(cache)
        return cache[time]

so that:

c = Client()
print(c.get_col('foo', 1))
print(c.get_col('foo', 2))
print(c.get_col('foo', 3))
print(c.get_col('foo', 4))
print(c.get_col('foo', 3))
print(c.get_col('foo', 1))

outputs:

OrderedDict([(1, 'foo at 1')])
foo at 1
OrderedDict([(1, 'foo at 1'), (2, 'foo at 2')])
foo at 2
OrderedDict([(1, 'foo at 1'), (2, 'foo at 2'), (3, 'foo at 3')])
foo at 3
OrderedDict([(2, 'foo at 2'), (3, 'foo at 3'), (4, 'foo at 4')])
foo at 4
OrderedDict([(2, 'foo at 2'), (4, 'foo at 4'), (3, 'foo at 3')])
foo at 3
OrderedDict([(4, 'foo at 4'), (3, 'foo at 3'), (1, 'foo at 1')])
foo at 1

Demo: Try it online!

Note that since Python 3.2, collections.OrderedDict also has a move_to_end method for moving an item to the end, so if Python 2 support were not required you could change the line:

cache[time] = cache.pop(time)

to simply:

cache.move_to_end(time)
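
Since the question has to support Python 2, where move_to_end does not exist, here is a minimal sketch that picks whichever idiom is available at import time (the helper name _touch is an assumption, not part of the original answer):

from collections import OrderedDict

if hasattr(OrderedDict, 'move_to_end'):  # Python 3.2+
    def _touch(cache, key):
        cache.move_to_end(key)
else:  # Python 2 fallback: pop and reinsert at the end
    def _touch(cache, key):
        cache[key] = cache.pop(key)

Both versions raise KeyError for a missing key, so _touch(cache, time) can stand in for the line inside the try block unchanged.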
Dmitry

We can preserve the logic of an existing lru_cache implementation and organise a mapping from the chosen argument's values to separate caches within an outer decorator. Here is a sample implementation:

from functools import lru_cache, wraps
from inspect import getcallargs


def lru_argument_cache(argument_name):
    def decorator(function):
        callable_buckets = {}

        @wraps(function)
        def wrapper(*args, **kwargs):
            inspection = getcallargs(function, *args, **kwargs)
            
            # should use functools._make_key() for more general use
            bucket_key = inspection[argument_name]

            if bucket_key in callable_buckets: 
                return callable_buckets[bucket_key](*args, **kwargs)
            
            callable_buckets[bucket_key] = lru_cache()(function)  # bare lru_cache(function) requires Python 3.8+

            return callable_buckets[bucket_key](*args, **kwargs)
        
        # just to demonstrate usage
        def cache_info(argument_value):
            try:
                return callable_buckets[argument_value].cache_info()
            except KeyError:
                return None

        wrapper.cache_info = cache_info
        return wrapper

    return decorator

Usage:

@lru_argument_cache('key')
def my_callable(key, times):
    return key * times

Verify:

my_callable('A', 2)
Out[16]: 'AA'
my_callable('A', 2)
Out[17]: 'AA'
my_callable('A', 3)
Out[18]: 'AAA'
my_callable('B', 2)
Out[19]: 'BB'

my_callable.cache_info('A')
Out[20]: CacheInfo(hits=1, misses=2, maxsize=128, currsize=2)
my_callable.cache_info('B')
Out[21]: CacheInfo(hits=0, misses=1, maxsize=128, currsize=1)

  • inspect.getcallargs is also available in Python 2;
  • functools.lru_cache should be replaced with whichever lru_cache implementation you currently use on Python 2 (see the import sketch below);
  • functools.wraps and its dependency functools.partial, if not available, can be backported from the CPython sources;
  • not sure what to add concerning thread safety, though.
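
For the Python 2 side, the functools32 and backports.functools_lru_cache packages on PyPI both provide the same lru_cache decorator; a minimal compatibility-import sketch, assuming one of those packages is installed:

try:
    from functools import lru_cache  # Python 3.2+
except ImportError:
    try:
        from functools32 import lru_cache  # Python 2 backport of functools from 3.2
    except ImportError:
        from backports.functools_lru_cache import lru_cache

With an import like this at the top, the decorator above should work unchanged on both interpreters.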