How to intern objects with weakref/gc support?

66 Views Asked by At

I'm trying to implement interning of non-string objects in python. For strings we have the sys.intern function. However it doesn't support other immutable objects. To put it out of the way, I'm aware of the problems that may arise when the interned objects are modified.

A commonly cited way of interning for non-strings is:

S = {}
x = S.setdefault(x, x)

However such a dict holds strong references to the interned objects, and therefore will grow indefinitely, which is a problem in my application.

I figured out that there's a WeakSet type that would auto-collect unreferenced elements. However there doesn't seem to be a fast way to get the held object in the set:

S = WeakSet()
S.add(x)    # returns nothing

# a very slow way to work around it:
for y in S:
    if y == x:
        x = y

There are also WeakKeyDictionary and WeakValueDictionary, but there doesn't seem to be a WeakKeyValueDictionary.

So how can this be achieved in python?

1

There are 1 best solutions below

6
blhsing On

You can emulate a WeakKeyValueDictionary by creating a weak reference to the object as a key and then storing the object as a default value of a WeakValueDictionary:

from weakref import ref, WeakValueDictionary

def intern(obj, weak_refs=WeakValueDictionary()):
    return weak_refs.setdefault(ref(obj), obj)

so that:

import gc

a = frozenset(['x'*10000])
b = frozenset(['x'*10000])
assert a is not b
a = intern(a)
b = intern(b)
assert a is b
print(len(intern.__defaults__[0]))
del a
del b
gc.collect()
print(len(intern.__defaults__[0]))

passes both assertions and outputs:

1
0

Demo: https://replit.com/@blhsing/UsableSpecificProcessors