Subclassing dict and implementing getitem for nested class (python)

Question


470 Views Asked by Xian At 31 March 2023 at 12:51

I am currently trying to implement a subclass of the Python dictionary class dict (called 'Pointer_dict'). The purpose of this class is to save memory when creating copies of dictionaries and changing their values without changing the original dictionary, which is why I don't use deepcopy().

The Pointer_dict class is supposed to take an existing dictionary and, on lookup (e.g. tmp['foo']), return its own value if that value was overwritten before, and otherwise return the original value if it exists (or raise KeyError if it doesn't).

For demonstration:

```python
original = { 'foo': 1, 'bar': 2 }
pointer_dict = Pointer_dict(origin=original)

print(pointer_dict['foo']) # 1
pointer_dict['foo'] = 10
print(pointer_dict['foo']) # 10
print(original['foo']) # 1
```

So it points at its own values when they have been set, and at the original dictionary when they have not. The problem appears with nested dictionaries. Assume we have:

```python
original = { 'foo': 1, 'bar': { 'foobar': 2, 'barfoo': 3 } }
pointer_dict = Pointer_dict(origin=original)

print(pointer_dict['foo']) # 1
pointer_dict['bar']['foobar'] = 10
print(pointer_dict['bar']['foobar']) # 10
print(original['bar']['foobar']) # 2
```

Since we only set ['bar']['foobar'], pointer_dict only contains { 'bar': { 'foobar': 10 } }, because it is not supposed to copy the original values (to save memory) but just point at them. So the problem I'm facing now: if I print either pointer_dict itself or pointer_dict['bar'], I want the output to look like:

```python
{ 'foo': 1, 'bar': { 'foobar': 10, 'barfoo': 3 } } # print(pointer_dict)
{ 'foobar': 10, 'barfoo': 3 } # print(pointer_dict['bar'])
```

So my question is: how can I implement the __getitem__ and __repr__ methods so that the return value is a mixture of its own values and the original values?

If I use a normal dictionary as the return value, nested dictionaries won't work: after one __getitem__ call the return value is a plain dictionary, so my own __getitem__ method won't be called for more deeply nested dictionaries, and it returns the wrong values.

My approach so far was like this (I have not yet tried implementing __repr__, because my previous method didn't work):

```python
class Pointer_Dict(dict):
    def __init__(self, mapping={}, /, **kwargs):
        self.origin = kwargs['kwargs']
        super().__init__(mapping)

    def __setitem__(self, key, value) -> None:
        if isinstance(value, dict):
            value = Pointer_Dict(value, kwargs=self.origin[key])
        super().__setitem__(key, value)
    
    def __getitem__(self, key):
        try:
            val_dict = super().__getitem__(key)
            try:
                val_orig = self.origin[key]
                if isinstance(val_orig, dict):
                    return Pointer_Dict(mapping=val_dict, kwargs=val_orig)
            except KeyError:
                return val_dict
        except KeyError:
            try:
                val_orig = self.origin[key]
                if isinstance(val_orig, dict):
                    return Pointer_Dict(kwargs=val_orig)
                return val_orig
            except KeyError:
                raise KeyError(key)
        return val_dict
```

My idea here was a recursive approach, where __getitem__ keeps being called as long as the value is still a dictionary (or a subclass of one), and the value is only returned once it is something that is not a dictionary. But that did not work: I don't get the right values, and I'm a little lost on how to solve the problem.

So I'm open for any new suggestions on how to solve this! Thank you in advance!

Original Q&A


**jsbueno** · Answer 1 · 2023-03-31T14:02:40.413000

Sorry, I won't have the time (or the will) to fix all your code, but you have a good starting point.

As I commented above, this kind of code is error-prone, and there are lots of edge cases: some of them you overlooked, and others come from intricacies in the way Python does things that you might just not have known about.

So, first things first: collections.ChainMap does what you describe. But it never fiddles with the dictionary contents: it is not recursive - if a nested value is a plain dictionary, it will just be a plain dictionary.
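A quick illustration of that limitation using the standard collections.ChainMap (the variable names are only for this example): a top-level write lands in the first mapping and leaves the original alone, but a nested value is returned as the original dict object itself, so writing into it modifies the original.

```python
from collections import ChainMap

original = {"foo": 1, "bar": {"foobar": 2, "barfoo": 3}}
overlay = ChainMap({}, original)      # writes go to the first (empty) mapping

overlay["foo"] = 10                  # top-level write: stored in overlay.maps[0]
print(overlay["foo"], original["foo"])   # 10 1

overlay["bar"]["foobar"] = 10        # overlay["bar"] is the very same dict as original["bar"]
print(original["bar"]["foobar"])     # 10: the original was modified
```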

Second thing: you should avoid inheriting directly from dict. It has some traps, primarily because the native implementation will not always go through the __setitem__ and __getitem__ defined in your class: native methods access the dictionary contents directly (e.g. __iter__, get, setdefault and even __len__ won't use your logic, and you'd have to write all of those, and maybe some more). The workaround is to inherit from collections.UserDict instead of inheriting directly from dict: it is a thin pure-Python wrapper around a native dict that channels data access through __getitem__ and __setitem__, so you can get away with only those. Also, it keeps a plain native dict in its .data attribute, and you can access that directly in your methods if you think you should.
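A small demonstration of the difference; DictSub and UserDictSub are throwaway names for this example. Both classes override __getitem__, but only the UserDict-based one sees the override when .get() is used, because dict.get reads the native storage directly.

```python
from collections import UserDict

class DictSub(dict):
    def __getitem__(self, key):
        return "from __getitem__"

class UserDictSub(UserDict):
    def __getitem__(self, key):
        return "from __getitem__"

d = DictSub(a=1)
u = UserDictSub(a=1)

print(d["a"], d.get("a"))   # from __getitem__ 1   (dict.get skips the override)
print(u["a"], u.get("a"))   # from __getitem__ from __getitem__
```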

Third thing: this may be the main reason why your code is not behaving as you expect. You always check whether an object is a dict with isinstance(obj, dict), and this is also true when the object is a Pointer_Dict instance, since Pointer_Dict subclasses dict. Changing to inherit from UserDict would fix these checks, but not your logic: there are points where you will probably want to do something different if the object being tested is already a Pointer_Dict, so you should test for that first. Then test whether the other object is an instance of collections.abc.Mapping rather than dict, as this will make your code work with other flavours of dictionaries as well.
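A sketch of the order of checks described above. PointerDict is a hypothetical, empty stand-in for the question's Pointer_Dict, and classify is an illustrative helper, not part of any library; types.MappingProxyType is used as an example of a mapping that is not a dict subclass.

```python
from collections import UserDict
from collections.abc import Mapping
from types import MappingProxyType

class PointerDict(dict):   # hypothetical stand-in for Pointer_Dict
    pass

def classify(value):
    # Most specific type first, then any Mapping, then everything else.
    if isinstance(value, PointerDict):
        return "already wrapped"   # reuse as-is instead of wrapping again
    if isinstance(value, Mapping):
        return "wrap it"           # dict, UserDict, MappingProxyType, ...
    return "leaf value"

print(isinstance(PointerDict(), dict))        # True: a subclass passes a dict check
print(classify(PointerDict()))                # already wrapped
print(classify({"a": 1}))                     # wrap it
print(classify(MappingProxyType({"a": 1})))   # wrap it (not a dict subclass)
print(classify(UserDict()))                   # wrap it
print(classify(42))                           # leaf value
```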

Fourth thing: you are wrapping your values in Pointer_Dicts both when setting them and again when reading them, creating new objects all the time (if the objective is saving memory, that is not the best approach). Most of the problems this causes will be resolved if you properly test whether the values are already Pointer_Dict instances, as I pointed out in "Third thing" above. But you will get different results if you wrap them when setting and retrieving, or only when retrieving, and you have to check what you really want. Another thing: dictionaries already present in the original wrapped dict would not be converted to Pointer_Dict in your implementation, as Python's dict constructor won't call your own __setitem__. If you inherit from collections.UserDict, that will probably be fixed.
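The constructor difference can be observed directly; the class names and the calls list exist only for this demonstration. dict.__init__ copies the initial contents without calling the subclass's __setitem__, while UserDict.__init__ goes through update(), which assigns each key with self[key] = value.

```python
from collections import UserDict

calls = []   # records every key that went through an overridden __setitem__

class DictSub(dict):
    def __setitem__(self, key, value):
        calls.append(("dict", key))
        super().__setitem__(key, value)

class UserDictSub(UserDict):
    def __setitem__(self, key, value):
        calls.append(("UserDict", key))
        super().__setitem__(key, value)

DictSub({"a": 1, "b": {"c": 2}})       # dict.__init__ copies directly: nothing recorded
UserDictSub({"a": 1, "b": {"c": 2}})   # UserDict.__init__ -> update() -> __setitem__
print(calls)   # [('UserDict', 'a'), ('UserDict', 'b')]
```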

Fifth thing: this is not related to your logic, but it is a common "Python trap". In the line def __init__(self, mapping={}, /, **kwargs):, the default value {} for mapping is a trap, as it is a single object, created once when the function is defined and shared by every call that does not pass a mapping. The correct way to do this is to set the default to None and create an empty dict inside the method: a new dict will be created each time the code is executed, and all instances will have independent values:

```python
...
def __init__(self, mapping=None, /, **kwargs):
    if mapping is None:
        mapping = {}  # a fresh dict on every call
    ...
...
```
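The trap itself, shown with a hypothetical add_key function rather than the question's class: the default dict is created once, when def runs, so every call that omits the argument mutates the same object.

```python
def add_key(key, target={}):           # trap: one dict created at definition time
    target[key] = True
    return target

first = add_key("x")
second = add_key("y")
print(first is second, first)          # True {'x': True, 'y': True}

def add_key_fixed(key, target=None):   # fix: build a fresh dict on each call
    if target is None:
        target = {}
    target[key] = True
    return target

print(add_key_fixed("x"), add_key_fixed("y"))   # {'x': True} {'y': True}
```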

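(Also, the way you are using kwargs is strange: since your __init__ reads kwargs['kwargs'], callers have to pass a keyword argument literally named kwargs, as in Pointer_Dict(kwargs=original), and a call such as Pointer_Dict(origin=original) raises KeyError: 'kwargs'. Check whether you really need that.)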
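A possible alternative, assuming the intent is to pass the original dictionary as origin=..., the way the question's examples do. ExplicitOrigin is a hypothetical name, and QuestionVersion is a reduced version of the question's __init__ kept only for comparison. A keyword-only parameter makes the expected argument explicit and gives it a safe default.

```python
class QuestionVersion(dict):
    # Reduced version of the question's __init__: origin is read from kwargs["kwargs"].
    def __init__(self, **kwargs):
        self.origin = kwargs["kwargs"]
        super().__init__()

class ExplicitOrigin(dict):
    # Hypothetical alternative: a keyword-only parameter named origin.
    def __init__(self, mapping=None, /, *, origin=None):
        super().__init__({} if mapping is None else mapping)
        self.origin = {} if origin is None else origin

original = {"foo": 1}
try:
    QuestionVersion(origin=original)
except KeyError as exc:
    print("KeyError:", exc)                  # KeyError: 'kwargs'

print(QuestionVersion(kwargs=original).origin is original)   # True, but only with this odd keyword
print(ExplicitOrigin(origin=original).origin is original)    # True
```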

So, I spotted those on a first read. There may be more problems, and some of these are not simple fixes - they depend on the final behavior you want of your code.

I am the author of mapping classes that have similar features, and I can assure you that before getting to "production quality" this kind of logic needs a lot of testing. (But I don't have a "nested ChainMap" like the one you want here, otherwise I'd just tell you to use my implementation.) Mine are published on PyPI as "extradict", and you can pip-install it; the source code for those may give you some extra insights: https://github.com/jsbueno/extradict .

Oh, sorry: I just noticed that I pointed out problems in your implementation but did not even touch on the new behaviors you are asking for. Sorry for that, but if you can get these fixes in, getting your desired behaviors with plain if...else code in your methods should become a lot easier, as you won't face unexpected results anymore.
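A minimal sketch of how those suggestions could be combined into the behavior from the question, under a few assumptions: the class name NestedOverlay and its to_dict helper are hypothetical; it subclasses collections.UserDict as suggested above and keeps only the overridden keys in .data; nested mappings from the original are wrapped lazily on first read and the wrapper is cached, so a write such as pointer_dict['bar']['foobar'] = 10 persists without copying the original; and, as with collections.ChainMap, only locally set keys can be deleted. Because the keys come from two dicts, __contains__, __iter__ and __len__ are overridden as well, not just __getitem__.

```python
from collections import UserDict
from collections.abc import Mapping


class NestedOverlay(UserDict):
    """Hypothetical sketch: local writes live in self.data, reads fall back to origin.

    The origin mapping is never copied or modified.
    """

    def __init__(self, origin, /):
        super().__init__()        # self.data = {} holds only the overridden keys
        self.origin = origin

    def __getitem__(self, key):
        if key in self.data:
            return self.data[key]
        value = self.origin[key]  # raises KeyError for unknown keys
        if isinstance(value, Mapping):
            # Wrap the nested mapping and cache the wrapper, so writes made
            # through it are kept while the original nested dict stays untouched.
            value = self.data[key] = NestedOverlay(value)
        return value

    # __setitem__ and __delitem__ are inherited from UserDict and only touch
    # self.data, so deleting a key that exists only in origin raises KeyError.

    def __contains__(self, key):
        return key in self.data or key in self.origin

    def __iter__(self):
        yield from self.origin            # original keys, in their original order
        for key in self.data:
            if key not in self.origin:
                yield key                 # then keys that only exist locally

    def __len__(self):
        return sum(1 for _ in self)

    def to_dict(self):
        """Build plain nested dicts for display; untouched origin dicts are reused as-is."""
        merged = {}
        for key in self:
            value = self.data[key] if key in self.data else self.origin[key]
            if isinstance(value, NestedOverlay):
                value = value.to_dict()
            merged[key] = value
        return merged

    def __repr__(self):
        return repr(self.to_dict())


original = {"foo": 1, "bar": {"foobar": 2, "barfoo": 3}}
pointer_dict = NestedOverlay(original)
pointer_dict["bar"]["foobar"] = 10
print(pointer_dict)                # {'foo': 1, 'bar': {'foobar': 10, 'barfoo': 3}}
print(pointer_dict["bar"])         # {'foobar': 10, 'barfoo': 3}
print(original["bar"]["foobar"])   # 2
```

Caching the wrapper on first read means each nested dictionary that is accessed costs one small, initially empty overlay object, in exchange for never copying the original values. UserDict extras such as copy() and the | operator are not adapted in this sketch.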

Best wishes there.