I am currently trying to implement a subclass (called 'Pointer_dict') of Python's built-in dict class. The purpose of this class is to save memory when creating copies of dictionaries and changing their values without changing the original dictionary, which is why I don't use deepcopy().
The Pointer_dict class is supposed to take an existing dictionary and, on lookup (e.g. tmp['foo']), return its own value if that key was overwritten before; otherwise it should return the original value if it exists, or raise KeyError if it does not.
For demonstration:
original = { 'foo': 1, 'bar': 2 }
pointer_dict = Pointer_dict(origin=original)
print(pointer_dict['foo']) # 1
pointer_dict['foo'] = 10
print(pointer_dict['foo']) # 10
print(original['foo']) # 1
So it returns its own values where they were set, and falls back to the original dictionary where they were not. The problem appears with nested dictionaries. Assume we have:
original = { 'foo': 1, 'bar': { 'foobar': 2, 'barfoo': 3 } }
pointer_dict = Pointer_dict(origin=original)
print(pointer_dict['foo']) # 1
pointer_dict['bar']['foobar'] = 10
print(pointer_dict['bar']['foobar']) # 10
print(original['bar']['foobar']) # 2
Since we only set ['bar']['foobar'], the pointer_dict only contains { 'bar': { 'foobar': 10 } }, because it is not supposed to copy the original values (to save memory), but just point at them.
The problem I'm facing now: if I print either pointer_dict itself or pointer_dict['bar'], I want the output to look like:
{ 'foo': 1, 'bar': { 'foobar': 10, 'barfoo': 3 } } # print(pointer_dict)
{ 'foobar': 10, 'barfoo': 3 } # print(pointer_dict['bar'])
So my question is: how can I implement the __getitem__ and __repr__ methods so that the return value is a mixture of its own values and the original values?
If I use a normal dictionary as the return value, nested lookups won't work: after one __getitem__ call the return value is a plain dictionary, so my own __getitem__ method is not called for the deeper levels and the wrong values are returned.
My approach so far was like this (I have not yet tried implementing __repr__, because my previous method didn't work):
class Pointer_Dict(dict):
    def __init__(self, mapping={}, /, **kwargs):
        self.origin = kwargs['kwargs']
        super().__init__(mapping)
    def __setitem__(self, key, value) -> None:
        if isinstance(value, dict):
            value = Pointer_Dict(value, kwargs=self.origin[key])
        super().__setitem__(key, value)
    def __getitem__(self, key):
        try:
            val_dict = super().__getitem__(key)
            try:
                val_orig = self.origin[key]
                if isinstance(val_orig, dict):
                    return Pointer_Dict(mapping=val_dict, kwargs=val_orig)
            except KeyError:
                return val_dict
        except KeyError:
            try:
                val_orig = self.origin[key]
                if isinstance(val_orig, dict):
                    return Pointer_Dict(kwargs=val_orig)
                return val_orig
            except KeyError:
                raise KeyError(key)
        return val_dict
My idea was a recursive approach: keep calling __getitem__ as long as the value is still a dictionary (or a subclass of dict), and return the value only once it is something that is not a dictionary.
... But that did not work: I do not get the right values, and I'm a little lost on how to solve the problem.
So I'm open to any new suggestions on how to solve this! Thank you in advance!
Sorry, I won't have the time (or the will) to fix all your code. You have a good starting point.
As I commented above, this kind of code is error-prone, and there are lots of edge cases. Some of them you overlooked; others are due to intricacies in the way Python does things, which you might simply not have known about.
So, first things first:
collections.ChainMap does what you describe. But it never fiddles with the dictionary contents: it is not recursive, so if a nested value is a plain dictionary, it will just be a plain dictionary.
Second thing: one should avoid inheriting directly from dict. It has some traps, primarily because the internal implementation will not always go through the user-defined __setitem__ and __getitem__ in the user class, as native methods just access the dictionary contents directly (e.g. __iter__, get, setdefault and even __len__ will not see your overrides, and you'd have to write all of those, and maybe some more). The workaround is to inherit from collections.UserDict instead of inheriting directly from dict: it is a thin pure-Python wrapper around native dicts that channels data access from methods such as get, setdefault and update through __getitem__ and __setitem__, so you can get away with overriding far fewer methods. Also, it keeps a native plain dict in its .data attribute, and you can access that in your methods directly if you think you should.
Third thing: this may be the main reason why your code is not behaving as you expect. You always check whether an object is a dict with isinstance(obj, dict): this will also be true if your object is a Pointer_Dict instance, since it subclasses dict. Changing to inherit from UserDict would fix these statements, but not your logic: there are points where you may want to perform a different action if the object being tested is already a Pointer_Dict, so you probably want to test for that first. Then test whether the other object is an instance of collections.abc.Mapping instead of dict directly, as this will make your code work with other flavours of dictionaries as well.
Fourth thing: you are wrapping your values in Pointer_Dicts both when setting them and again when reading them, creating new objects all the time. If the objective is saving memory, that is not the best approach. Most of the problems this causes will be resolved if you properly test whether the values are already Pointer_Dict instances, as I pointed out in "Third thing" above. But you will get different results if you wrap them both when setting and retrieving, or just when retrieving, and you have to check what you really want. Another thing: dictionaries already existing in the original wrapped dict would not be converted to Pointer_Dict in your implementation, as Python's dict constructor won't call its own __setitem__. If you do inherit from collections.UserDict, that will probably be fixed.
Fifth thing: this is not related to your logic, but it is a common "Python trap". In the line def __init__(self, mapping={}, /, **kwargs):, the default value {} for mapping is a trap, as it is a single object that will be shared across all your instances of Pointer_Dict. The correct way to do this is to set the default to None and create an empty dict inside the method: this new dict will be created each time the code is executed, and all instances will have independent values.
(Also, the way you are using kwargs is strange: as written, your __init__ requires a keyword argument literally named kwargs. Check if you really need that.)
So, I spotted those on a first read. There may be more problems, and some of these are not simple fixes: they depend on the final behavior you want from your code.
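Several of the points above can be reproduced in a few lines. This is only an illustrative sketch: UpperDict, UpperUserDict, shared_default and fresh_default are hypothetical names made up for the demonstration, not part of the code in the question.

```python
from collections import ChainMap, UserDict
from collections.abc import Mapping

# First thing: ChainMap layers mappings, but nested values stay plain dicts.
original = {"foo": 1, "bar": {"foobar": 2, "barfoo": 3}}
layered = ChainMap({}, original)
layered["foo"] = 10             # stored in the first (empty) layer only
layered["bar"]["foobar"] = 10   # mutates the nested dict inside `original`
print(original["foo"])            # 1
print(original["bar"]["foobar"])  # 10


# Second thing: dict's native methods bypass overridden item access.
class UpperDict(dict):  # hypothetical demo class
    def __setitem__(self, key, value):
        super().__setitem__(key, str(value).upper())


class UpperUserDict(UserDict):  # hypothetical demo class
    def __setitem__(self, key, value):
        super().__setitem__(key, str(value).upper())


native = UpperDict(a="x")        # the dict constructor skips __setitem__
native.update(b="y")             # and so does dict.update()
wrapped = UpperUserDict(a="x")   # UserDict routes both through __setitem__
wrapped.update(b="y")
print(native)          # {'a': 'x', 'b': 'y'}
print(wrapped.data)    # {'a': 'X', 'b': 'Y'}

# Third thing: a dict subclass passes isinstance(obj, dict); UserDict does not,
# but it is still a Mapping.
print(isinstance(native, dict))      # True
print(isinstance(wrapped, dict))     # False
print(isinstance(wrapped, Mapping))  # True


# Fifth thing: a mutable default is created once and reused by every call.
def shared_default(mapping={}):
    return mapping


def fresh_default(mapping=None):
    if mapping is None:
        mapping = {}   # a new dict on each call
    return mapping


print(shared_default() is shared_default())  # True
print(fresh_default() is fresh_default())    # False
```

Note that the ChainMap part also shows why it cannot be used here as-is: the nested write ends up in the original dictionary.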
I am the author of mapping classes that have similar features, and I can assure you that before getting to "production quality" this kind of logic needs a lot of testing. (But I don't have a "nested chainmap" like the one you want here, otherwise I'd just tell you to use my implementation.) Mine are published on PyPI as "extradict", and you can pip-install it. The source code may give you some extra insights: https://github.com/jsbueno/extradict .
Oh, sorry: just now I realized that I pointed out problems in your implementation, but did not even touch on the new behaviors you are asking for. Sorry for that. But if you can get these fixes in, then getting your desired behaviors with plain if...else code in your methods should become a lot easier, as you won't face unexpected results anymore.
Best wishes there.
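For what it is worth, here is a rough sketch of how the requested behavior might look once the fixes above are applied. It is not a tested, production-quality implementation. OverlayDict is a hypothetical name, and the sketch builds on collections.abc.MutableMapping (the abstract base that UserDict itself derives from) rather than on UserDict, so that __iter__, __len__ and __repr__ can merge both layers. It assumes the original dictionary is not modified after wrapping, and deleting only removes local overrides.

```python
from collections.abc import Mapping, MutableMapping


class OverlayDict(MutableMapping):
    """Hypothetical sketch: reads fall through to `origin`, writes stay local."""

    def __init__(self, origin=None):
        self.origin = {} if origin is None else origin
        self.data = {}  # local overrides and cached nested overlays

    def __getitem__(self, key):
        if key in self.data:
            return self.data[key]
        value = self.origin[key]  # raises KeyError if the key is unknown
        if isinstance(value, Mapping):
            # Wrap a nested mapping once and cache the wrapper, so that
            # later writes such as view['bar']['foobar'] = 10 are kept.
            value = self.data[key] = OverlayDict(value)
        return value

    def __setitem__(self, key, value):
        self.data[key] = value  # the origin is never touched

    def __delitem__(self, key):
        # Simplification: only local overrides can be removed.
        del self.data[key]

    def __iter__(self):
        yield from self.origin
        for key in self.data:
            if key not in self.origin:
                yield key

    def __len__(self):
        return sum(1 for _ in self)

    def __repr__(self):
        # items() goes through __getitem__, so nested overlays are merged too.
        return repr(dict(self.items()))


original = {"foo": 1, "bar": {"foobar": 2, "barfoo": 3}}
view = OverlayDict(original)
view["bar"]["foobar"] = 10
print(view)                       # {'foo': 1, 'bar': {'foobar': 10, 'barfoo': 3}}
print(view["bar"])                # {'foobar': 10, 'barfoo': 3}
print(original["bar"]["foobar"])  # 2
```

The design choice here is to wrap nested mappings only when they are read and to cache the wrapper, which avoids creating a new object on every access; values that are never touched are not copied.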