Keeping multiple dictionaries in sync when one is the super set of the rest

78 Views Asked by At

I want to have a class with dictionaries that contain nodes, but nodes can be of several sorts: points, vertices,... (and possible others) I want to efficiently loop and access over either one of these (over all nodes, just over points, just over vertices...).

The class would therefore be:

class A:
   _nodes: dict
   _points: dict
   _vertices: dict

Bu surely, if I modify points, it will not modify nodes and vice-versa.

My first idea was to use data descriptors to keep them in sync, but this didn't really work (or at least I don't know how to implement this). Is it possible?

I can just implement nodes a dict of dicts and implement vertices and points as properties.

class A:
   _nodes = dict{"points": dict(), "vertices": dict()}
   @property
   def _points(self):
        return(self._nodes["points"])
   @property
   def _vertices(self):
        return(self._nodes["vertices"])

But I'm thinking that it might be time consuming to compute the hash and have additional dictionary lookups every time I will access either points or vertices.

And also, ideally, I wanted to have a public cached property view for nodes, vertices and points (and updating these using data desciptors)

class NodeDescriptor():
    def __set__(self, obj, value):
        od = obj.__dict__
        od["_nodes"] = value
        if "nodes" in od: del od["nodes"]

 class PointDescriptor(): ...
 class VertexDescriptor(): ...

class A:
   _nodes = NodeDesciptor()
   _points = PointDesciptor()
   _vertices = VertexDesciptor()

   @cached_property
   def nodes(self): return NodeView(self._nodes)
   @cached_property
   def points(self): return NodeView(self._points)
   @cached_property
   def vertices(self): return NodeView(self._vertices)

But here the idea to keep nodes as a dict of dict does not work, since we cannot call the descriptor whenever we change nodes["vertex"] = new_dict()

2

There are 2 best solutions below

6
Samwise On

Avoid unnecessary duplication, and implement one set of properties in terms of the other. I might make points and vertices the source of truth and then have nodes just return an iterator that includes both of them:

from dataclasses import dataclass
from itertools import chain
from typing import Iterator

Point = tuple[float, float]
Vertex = tuple[Point, Point]

@dataclass
class A:
    points: dict[str, Point]
    vertices: dict[str, Vertex]

    @property
    def nodes(self) -> Iterator[Point|Vertex]:
        return chain(self.points.values(), self.vertices.values())
0
jsbueno On

I think a custom dictionary that would keep the meta-information on a node type alongside the value could fit you.

That will allow for O(1) retrieval and updating - and O(m + n) (m, n being the amount of Nodes of either type) for any action that would need iteration over the types - but the code would be much simpler.

It is possible to inherit from UserDict, and then an extra custom class that will bind the node types for retrieving and set data in the dictionary:

from collections import UserDict
from collections.abc import MutableMapping
from weakref import WeakKeyDictionary, proxy


class NodeType:
    def __init__(self, type_):
        self.type = type_
    
    def __set_name__(self, owner, name):
        self.name = name
        
    def __get__(self, instance, owner):
        if instance is None:
            return self
        return ProxyDict(instance, self.type)


class ProxyDict(MutableMapping):
    instance_cache = WeakKeyDictionary()
    def __new__(cls, instance, type_):
        container = cls.instance_cache.setdefault(instance, {})
        if type_ in container:
            return container[type_]
        self = super().__new__(cls)
        self.instance = proxy(instance)
        self.type = type_
        container[type_] = self
        return self

    def __init__(self, *args):
        # just implemented to prevent an error at object.__init__, as
        # initalization is done in __new__
        pass
    
    def __getitem__(self, key):
        return self.instance.__getitem__(key, self.type)
    def __setitem__(self, key, value):
        return self.instance.__setitem__(key, value)
    def __delitem__(self, key):
        return self.instance.__delitem__(key, self.type)
    def __iter__(self):
        yield from (key for key, (t, v) in self.instance._items() if t==self.type)
    def __len__(self):
        # Ok  - len will be slow. 
        return sum(1 for _ in iter(self))
    def __repr__(self):
        return repr(dict(self))
    
class A(UserDict):
    points = NodeType("Point")
    vertices = NodeType("Vertex")
    _counter_id = 0
    
    def __init__(self, *args, **kwargs):
        self._id = self.__class__._counter_id
        self.__class__._counter_id += 1
        super().__init__(*args, **kwargs)
    
    def __getitem__(self, key, type_=None):
        it_type, item = super().__getitem__(key)
        if type_ in (it_type, None):
            return item
        raise KeyError(key)
    
    def __setitem__(self, key, value, type_=None):
        if type_ is None:
            # if the class name is "Vertex
            type_ = type(value).__name__
        super().__setitem__(key, (type_, value))
 
    def __delitem__(self, key, type_=None):
        it_type, item = super.__getitem__(key)
        if type_ in (it_type, None):
            super().__delitem__(key)
        raise KeyError(key)
    def _items(self):
        # same as .items(), but also return the type of each value
        for key in self:
            yield (key, super().__getitem__(key))
    def __hash__(self):
        return self._id
    

    

I think from there you can fine tune what you want for __repr__ in each case, and what to do on name clashes, and such. I made a somewhat complex cache system with weakrefs, but it might not even be necessary, with a new proxy-dict being instantiated at each access to a.vertices, etc... it would not be bad: it would not be much worse than what Python does for method access in an instance.

Here is the code above in use in the interactive mode:


In [137]: class Vertex: pass

In [138]: class Point: pass

In [139]: a = A()

In [140]: a["bla"] = Vertex()

In [141]: a["ble"] = Point()

In [142]: a.vertices["bla"]
Out[142]: <__main__.Vertex at 0x7f6dc6dbbb90>

In [143]: a.vertices["ble"]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
[...]
KeyError: 'ble'

In [144]: a.points["ble"]
Out[144]: <__main__.Point at 0x7f6dcc91a110>