Python function to get value (str or object) in deep nested dictionary

I'm sharing a working function that returns a value from a dictionary regardless of how deeply the object is nested. For example:

my_dict = {"People": [{"name": "Alice"}, {"name": "Bob"}]}

To get the value of "People", I'd write my_dict["People"] (path length is 1). To get Bob's entry, I'd write my_dict["People"][1] (path length is 2). Notice the difference between retrieving a value from a dict (by key) and from a list (by index).

The problem arises when you don't know the depth (path length) of the value ahead of time. I searched for a function that does this but failed to find one, so I'm posting mine for anyone else who is trying to achieve this and having trouble.

The following function solves this problem. Input:

  • data: the dictionary to search in
  • path: the path to the object, as a list (e.g. the path to Bob's entry is ["People", 1]). This way the depth doesn't matter and can be determined at runtime.

def get_nested_object(data, path: list):
    # Get the outermost key (or list index) from the path
    key = path[0]
    # If data is a dictionary
    if isinstance(data, dict):
        # If the key is found
        if key in data:
            # Path length == 1 -> this is the object we need to return
            if len(path) == 1:
                return data[key]
            # Otherwise keep searching inside the value stored under key
            return get_nested_object(data[key], path[1:])
        # Key is missing: return None explicitly instead of falling through
        return None
    # Else if data is a list, key must be an integer index
    elif isinstance(data, list):
        # Path length == 1 -> this is the object we need to return
        if len(path) == 1:
            return data[key]
        return get_nested_object(data[key], path[1:])
    # Neither a dict nor a list: cannot descend any further
    else:
        return "Invalid data"

Note that this code has no data validation or error handling.
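
To illustrate the point about paths being determined at runtime, here is a minimal sketch in which the path arrives as a dotted string and is only turned into a list when the program runs. The helper names parse_path and lookup are hypothetical, and treating purely numeric segments as list indices is an assumption of this sketch (it would misread a dict key such as "1"). The lookup itself uses functools.reduce with operator.getitem from the standard library.

```python
from functools import reduce
from operator import getitem


def parse_path(dotted: str) -> list:
    """Hypothetical helper: turn "People.1.name" into ["People", 1, "name"]."""
    # Assumption: purely numeric segments are list indices, everything else is a dict key
    return [int(part) if part.isdigit() else part for part in dotted.split(".")]


def lookup(data, path):
    """Follow path one step at a time; equivalent to data[p0][p1]...[pn]."""
    return reduce(getitem, path, data)


my_dict = {"People": [{"name": "Alice"}, {"name": "Bob"}]}

# The depth of each path is not known until the string is parsed
for dotted in ("People", "People.1", "People.1.name"):
    path = parse_path(dotted)
    print(f"{path} -> {lookup(my_dict, path)!r}")
```

Like the function above, this sketch does no validation: a bad path raises whatever exception the failing index operation raises.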

There is 1 best solution below

Hai Vu On

First, your get_nested_object function is overly complicated, so I offer a simpler version without any error safeguarding. Next is find_all, which, given a target value, yields every (path, value) pair whose value equals the target. The find_first function is built on top of find_all and returns only the first match. Finally, find_all is itself built on top of iter_values, which walks a nested data structure and generates every path together with the value stored at it.

import json


def get_nested_object(data, path):
    """Given a path, return the value"""
    for key in path:
        data = data[key]
    return data


def iter_values(data, path=None):
    """Yield a (path, value) pair for every leaf value"""
    path = path or []
    if isinstance(data, (int, str, float, bool)):
        yield path, data
    elif isinstance(data, dict):
        for key, value in data.items():
            yield from iter_values(value, path + [key])
    elif isinstance(data, list):
        for key, value in enumerate(data):
            yield from iter_values(value, path + [key])
    else:
        raise TypeError(f"Cannot handle object {data}")


def find_all(data, target):
    """Given a target, find all paths"""
    for path, value in iter_values(data):
        if value == target:
            yield path, value


def find_first(data, target, default=None):
    """Given a target, find the first path"""
    found = next(find_all(data, target), None)
    if found is None:
        return [], default
    return found


def main():
    """Entry"""
    data = {
        "People": [{"name": "Alice"}, {"name": "Bob"}],
        "Animal": {
            "Cats": [
                {"name": "Tristan"},
                {"name": "Bob"},
                {"name": "Alice"},
            ],
            "Dogs": [
                {"name": "Bob"},
                {"name": "Scooby"},
            ],
        },
    }
    print("\n# Data:")
    print(json.dumps(data, indent=4))

    print("\n# get_nested_object:")
    path = ["Animal", "Dogs", 1, "name"]
    value = get_nested_object(data, path)
    print(f"get_nested_object(data, {path}) -> {value!r}")

    print("\n# iter_values:")
    for path, value in iter_values(data):
        print(f"{path=}, {value=}")

    print("\n# find_all:")
    for path, value in find_all(data, "Bob"):
        print(f"{path=}, {value=}")

    print("\n# find_first:")
    print(f"Find first Bob: {find_first(data, 'Bob')}")
    print(f"Find first Anna: {find_first(data, 'Anna', 'Anna not found')}")


if __name__ == "__main__":
    main()

Output:

# Data:
{
    "People": [
        {
            "name": "Alice"
        },
        {
            "name": "Bob"
        }
    ],
    "Animal": {
        "Cats": [
            {
                "name": "Tristan"
            },
            {
                "name": "Bob"
            },
            {
                "name": "Alice"
            }
        ],
        "Dogs": [
            {
                "name": "Bob"
            },
            {
                "name": "Scooby"
            }
        ]
    }
}

# get_nested_object:
get_nested_object(data, ['Animal', 'Dogs', 1, 'name']) -> 'Scooby'

# iter_values:
path=['People', 0, 'name'], value='Alice'
path=['People', 1, 'name'], value='Bob'
path=['Animal', 'Cats', 0, 'name'], value='Tristan'
path=['Animal', 'Cats', 1, 'name'], value='Bob'
path=['Animal', 'Cats', 2, 'name'], value='Alice'
path=['Animal', 'Dogs', 0, 'name'], value='Bob'
path=['Animal', 'Dogs', 1, 'name'], value='Scooby'

# find_all:
path=['People', 1, 'name'], value='Bob'
path=['Animal', 'Cats', 1, 'name'], value='Bob'
path=['Animal', 'Dogs', 0, 'name'], value='Bob'

# find_first:
Find first Bob: (['People', 1, 'name'], 'Bob')
Find first Anna: ([], 'Anna not found')
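
One thing the output above does not show: iter_values raises TypeError for any leaf that is not an int, str, float, or bool, and that includes None (which is what json.loads produces for a JSON null). A possible variant is sketched below, under the assumption that every object that is not a dict or a list should count as a leaf. The name iter_leaves and the sample data are hypothetical.

```python
def iter_leaves(data, path=()):
    """Yield (path, value) for every leaf; anything that is not a dict or list is a leaf."""
    if isinstance(data, dict):
        for key, value in data.items():
            yield from iter_leaves(value, (*path, key))
    elif isinstance(data, list):
        for index, value in enumerate(data):
            yield from iter_leaves(value, (*path, index))
    else:
        # Assumption: None, dates, Decimals, etc. are all treated as plain leaves
        yield list(path), data


data = {"name": "Bob", "nickname": None, "tags": ["a", None]}
for path, value in iter_leaves(data):
    print(f"{path=}, {value=}")
```

As with iter_values, empty dicts and empty lists produce no pairs at all. Whether silently accepting unknown leaf types is preferable to raising is a design choice; raising, as iter_values does, surfaces unexpected data earlier.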

Update

As discussed, I did not place any error checking into:

def get_nested_object(data, path):
    """Given a path, return the value"""
    for key in path:
        data = data[key]
    return data

The caller can then catch any exception that might arise:

try:
    obj = get_nested_object(data, path)
except (KeyError, IndexError) as error:
    print("ERROR:", error)
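
KeyError and IndexError are not the only exceptions this loop can raise. If the path uses a string where a list index is expected, or continues past a string leaf with a string key, Python raises TypeError instead. The sketch below (sample data and paths are illustrative) prints which exception each kind of bad path produces, so the caller can decide whether TypeError belongs in the except clause too.

```python
def get_nested_object(data, path):
    """Given a path, return the value"""
    for key in path:
        data = data[key]
    return data


data = {"People": [{"name": "Alice"}, {"name": "Bob"}]}

bad_paths = [
    ["Animals"],                 # missing dict key
    ["People", 5],               # list index out of range
    ["People", "first"],         # string used as a list index
    ["People", 1, "name", "x"],  # path continues past the str leaf
]

for path in bad_paths:
    try:
        get_nested_object(data, path)
    except (KeyError, IndexError, TypeError) as error:
        print(f"{path}: {type(error).__name__}")
```

One case raises nothing at all: an integer step past a string leaf, such as ["People", 1, "name", 0], simply indexes into the string and returns a single character.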

An alternative is to redesign the function get_nested_object to return a default value in case of error:

def get_nested_object(data, path, default=None):
    """Given a path, return the value, or default if the path is invalid"""
    try:
        for key in path:
            data = data[key]
        return data
    except (KeyError, IndexError):
        return default

Then, the usages would be:

if (obj := get_nested_object(data, path)) is None:
    ...  # Handle error
else:
    print(obj)

# or...
if (obj := get_nested_object(data, path, "NOTFOUND")) == "NOTFOUND":
    ...  # Handle error
else:
    print(obj)
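
A caveat with both usages: a default of None or "NOTFOUND" cannot be told apart from a stored value that happens to equal it. A common Python idiom for this is a unique sentinel object compared with `is`. The sketch below assumes a module-level name MISSING, which is hypothetical, and repeats the default-returning function so that it runs on its own.

```python
MISSING = object()  # hypothetical sentinel; no stored value can be identical to it


def get_nested_object(data, path, default=None):
    """Given a path, return the value, or default if the path is invalid"""
    try:
        for key in path:
            data = data[key]
        return data
    except (KeyError, IndexError):
        return default


data = {"People": [{"name": "Alice", "nickname": None}]}

for path in (["People", 0, "nickname"], ["People", 0, "age"]):
    obj = get_nested_object(data, path, MISSING)
    if obj is MISSING:
        print(f"{path}: not found")
    else:
        # A stored None is reported as a real value, not as a miss
        print(f"{path}: {obj!r}")
```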

Of the two implementations, I still prefer the original, for two reasons:

  1. It is simple
  2. If the caller does not catch the KeyError and IndexError, then the script/app will crash and we know that we have a problem. This type of error is easier to detect than the silent/non-crash errors.
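
In the spirit of the second point, a crash can be made easier to diagnose by saying how far along the path the lookup got before it failed. The following is only a sketch of that idea, not part of the implementations above: it re-raises the same exception type with a message naming the failing step, and the message format is an arbitrary choice.

```python
def get_nested_object(data, path):
    """Given a path, return the value; on failure, report the failing step"""
    for depth, key in enumerate(path):
        try:
            data = data[key]
        except (KeyError, IndexError, TypeError) as error:
            # Same exception type as before, so existing except clauses still match
            message = f"lookup failed at path[{depth}]={key!r}, after {path[:depth]}"
            raise type(error)(message) from error
    return data


data = {"Animal": {"Dogs": [{"name": "Bob"}, {"name": "Scooby"}]}}

try:
    get_nested_object(data, ["Animal", "Cats", 0, "name"])
except KeyError as error:
    print("ERROR:", error)
```

The function still fails loudly when the caller does not catch the exception; the only difference is that the traceback now points at the step that went wrong.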