Efficiently searching for custom objects in large Python lists

265 Views Asked by At

I have a list of custom Python objects and need to search for the existence of specific objects within that list. My concern is the performance implications of searching large lists, especially frequently.

Here's a simplified example using a custom Person class with attributes name and age:

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

people = [Person("Alice", 30), Person("Bob", 25), Person("Charlie", 35)]

Currently, I'm using a list comprehension and the any() function to check if a person with a specific name and age exists in the list:

if any(p.name == "Bob" and p.age == 25 for p in people):
    print("The person exists.")

Is there a more efficient way to search for the existence of specific custom objects within large Python lists?

5

There are 5 best solutions below

4
Jab On

I mean technically speaking if equality checks are set up for it, you could create the Person object you're looking for and just see if it's in the list.

people = [Person("Alice", 30), Person("Bob", 25), Person("Charlie", 35)]

if Person("Bob", 25) in people:
    print("The person exists.")

This will only work if the Person object you use in your condition and the Person object in the list evaluate to the same object. So this solution may or may not work for you.

You could also stand to gain performance by using a set inistead of a list as stated in @RandomGuy's comment.

Edit:

If you want a more appropriate search you can use a set or a dict:

people_set = {(p.name, p.age) for p in people}
people_dict = {(p.name, p.age): p for p in people}

if ("Bob", 25) in people_set:
    print("Exists")

bob = people_dict[("Bob", 25)]
0
StevenLasagna On

if the fields you are going to be filtering on are low, then you could first create a hash set of those fields. then knowing if a person named "bob" exists is O(1) instead of O(n) in case of list iteration

1
robotiaga On

One way to improve the performance of searching for specific objects within a large Python list is to use a dictionary instead of a list. This approach involves creating a dictionary where the keys are the attributes that you want to search by, and the values are the objects themselves.

Here's an example of how this approach can be applied to your Person class:

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

people = [Person("Alice", 30), Person("Bob", 25), Person("Charlie", 35)]

people_dict = {}
for person in people:
    people_dict[(person.name, person.age)] = person

if ("Bob", 25) in people_dict:
    print("The person exists.")
0
SIGHUP On

If we implement a couple of dunder methods in your class we can then use a set as follows:

class Person:
    def __init__(self, name, age):
        self._name = name
        self._age = age
    def _key(self):
        return self._name, self._age
    def __hash__(self):
        return hash(self._key())
    def __eq__(self, other):
        return isinstance(other, type(self)) and self._key() == other._key()

_set = {Person('Andy', age) for age in range(18, 100)}

print(Person('Andy', 55) in _set)

Output:

True
0
Terry Spotts On

If your Person() class is immutable you can use a frozen dataclass and a set:

from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    age: int


people = {Person("Alice", 30), Person("Bob", 25), Person("Charlie", 35), Person("Mary", 47)}
print(people)
>>> {Person(name='Bob', age=25), Person(name='Charlie', age=35), Person(name='Alice', age=30), Person(name='Mary', age=47)}

print(f'{Person("Bob", 25) in people=}')
>>> Person("Bob", 25) in people=True

print(f'{Person("Mary", 44) in people=}')
>>> Person("Mary", 44) in people=False

But you will not be able to mutate them:

p = people.pop()
print(p)
>>> Person(name='Mary', age=47)

p.age = "99"
dataclasses.FrozenInstanceError: cannot assign to field 'age'