Hi community,
I had a problem with leaking memory in some code of mine and posted a question about it on this board (Issue with python memory management). Through some inspection I found the change that stops the memory from leaking, but I still do not understand the underlying mechanism.
A small example that reproduces my problem looks like this:
```python
import gc
from dataclasses import dataclass

import psutil


class SomeClass:
    def __init__(self, *args):
        self.args = args

    @staticmethod
    def from_str(some_str):
        # do some stuff with some_str
        return SomeClass(*SomeClass.disassemble(some_str).to_tuple())

    @staticmethod
    def disassemble(some_str):
        @dataclass
        class StringAttributes:
            attr1: str

            def to_tuple(self):
                return (v for v in self.__dict__.values())

        return StringAttributes(some_str[0])


def some_function(strings: set[str]) -> set[SomeClass]:
    return {SomeClass.from_str(s) for s in strings}


counter = 0
process = psutil.Process()
while True:
    a = some_function({str(counter)})
    print(
        f"\n"
        f"Date processed: {counter}\n"
        f"Memory consumption:\n"
        f"Resident memory (MB): {process.memory_info().rss / 1024 ** 2}\n"
        f"Virtual memory (MB): {process.memory_info().vms / 1024 ** 2}\n"
        f"Object count: {len(gc.get_objects())}"
    )
    gc.collect()
    counter += 1
```
Running this script shows small increments in virtual memory consumption at irregular iteration intervals, e.g.:
Date processed: 1552
Memory consumption:
Resident memory (MB): 15.375
Virtual memory (MB): 231.78515625
Object count: 13488
Date processed: 1553
Memory consumption:
Resident memory (MB): 15.75
Virtual memory (MB): 231.984375
Object count: 13488
The count of objects tracked by the garbage collector, however, stays constant at every iteration step.
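For reference, a snapshot diff with the stdlib tracemalloc can show which source lines allocate new memory between iterations. This is just a sketch (the loop placeholder is mine), not something from my original post:

```python
import tracemalloc

tracemalloc.start()
snap1 = tracemalloc.take_snapshot()
# ... run some iterations of the loop above here ...
snap2 = tracemalloc.take_snapshot()

# Show the top source lines by net allocated memory since snap1
for stat in snap2.compare_to(snap1, "lineno")[:10]:
    print(stat)
```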
When I move the StringAttributes dataclass out of the disassemble method, however, the memory consumption stays constant throughout all iterations, as far as I have observed it:
```python
@dataclass
class StringAttributes:
    attr1: str

    def to_tuple(self):
        return (v for v in self.__dict__.values())


class SomeClass:
    def __init__(self, *args):
        self.args = args

    @staticmethod
    def from_str(some_str):
        # do some stuff with some_str
        return SomeClass(*SomeClass.disassemble(some_str).to_tuple())

    @staticmethod
    def disassemble(some_str):
        return StringAttributes(some_str[0])


def some_function(strings: set[str]) -> set[SomeClass]:
    return {SomeClass.from_str(s) for s in strings}
```
Now this change in the code layout solves my problem, but it leaves me puzzled:
- Why does this solve my problem, i.e. what exactly stays in the process's memory when I define the dataclass in a nested way? My guess is that the problem comes from returning an instance of the dataclass from the disassemble method, which moves it outside the scope where its class is defined.
- Why is the increase in virtual memory not reflected in the length of the list of objects tracked by the garbage collector, or in the list of unreachable objects (which I take to be gc.garbage)?
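To make the first question concrete: the class statement inside disassemble executes on every call, so each returned instance belongs to a fresh class object, even though all those classes share the same name. A minimal sketch, independent of the code above:

```python
from dataclasses import dataclass


def disassemble(some_str):
    # The class statement runs on every call, creating
    # a brand-new StringAttributes class object each time.
    @dataclass
    class StringAttributes:
        attr1: str

    return StringAttributes(some_str[0])


a = disassemble("x")
b = disassemble("y")

# Same class name, but two distinct class objects:
print(type(a) is type(b))  # prints False
print(type(a).__name__, type(b).__name__)
```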