I'm doing this basic DP (dynamic programming) problem on trees (https://cses.fi/problemset/task/1674/). Given the structure of a company (the hierarchy is a tree), the task is to calculate for each employee the number of their subordinates.

This:
```python
import sys
from functools import lru_cache  # noqa

sys.setrecursionlimit(2 * 10 ** 9)

if __name__ == "__main__":
    n: int = 200000
    boss: list[int] = list(range(1, 200001))
    # so in my example it will be a tree with every parent having one child
    graph: list[list[int]] = [[] for _ in range(n)]
    for i in range(n - 1):
        graph[boss[i] - 1].append(i + 1)  # directed, so neighbours of a node are only its children

    @lru_cache(None)
    def dfs(v: int) -> int:
        if len(graph[v]) == 0:
            return 0
        else:
            s: int = 0
            for u in graph[v]:
                s += dfs(u) + 1
            return s

    dfs(0)
    print(*(dfs(i) for i in range(n)))
```
crashes (I googled the error message, and it means stack overflow):

```
Process finished with exit code -1073741571 (0xC00000FD)
```
HOWEVER, this:
```python
import sys

sys.setrecursionlimit(2 * 10 ** 9)

if __name__ == "__main__":
    n: int = 200000
    boss: list[int] = list(range(1, 200001))
    # so in my example it will be a tree with every parent having one child
    graph: list[list[int]] = [[] for _ in range(n)]
    for i in range(n - 1):
        graph[boss[i] - 1].append(i + 1)  # directed, so neighbours of a node are only its children

    dp: list[int] = [0 for _ in range(n)]

    def dfs(v: int) -> None:
        if len(graph[v]) == 0:
            dp[v] = 0
        else:
            for u in graph[v]:
                dfs(u)
                dp[v] += dp[u] + 1

    dfs(0)
    print(*dp)
```
doesn't, and it's exactly the same complexity, right? The DFS goes exactly as deep in both cases too. I tried to make the two pieces of code as similar as I could.

I tried 20000000 instead of 200000 (i.e. a graph 100 times deeper) and the second version still doesn't stack overflow. Obviously I could write an iterative version, but I'm trying to understand the underlying reason why there is such a big difference between these two recursive options, so that I can learn more about Python and its inner workings.
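For reference, an iterative version of the same DP (a sketch that replaces the recursion with processing nodes in reverse BFS order) looks like this:

```python
n = 200000
boss = list(range(1, n + 1))
graph = [[] for _ in range(n)]
for i in range(n - 1):
    graph[boss[i] - 1].append(i + 1)

# BFS from the root; appending to the list while iterating over it
# visits every node, parents before children.
order = [0]
for v in order:
    order.extend(graph[v])

# Process in reverse BFS order: every child is finished before its parent.
dp = [0] * n
for v in reversed(order):
    for u in graph[v]:
        dp[v] += dp[u] + 1

print(dp[0])  # 199999: in this chain example, the root has everyone else as a subordinate
```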
I'm using Python 3.11.1.
`lru_cache` is implemented in C, its calls are interleaved with your function's calls, and it is that deep C recursion that crashes. Your second program only has deep Python recursion, not deep C recursion, which avoids the issue.

In Python 3.11 I get a similar bad crash. In Python 3.12 I just get a `RecursionError`, despite your `sys.setrecursionlimit(2 * 10 ** 9)`. What's New In Python 3.12 explains why: the recursion limit now applies only to Python code, while builtin (C-level) functions don't use the recursion limit and are instead protected by a separate mechanism that prevents recursion from crashing the virtual machine.

So in 3.11, your huge limit is applied to C recursion as well: Python obediently attempts your deep recursion, its C stack overflows, and the program crashes. In 3.12 the limit doesn't apply to C recursion, so Python protects itself with that separate mechanism at a relatively shallow recursion depth, producing the `RecursionError` instead.
Let's avoid that C recursion. If I use a (simplified) Python version of `lru_cache`, your first program works fine in both 3.11 and 3.12 without any other change. See CPython's GitHub issue "3.12 setrecursionlimit is ignored in connection with @functools.cache" for more details and the current progress on this. There are efforts to increase the C recursion limit, but it looks like it'll remain only in the thousands, which is not enough for your super deep recursion.
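For illustration, a minimal pure-Python memoizing decorator along those lines (a sketch, not `functools`' actual implementation) could look like this:

```python
from functools import wraps

def py_cache(func):
    # Minimal memoization in pure Python: every call goes through a
    # Python-level wrapper, so the recursion only consumes Python frames,
    # which sys.setrecursionlimit actually governs.
    cache = {}

    @wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper

@py_cache
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))  # fast thanks to memoization
```

Decorating `dfs` with `py_cache` instead of `lru_cache(None)` keeps the recursive calls in Python bytecode, which CPython 3.11+ can run without growing the C stack for each Python-to-Python call.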
Miscellaneous further information I might expand on later:

- On Linux you can grow the C stack with `ulimit -s sizeInKB`.
- If you want the Python version of the regular `lru_cache`, you could try the hack of importing `_functools` and deleting the C version of its wrapper before you import the Python version (technique from this answer, which did that with `heapq`). Or you could remove the import that replaces the Python implementation with the C implementation, but that's an even worse hack and I wouldn't do it. Maybe I would instead copy and rename the `functools` module and modify the copy. I mostly mention these hacks to illustrate what's happening, or as a last resort.
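To make that first hack concrete, here's a sketch. It assumes (as in current CPython) that `functools.py` does `from _functools import _lru_cache_wrapper` inside a `try`/`except ImportError` and silently falls back to its pure-Python `_lru_cache_wrapper` when that import fails:

```python
import sys

# Must run before functools is (re)imported, so drop any cached copy.
sys.modules.pop("functools", None)

# Delete the C-accelerated wrapper from the C extension module...
import _functools
del _functools._lru_cache_wrapper

# ...so functools' "from _functools import _lru_cache_wrapper" fails
# silently and the pure-Python fallback stays in place.
import functools

@functools.lru_cache(maxsize=None)
def double(x):
    return 2 * x

print(type(double))  # a plain Python function, not the C wrapper type
```

Modules that already grabbed `lru_cache` before this runs keep the C-backed version, which is one more reason to treat this as a demonstration rather than something to ship.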