Out of memory in Clojure - nested reduce on a lazy sequence


Something as simple as three nested reduce calls is giving me an out-of-memory error.

Execution error (OutOfMemoryError) at .../my-large-lazy$iter$fn$fn$iter$fn$fn$iter$fn$fn (serial_write.clj:39).
Java heap space: failed reallocation of scalar replaced objects

The issue happens when I evaluate (add-coords 1000). I am running with

java -Xmx1G ...
(defn my-large-lazy
  [size]
  (for [x (range size)]
    (for [y (range size)]
      (for [z (range size)]
        {:x x :y y :z z}))))

(defn add-coords 
  [size]
  (reduce (fn [sum-1 x-1]
            (reduce (fn [sum-2 x-2]
                      (reduce (fn [sum-3 x-3]
                                (+ sum-3 (:x x-3) (:y x-3) (:z x-3)))
                              sum-2 x-2))
                    sum-1 x-1))
          0 (my-large-lazy size)))

I expect it to compute the sum using maybe a few KB of memory, not fail at 1 GB; ideally the memory requirement should be O(1). Instead, the larger the size, the more memory it takes. Avoiding the map in {:x x :y y :z z} helps, but in the original program it is unavoidable. I am looking for a way to implement add-coords efficiently.
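For reference, this is roughly what I mean by avoiding the map (the name here is just for illustration, and the innermost reducing fn would then destructure the vector instead of calling (:x x-3)):

(defn my-large-lazy-vectors
  [size]
  (for [x (range size)]
    (for [y (range size)]
      (for [z (range size)]
        [x y z])))) ; plain vectors instead of {:x x :y y :z z}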

I know about holding on to the head of a sequence and the other usual gotchas, but this one seems to be evading me. I tried a profiler that measures allocations, but I can't figure out why the objects are not being deallocated. I tried a few variants of implementing this function, but they all run into the same OOM. I also tried various JVMs; with minor differences in overall memory, they all fail.


2 Answers

erdos

This is a very interesting question. It is true that the head of the sequence is not retained during reduce, but then what causes the OOM error?

Even though the head of the original lazy sequence is not retained while reducing over it, the current LazySeq object is kept in memory, because it holds a reference to the rest of the sequence. And since the current LazySeq instance is kept, its head element is also kept.

And since you have a nested structure of lazy sequences, everything referenced from the current element of the outermost sequence is also kept, even the parts that have already been processed, until the outermost sequence moves on to the next element.

For your example, when we are processing x=9, y=9, z=9, the element of the outer lazy seq holding x=9 is still in memory, which means that x=9,y=0,z=0; x=9,y=0,z=1; ...; x=9,y=9,z=9 are all still there. They are kept until we move on to x=10 in the outermost sequence.
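To put rough numbers on it for size 1000: while the reduce is still inside a single x, the already-realized middle and inner sequences for that x pin up to 1000 * 1000 = 1,000,000 maps, and each {:x x :y y :z z} map plus its boxed numbers and the surrounding lazy-seq cells plausibly costs a couple of hundred bytes, so a single outer element alone can hold on to hundreds of MB. Against a 1 GB heap that is enough to tip it over (these per-object sizes are rough estimates, not measurements).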


The solution is to reorganize the code a bit: instead of a nested structure of lazy sequences, flatten it out into a single lazy sequence:

(defn my-large-lazy' [size]
  (for [x (range size)
        y (range size)
        z (range size)]
    {:x x :y y :z z}))

(defn internal-sum [sum-3 x-3] (+ sum-3 (:x x-3) (:y x-3) (:z x-3)))

(println (reduce internal-sum 0 (my-large-lazy' 100)))
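
The key point is that with a single flat sequence, each {:x x :y y :z z} map becomes garbage as soon as internal-sum has consumed it, so the reduction runs in roughly constant memory regardless of size. If you prefer, the same sum can also be written with a transducer over the flattened generator; this is just an equivalent sketch, not part of the fix:

(transduce (map (fn [{:keys [x y z]}] (+ x y z))) + 0 (my-large-lazy' 100))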
emptyjar

I'd be hard-pressed to find a more efficient algorithm than the closed form. This reduces the calculation using the three-fold symmetry between x, y and z and the integer range summation formula: https://en.wikipedia.org/wiki/Summation

(defn add-coords
  [size]
  (/ (* 3 size size size (- size 1)) 2))
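
To spell out the arithmetic behind the closed form: each coordinate value 0..size-1 occurs size*size times per coordinate, and x, y and z contribute symmetrically, so the total is 3 * size^2 * (0 + 1 + ... + (size-1)) = 3 * size^2 * size*(size-1)/2, which is exactly (/ (* 3 size size size (- size 1)) 2). A quick sanity check against a brute-force sum for a small size:

(add-coords 5)
;; => 750
(reduce + (for [x (range 5) y (range 5) z (range 5)] (+ x y z)))
;; => 750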