Memory grows until OOM when inserting huge data into MongoDB


I am loading 500k entries from a CSV file into Mongo with @feathers/mongodb.

In a hook, I collect all lines, manipulate them a little, and insert the whole array into Mongo with the following code:

const csv = require('csvtojson'); // assumed import: the fromFile()/subscribe() API used below matches csvtojson

let data_to_insert = [];

// simple record constructor
let element = function () {
    this.member1 = '';
    this.member2 = '';
    this.member3 = '';
    this.member4 = '';
    this.member5 = '';
    this.member6 = '';
};

// Read the file
let content = await csv({
    delimiter: ';',
})
    .fromFile(pathCsv) // 120MB file, 500k entries
    .subscribe((line) => {
        let t_elem = new element();
        t_elem.member1 = roundMinutes(line.member1); // returns a Date
        t_elem.member2 = line.member2;
        t_elem.member3 = line.member3;
        t_elem.member4 = line.member4;
        t_elem.member5 = new Date(+line.member5 * 1000); // Unix seconds -> JS milliseconds
        t_elem.member6 = new Date(+line.member6 * 1000);

        data_to_insert.push(t_elem);
    });

// store the whole list in a single create() call
await context.app.service('api/myservice').create(data_to_insert);
// all entries are written to the db; the heap grows afterwards

return;

This works well: the data is written to the database in about 10 seconds.

However, I noticed with pm2 that the heap grows to 8 GB and the process then runs out of memory. I am wondering why this happens after the insert has finished. Can it be related to the 500k create events that get triggered?

Snapshot with Chrome DevTools: [heap snapshot screenshot]

The strings in particular are huge. Inspecting them, I can see some "event strings" that eat most of the memory: [string detail screenshot]

If I try to call the hook again, it gets stuck, and I (or PM2) have to restart the process to get it up and running again:

PM2          | [PM2][WORKER] Process 0 restarted because it exceeds --max-memory-restart value (current_memory=8904929280 max_memory_limit=8589934592 [octets])
PM2          | Process 0 in a stopped status, starting it
PM2          | Stopping app:Collector id:0
PM2          | pid=53621 msg=failed to kill - retrying in 100ms
PM2          | pid=53621 msg=failed to kill - retrying in 100ms
PM2          | pid=53621 msg=failed to kill - retrying in 100ms
PM2          | pid=53621 msg=failed to kill - retrying in 100ms
PM2          | pid=53621 msg=failed to kill - retrying in 100ms
PM2          | pid=53621 msg=failed to kill - retrying in 100ms
PM2          | pid=53621 msg=failed to kill - retrying in 100ms
PM2          | pid=53621 msg=failed to kill - retrying in 100ms
PM2          | pid=53621 msg=failed to kill - retrying in 100ms
PM2          | App [Collector:0] exited with code [0] via signal [SIGINT]
PM2          | pid=53621 msg=process killed
PM2          | App [Collector:0] starting in -fork mode-

If this is related to the events, how can I deactivate them for an insert of this kind? If not, what should be optimised in this code to avoid this heap bloat?

Thanks for your help.

1 Answer

Answered by christoph:

Thanks to @bwgjoseph on http://feathersjs.slack.com

Two possibilities:

  1. https://docs.feathersjs.com/api/hooks.html#context-event

Setting context.event = null skips only the event publishing. This affects all calls in the context/hook you are running in; a minimal sketch follows below.
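
For illustration, here is a minimal sketch of option 1, assuming a Feathers hooks file for the service (the service path and hook placement are placeholders, not part of the original answer):

// hooks for 'api/myservice' (placeholder name)
module.exports = {
    before: {
        create: [
            async (context) => {
                // Suppress the 'created' event for this call;
                // all other hooks still run as usual.
                context.event = null;
                return context;
            },
        ],
    },
};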

  2. https://docs.feathersjs.com/guides/migrating.html#hook-less-service-methods

Using methods like _create() instead of create() skips all hooks that would normally run. The advantage is that you can decide which calls get published (the huge call with the _ method, all other calls without), but the drawback is that no hooks run at all, so this will not work if you rely on them. A minimal sketch follows below.
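
And a minimal sketch of option 2, again with placeholder names; the underscored methods are provided by the Feathers database adapters and bypass hooks entirely, so no events are published for the bulk insert:

// inside the CSV-loading hook (placeholder service path)
const service = context.app.service('api/myservice');

// _create() goes straight to the database adapter:
// no hooks run and no 'created' events are published.
await service._create(data_to_insert);

// regular calls elsewhere keep their hooks and events:
// await service.create(another_entry);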