When transforming an XML file to JSON, the Cloud Data Fusion pipeline, configured in Autoscaling mode with up to 84 cores, stops with an error.

Can anybody help me to make it work?

The roughly 100-page raw log file seems to indicate that the possible errors were:

  • +ExitOnOutOfMemoryError
  • Container exited with a non-zero exit code 3. Error file: prelaunch.err

It happened with the following configuration:

The strange thing is that the very same pipeline, with an XML file ten times smaller (only 141 MB), worked correctly:

Can anybody help me understand why the Cloud Data Fusion pipeline, set to Autoscaling mode with up to 84 cores, succeeds with the 141 MB XML file but fails with the 1.4 GB XML file?

For clarity, here are all the detailed steps:

Answer by Fernando Velasquez:

Parsing a 1 GB XML file requires a significant amount of memory in your workers.

Looking at your pipeline JSON, the pipeline is currently configured to allocate 2 GB of RAM per executor:

"config": {
    "resources": {
        "memoryMB": 2048,
        "virtualCores": 1
    },
    "driverResources": {
        "memoryMB": 2048,
        "virtualCores": 1
    },
    ...
}

This is likely insufficient: the ~1.1 GB parsed JSON payload, once expanded into in-memory objects, can occupy several times its on-disk size in JVM heap, so it will not fit in a 2 GB executor.

Try increasing the amount of executor memory in the Config -> Resources -> Executor section. I would suggest starting with 8 GB of RAM for your example; the equivalent change in the exported pipeline JSON is sketched below.

Resources Config Example
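
For reference, here is a minimal sketch of what that change looks like in the exported pipeline JSON, assuming the rest of your configuration stays as-is (8192 MB is just the suggested starting point, not a definitive figure):

"config": {
    "resources": {
        "memoryMB": 8192,
        "virtualCores": 1
    },
    "driverResources": {
        "memoryMB": 2048,
        "virtualCores": 1
    },
    ...
}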

EDIT: When using the Default or Autoscaling compute profile, CDF will create workers with 2 vCPU cores and 8 GB of RAM. You will need to increase this using the following runtime arguments:

system.profile.properties.workerCPUs = 4 
system.profile.properties.workerMemoryMB = 22528 

Runtime arguments to increase worker CPU and Memory allocation

This will increase the worker size to 4 vCPUs and 22 GB of RAM, which is large enough to fit the requested executor in the worker.
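
As a rough sanity check (assuming Spark's default executor memory overhead of about 10% of executor memory, with a 384 MB floor), an 8 GB executor needs roughly 8192 MB + 819 MB ≈ 9 GB per container, which fits easily inside a 22528 MB worker but could never be scheduled on the default 8 GB worker. The two system.profile.properties arguments above can be supplied in the pipeline's Runtime Arguments dialog before starting the run.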