When transforming an XML file to JSON, my Cloud Data Fusion pipeline, configured in Autoscaling mode with up to 84 cores, stops with an error.
Can anybody help me make it work?
The 100-page raw log file seems to indicate that the possible errors were:
+ExitOnOutOfMemoryError
Container exited with a non-zero exit code 3. Error file: prelaunch.err
It happened with the following configuration:
The strange thing is that the very same pipeline, with an XML file ten times smaller (only 141 MB), worked correctly:
Can anybody help me understand why the Cloud Data Fusion pipeline, set to Autoscaling mode with up to 84 cores, succeeds with the 141 MB XML file but fails with the 1.4 GB XML file?
For clarity, here are all the detailed steps:
Parsing a 1 GB XML file requires a significant amount of memory on your workers.
Looking at your pipeline JSON, your pipeline is currently configured to allocate 2 GB of RAM per worker.
This is likely insufficient to hold the entire parsed ~1.1 GB JSON payload.
Try increasing the executor memory under Config -> Resources -> Executor. I would suggest starting with 8 GB of RAM for your example.
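If you prefer editing the exported pipeline JSON directly, the same setting lives under `config.resources`. A minimal sketch, assuming the standard CDAP `memoryMB`/`virtualCores` fields; the driver values shown are only illustrative:

```json
{
  "config": {
    "resources": {
      "virtualCores": 1,
      "memoryMB": 8192
    },
    "driverResources": {
      "virtualCores": 1,
      "memoryMB": 2048
    }
  }
}
```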
EDIT: When using the Default or Autoscaling compute profile, CDF creates workers with 2 vCPU cores and 8 GB of RAM. You will need to increase this using the following runtime arguments:
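(A sketch, assuming the Dataproc provisioner properties `workerCPUs` and `workerMemoryMB` can be overridden through `system.profile.properties.*` runtime arguments; please verify the exact property names against your Data Fusion version.)

```
system.profile.properties.workerCPUs = 4
system.profile.properties.workerMemoryMB = 22528
```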
This will increase the worker size to 4 vCPUs and 22 GB of RAM, which is large enough to fit the requested executor on the worker.