dsbulk unload is failing because the Cassandra pod restarts. The Cassandra pod shows it was killed due to an OOM (out-of-memory) issue.
2024-03-26 21:54:14 INFO Operation directory: /cassandra_data/dsbulk/dsbulk-1.11.0/bin/logs/LOAD_20240326-215414-519088
2024-03-26 21:54:16 ERROR Operation LOAD_20240326-215414-519088 failed: java.io.IOException: Error creating CSV parser for file:/cassandra_data/dsbulk/dsbulk.csv.
  Caused by: Error creating CSV parser for file:/cassandra_data/dsbulk/dsbulk.csv.
  Caused by: File not found: /cassandra_data/dsbulk/dsbulk.csv (No such file or directory).
reactor.core.Exceptions$ReactiveException: java.io.IOException: Error creating CSV parser for file:/cassandra_data/dsbulk/dsbulk.csv
  at com.datastax.oss.dsbulk.workflow.load.LoadWorkflow.execute(LoadWorkflow.java:242)
  [3 skipped] com.datastax.oss.dsbulk.io.CompressedIOUtils.newBufferedReader(CompressedIOUtils.java:96)
2024-03-26 21:54:18 INFO Final stats:
We're using the command below to export the data:

export DSBULK_JAVA_OPTS="-Xmx10G"
dsbulk unload -url /cassandra_data/dsbulk/export2.csv -delim "|" \
  --executor.continuousPaging.enabled false -cl LOCAL_QUORUM \
  --driver.basic.request.timeout="30 minutes" \
  --datastax-java-driver.basic.request.timeout="30 minutes" \
  -maxErrors 1000000 --schema.splits=12C \
  -maxConcurrentQueries 1 -maxConcurrentFiles 10 \
  -header true -k -t
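For comparison, here is a variant of the same unload we could try: it writes to a directory (dsbulk unload spreads output across multiple files), compresses the output, and caps read throughput to reduce pressure on the node. This is only a sketch, the directory path, keyspace/table names, and rate cap are illustrative, and it assumes the --connector.csv.compression and --executor.maxPerSecond settings available in recent dsbulk releases:

export DSBULK_JAVA_OPTS="-Xmx10G"
# Unload to a directory, gzip-compress the output files, and throttle the
# read rate so the server-side node is not overwhelmed during the export.
# my_keyspace/my_table and the 50000 rows/second cap are placeholder values.
dsbulk unload \
  -url /cassandra_data/dsbulk/export2 \
  -delim "|" -header true -cl LOCAL_QUORUM \
  --connector.csv.compression gzip \
  --executor.maxPerSecond 50000 \
  --datastax-java-driver.basic.request.timeout="30 minutes" \
  -maxConcurrentFiles 10 -maxConcurrentQueries 1 \
  -k my_keyspace -t my_table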
The table holds roughly 1.6 TB. Is there a recommended option or approach for exporting such a huge Cassandra table?
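One approach we are evaluating is to break the export into several smaller runs by token range, so each run reads only a slice of the ring and a failed run can be retried without redoing the whole export. This is a sketch, not a tested recipe: id stands in for the table's partition key, and the keyspace/table names and output paths are placeholders.

# Export the ring in slices using a custom query per token range.
# The bounds below split the Murmur3 token range into two halves as an example;
# replace id/my_keyspace/my_table with the real partition key and names.
dsbulk unload \
  -url /cassandra_data/dsbulk/export_part1 \
  -query "SELECT * FROM my_keyspace.my_table WHERE token(id) >= -9223372036854775808 AND token(id) <= 0" \
  -delim "|" -header true -cl LOCAL_QUORUM

dsbulk unload \
  -url /cassandra_data/dsbulk/export_part2 \
  -query "SELECT * FROM my_keyspace.my_table WHERE token(id) > 0 AND token(id) <= 9223372036854775807" \
  -delim "|" -header true -cl LOCAL_QUORUM

Each slice could also be compressed and throttled as in the earlier sketch; with -query, dsbulk does not need the -k/-t flags.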