I am having trouble skipping CRC checks between the source and target paths when running distcp. I copy and decrypt files on demand, so their checksums differ, which is expected.
My command looks like this:
hadoop distcp -skipcrccheck -update -direct sftp://path s3a://path
When hadoop distcp starts, it prints its configuration, which includes skipCRC=true.
But the job fails with this error:
- Mismatch in length of source:sftp://path (95066273) and target:s3a://path/.distcp.tmp.attempt_1675828993400_0012_m_000001_1 (95065888)
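To make the size difference in that message concrete, here is a small Python sketch. It is not part of the distcp job, and the function name `length_gap` and the regex are my own illustration. It pulls the two byte counts out of the error text and subtracts them:

```python
import re

# Error text copied from the failed job (paths are the placeholders used above).
MESSAGE = (
    "Mismatch in length of source:sftp://path (95066273) and "
    "target:s3a://path/.distcp.tmp.attempt_1675828993400_0012_m_000001_1 (95065888)"
)


def length_gap(message: str) -> int:
    """Return source length minus target length from a distcp mismatch message.

    Assumes the message contains exactly two parenthesised byte counts,
    source first and target second, as in the error above.
    """
    lengths = [int(n) for n in re.findall(r"\((\d+)\)", message)]
    if len(lengths) != 2:
        raise ValueError("expected exactly two byte counts in the message")
    source_len, target_len = lengths
    return source_len - target_len


if __name__ == "__main__":
    print(length_gap(MESSAGE))  # prints 385
```

So the source is reported as 385 bytes longer than the temporary target file, and the message itself is about file length rather than a checksum.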
hadoop version - Hadoop 3.2.1-amzn-5
Has anyone had any luck skipping CRC checks?
I updated EMR to 6.9.0 with Hadoop 3.3.3, which was supposed to help based on this Jira, but it didn't, and the job still fails on CRC validation.