Hadoop distcp does not skip CRC checks


I have an issue with skipping CRC checks between the source and target paths when running distcp. I copy and decrypt files on demand, so their checksums differ, which is expected.

My command looks like the following:

hadoop distcp -skipcrccheck -update -direct sftp://path s3a://path

When hadoop distcp starts, it prints its configuration, which includes skipCRC=true.

But the job fails with this error:

  • Mismatch in length of source:sftp://path (95066273) and target:s3a://path/.distcp.tmp.attempt_1675828993400_0012_m_000001_1 (95065888)
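
As a quick sanity check on the error above, the two reported lengths can be compared directly. This is only a sketch using the byte counts from the log; the variable names are illustrative and not part of distcp.

```shell
# Byte counts copied from the distcp error message above.
src_len=95066273   # length reported for the sftp source
dst_len=95065888   # length reported for the s3a temp file

# Difference in bytes between source and target.
diff_bytes=$((src_len - dst_len))
echo "source is ${diff_bytes} bytes longer than target"
```

The message reports a mismatch in length rather than in checksum, so the failing check may be a separate one from the check that -skipcrccheck controls.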

hadoop version - Hadoop 3.2.1-amzn-5

Has anyone had any luck skipping CRC checks?

I updated EMR to 6.9.0 with Hadoop 3.3.3, which was supposed to help based on this Jira, but it didn't, and the job still fails on CRC validation.
