How can I find the files that failed in a distcp job and replay the task?


I ran distcp to copy a file between two HDFS clusters running the same version. When the job fails, I want to find the failed MapReduce task and the related file path, and then replay the copy.

Matt Andruff On BEST ANSWER

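Retrying already happens automatically: each failed map task is re-attempted up to mapred.map.max.attempts times (the property is named mapreduce.map.maxattempts in newer Hadoop releases).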
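If the default number of attempts is too low for a flaky network, the limit can be raised for a single run. The following is only a sketch: the cluster addresses, paths, and the value 8 are made-up placeholders, and it assumes a Hadoop version that uses the newer property name mapreduce.map.maxattempts. It prints the command and only executes it when RUN=1 is set.

```shell
#!/bin/sh
# Hypothetical source and destination; replace with your own clusters and paths.
SRC="hdfs://nn1.example.com:8020/data/src"
DST="hdfs://nn2.example.com:8020/data/dst"
MAX_ATTEMPTS=8   # placeholder value, not a recommendation

# Generic -D options go before the DistCp-specific arguments.
RETRY_CMD="hadoop distcp -Dmapreduce.map.maxattempts=${MAX_ATTEMPTS} ${SRC} ${DST}"

# Always show the command; run it only on request so the sketch is safe without a cluster.
echo "$RETRY_CMD"
if [ "${RUN:-0}" = "1" ]; then
  $RETRY_CMD
fi
```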

If you rerun distcp, it will only try to copy files that haven't already been copied. Files that a previous run copied successfully are marked as "skipped" on re-execution.

If you would like a log of the files that couldn't be copied, you can specify -i together with -log <logdir>. The -i option ignores failures so the job keeps going, and -log writes out a more complete record of which files failed and why.
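As a rough illustration of the -i and -log options, a run might look like the sketch below. The cluster addresses, paths, and log directory are hypothetical placeholders; the exact layout of the files written under the log directory depends on the Hadoop version, so the sketch only lists the directory rather than assuming a format. Nothing is executed unless RUN=1 is set.

```shell
#!/bin/sh
# Hypothetical source, destination, and log directory; adjust for your clusters.
SRC="hdfs://nn1.example.com:8020/data/src"
DST="hdfs://nn2.example.com:8020/data/dst"
LOGDIR="hdfs://nn2.example.com:8020/tmp/distcp-logs"

# -i ignores per-file failures; -log writes the copy log to LOGDIR.
LOG_CMD="hadoop distcp -i -log ${LOGDIR} ${SRC} ${DST}"

# Always show the command; run it only on request so the sketch is safe without a cluster.
echo "$LOG_CMD"
if [ "${RUN:-0}" = "1" ]; then
  $LOG_CMD
  # List what the job wrote; inspect individual files with 'hadoop fs -cat <file>'.
  hadoop fs -ls "$LOGDIR"
fi
```

Once the failures in the log have been dealt with, running the same distcp command again should copy only the files that are still missing.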