How can I get distcp failed files and replay the task?

609 Views Asked by Allen Wod At 16 February 2022 at 02:49

I ran distcp to copy a file between two HDFS clusters running the same version. When the job fails, I want to find the failed MapReduce task and the related file paths, then replay the copy.


Matt Andruff On 18 February 2022 at 16:18 BEST ANSWER

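Retrying already happens automatically: each failed map task is re-attempted up to mapred.map.max.attempts times (the property is named mapreduce.map.maxattempts in newer Hadoop releases).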

If you rerun the same distcp command, it only tries to copy files that haven't already been copied. Files successfully copied by a previous run are marked as "skipped" on re-execution.

If you would like a log of the files that couldn't be copied, specify '-i' together with '-log <logdir>'. The job then ignores individual failures instead of aborting, and writes out a more complete log of what failed and why.
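
As a rough sketch of how that could look on the command line: the helper name run_distcp, the DRY_RUN switch and every path in the comments are made up for illustration; only the '-i' and '-log' options come from the answer above.

```shell
# Hypothetical helper (not part of Hadoop): wraps one distcp invocation.
# Arguments: source URI, target URI, log directory. All are placeholders.
run_distcp() {
  src="$1"; dst="$2"; logdir="$3"
  # -i          keep going when individual files fail to copy
  # -log <dir>  write the per-file copy log to <dir>
  # If DRY_RUN is set and non-empty, the command is printed, not executed.
  ${DRY_RUN:+echo} hadoop distcp -i -log "$logdir" "$src" "$dst"
}

# Example usage (assumed cluster addresses and paths; adjust to your setup):
#   run_distcp hdfs://nn1:8020/data/in hdfs://nn2:8020/data/in hdfs://nn2:8020/tmp/distcp-log
#   hadoop fs -ls hdfs://nn2:8020/tmp/distcp-log    # look at the logs that were written
#   run_distcp hdfs://nn1:8020/data/in hdfs://nn2:8020/data/in hdfs://nn2:8020/tmp/distcp-log-2
```

Calling the helper a second time with the same source and target is the "replay": files that already arrived should be skipped, so only the ones that failed earlier are attempted again. A fresh log directory per run is used here only to keep the logs of each attempt separate.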
