I have some MapReduce code for which I believe the shuffle phase (performed implicitly by Hadoop) will be shorter than an existing approach to the same problem. Is there any way I can get the exact shuffle time taken by the framework?
I'm using Hadoop 3.2.2, and I found the Task Attempt API, whose TaskAttempt object does seem to expose an elapsedShuffleTime value for every map or reduce task attempt. However, I would prefer not to go through the REST APIs and would rather output this value from the code itself. Is there any way to do that?
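For reference, this is roughly what I have in mind if I do fall back to the REST route. It's a minimal sketch, assuming a history server reachable over plain HTTP at localhost:19888 (adjust for your cluster), that scrapes elapsedShuffleTime for each task attempt after the job completes, so under the hood it is still the REST API, just driven from the driver code:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.mapreduce.Job;

public class ShuffleTimeReporter {

    // Assumed history-server address; adjust to your cluster.
    private static final String HISTORY_BASE =
        "http://localhost:19888/ws/v1/history/mapreduce";

    /** Fetch a REST resource and return its body as a string. */
    private static String httpGet(String urlString) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(urlString).openConnection();
        conn.setRequestProperty("Accept", "application/json");
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line);
            }
        }
        return body.toString();
    }

    /** Print every elapsedShuffleTime reported for the given (finished) job. */
    public static void printShuffleTimes(Job job) throws Exception {
        String jobId = job.getJobID().toString();
        String tasksJson = httpGet(HISTORY_BASE + "/jobs/" + jobId + "/tasks");

        // Crude JSON scraping to keep the sketch dependency-free;
        // a real implementation would use a proper JSON parser.
        Matcher taskIds = Pattern.compile("\"id\"\\s*:\\s*\"(task_[^\"]+)\"").matcher(tasksJson);
        Pattern shuffle = Pattern.compile("\"elapsedShuffleTime\"\\s*:\\s*(\\d+)");
        while (taskIds.find()) {
            String taskId = taskIds.group(1);
            String attemptsJson =
                httpGet(HISTORY_BASE + "/jobs/" + jobId + "/tasks/" + taskId + "/attempts");
            Matcher m = shuffle.matcher(attemptsJson);
            while (m.find()) {
                System.out.println(taskId + " elapsedShuffleTime=" + m.group(1) + " ms");
            }
        }
    }
}
```

I would call printShuffleTimes(job) in the driver right after job.waitForCompletion(true) returns, but I'd much rather get the same number without the HTTP round trips if the framework exposes it somewhere in the Java API.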
Also, in the first example given on the above link for the Task Attempt API, the JSON response shows a task of type reduce with a non-zero elapsedShuffleTime. How is that valid, given that the shuffle phase should logically be completed before the reduce task starts (i.e. map -> shuffle -> reduce)?