I have Hortonworks Sandbox and ran the command:
ssh [email protected] -p 2222;
After logging in, I would like to run MapReduce on the two files RatingsBreakdown.py and u.data located under Documents, like I did here:
python RatingsBreakdown.py -r hadoop hdfs:///user/[username]/u.data --hadoop-streaming-jar /usr/hdp/2.6.2.0-205/hadoop-mapreduce/hadoop-streaming.jar
How can I adjust the command above to run through the Hadoop cluster?
If `RatingsBreakdown.py` is an `mrjob` job, then the command you've shown already does everything you want: the `-r hadoop` runner submits the job to the cluster via the streaming jar you passed. You can open the YARN ResourceManager UI to verify that the job ran on the cluster. Otherwise, the Hadoop Streaming documentation should point you at the correct invocation.