How to run MapReduce script through Hortonworks Sandbox in Python?


I have Hortonworks Sandbox and ran the command:

ssh [email protected] -p 2222;

After logging in, I would like to run a MapReduce job using RatingsBreakdown.py on the HDFS file u.data, both located under Documents, like I did here:

python RatingsBreakdown.py -r hadoop hdfs:///user/[username]/u.data --hadoop-streaming-jar /usr/hdp/2.6.2.0-205/hadoop-mapreduce/hadoop-streaming.jar

How can I adjust the command above to run through the Hadoop cluster?

[root@sandbox ~]#
There is 1 answer below.

Answer by OneCricketeer:

If RatingsBreakdown.py is an mrjob job, then the command you've shown already does everything you want: the -r hadoop runner submits the job to the Hadoop cluster instead of running it locally. You can open the YARN ResourceManager UI to verify that the job ran on the cluster.
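For reference, here is a minimal sketch of what an mrjob script of this kind typically looks like; the field handling is an assumption based on the tab-separated MovieLens u.data layout (user id, movie id, rating, timestamp), not something taken from your actual RatingsBreakdown.py:

from mrjob.job import MRJob

class RatingsBreakdown(MRJob):
    def mapper(self, _, line):
        # assumed layout of u.data: userID \t movieID \t rating \t timestamp
        (user_id, movie_id, rating, timestamp) = line.split('\t')
        yield rating, 1

    def reducer(self, rating, counts):
        # total number of times each rating value appears
        yield rating, sum(counts)

if __name__ == '__main__':
    RatingsBreakdown.run()

Besides the YARN UI, you can also confirm the job reached the cluster from the sandbox shell with yarn application -list -appStates ALL.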

Otherwise, the Hadoop Streaming documentation should point you in the right direction.
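For illustration, if the mapper and reducer were plain Python scripts reading stdin and writing stdout (rather than an mrjob job), you would submit them directly through the streaming jar. The script names and output path below are hypothetical:

hadoop jar /usr/hdp/2.6.2.0-205/hadoop-mapreduce/hadoop-streaming.jar \
    -files mapper.py,reducer.py \
    -mapper "python mapper.py" \
    -reducer "python reducer.py" \
    -input /user/[username]/u.data \
    -output /user/[username]/ratings_out

Note that the -output directory must not already exist, or the job will fail at submission.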