How to run MapReduce script through Hortonworks Sandbox in Python?


I have Hortonworks Sandbox and ran the command:

ssh [email protected] -p 2222;

After logging in, I would like to run a MapReduce job using RatingsBreakdown.py on the HDFS file u.data, both located under Documents, like I did here:

python RatingsBreakdown.py -r hadoop hdfs:///user/[username]/u.data --hadoop-streaming-jar /usr/hdp/2.6.2.0-205/hadoop-mapreduce/hadoop-streaming.jar

How can I adjust the command above to run through the Hadoop cluster?

[root@sandbox ~]#
There is 1 answer below.

Answer by OneCricketeer:

If RatingsBreakdown.py is an mrjob job, then the command you've shown already does everything you want: the -r hadoop runner submits the job to the Hadoop cluster instead of running it locally. You can open the YARN ResourceManager UI to verify that the job ran on the cluster.
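For reference, here is a minimal sketch of what an mrjob script of this kind typically looks like; the field handling is an assumption based on the tab-separated MovieLens u.data layout (user id, movie id, rating, timestamp), not something taken from your actual RatingsBreakdown.py:

from mrjob.job import MRJob

class RatingsBreakdown(MRJob):
    def mapper(self, _, line):
        # assumed layout of u.data: userID \t movieID \t rating \t timestamp
        (user_id, movie_id, rating, timestamp) = line.split('\t')
        yield rating, 1

    def reducer(self, rating, counts):
        # total number of times each rating value appears
        yield rating, sum(counts)

if __name__ == '__main__':
    RatingsBreakdown.run()

Besides the YARN UI, you can also confirm the job reached the cluster from the sandbox shell with yarn application -list -appStates ALL.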

Otherwise, the Hadoop Streaming documentation should point you in the right direction.
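For illustration, if the mapper and reducer were plain Python scripts reading stdin and writing stdout (rather than an mrjob job), you would submit them directly through the streaming jar. The script names and output path below are hypothetical:

hadoop jar /usr/hdp/2.6.2.0-205/hadoop-mapreduce/hadoop-streaming.jar \
    -files mapper.py,reducer.py \
    -mapper "python mapper.py" \
    -reducer "python reducer.py" \
    -input /user/[username]/u.data \
    -output /user/[username]/ratings_out

Note that the -output directory must not already exist, or the job will fail at submission.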