No Output for MapReduce Program even after successful job completion on Cloudera VM

Programming Environment and Brief Overview:

I am working on one of my Big Data assignments, which involves finding the strike rate of batsmen using Hadoop MapReduce 2.6.0. I am supposed to work in Eclipse on Cloudera's VM using Java. The Java version in this VM is 1.7.0, and the JSON library is Jackson 2.2.3. Though the MapReduce job runs successfully, the output file is empty, and the task counters suggest trouble reading the input file and producing mapper output.

The Assignment Problem Statement Introduction:

Problem Statement: Strike rate is the average number of runs a batsman scores per 100 balls. Given the input, find the final strike rate of each batsman.

Mapper
Input : Array of JSON Objects
Example :

[
  { "name": "xyz", "runs": 100, "balls": 100 },
  { "name": "xyz", "runs": 10, "balls": 10 },
  { "name": "abc", "runs": 30, "balls": 10 },
  { "name": "abc", "runs": 20, "balls": 5 },
  { "name": "abc", "runs": 10, "balls": 42 }
]

Output :
• Output must be name,local_strike_rate
• local_strike_rate refers to the strike rate for that particular match
Strike Rate formula = (runs/balls) * 100 [rounded to 3 decimal places].
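As a quick sanity check of the formula and the 3-decimal rounding, here is a small stand-alone snippet (plain Java outside Hadoop; the class and method names are just for illustration):

```java
public class LocalStrikeRateCheck {
    // local_strike_rate = (runs / balls) * 100, reported to 3 decimal places
    static String localStrikeRate(int runs, int balls) {
        return String.format("%.3f", (double) runs / balls * 100);
    }

    public static void main(String[] args) {
        System.out.println(localStrikeRate(100, 100)); // 100.000
        System.out.println(localStrikeRate(10, 42));   // 23.810
    }
}
```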

Reducer
Input : Same format as Mapper output
Output : Independent JSON Objects
• The reducer output has the following keys: name of the batsman, and average strike rate across all matches
average strike rate = sum of all local strike rates / total matches [rounded to 3 decimal places]
Example :

{ "name": "xyz", "strike_rate": 100 }
{ "name": "abc", "strike_rate": 241.27 }
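To check my understanding of the averaging rule against the "abc" example above (local rates 300, 400, and 10/42*100), here is a stand-alone sketch in plain Java, not Hadoop; class and method names are made up for illustration. Formatted to 3 decimals it gives 241.270, which matches the 241.27 shown in the example:

```java
public class AverageStrikeRate {
    // Local strike rate for one match: (runs / balls) * 100.
    static double localStrikeRate(int runs, int balls) {
        return (double) runs / balls * 100;
    }

    // Average of the local strike rates, formatted to 3 decimal places.
    static String average(double... strikeRates) {
        double sum = 0;
        for (double sr : strikeRates) sum += sr;
        return String.format("%.3f", sum / strikeRates.length);
    }

    public static void main(String[] args) {
        // "abc": matches (30,10), (20,5), (10,42) from the example input
        String avg = average(localStrikeRate(30, 10),
                             localStrikeRate(20, 5),
                             localStrikeRate(10, 42));
        System.out.println(avg); // prints 241.270
    }
}
```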

My Code Implementation:

Referenced Libraries:

/usr/lib/hadoop/hadoop-common-2.6.0-cdh5.12.0.jar
/usr/lib/hadoop-mapreduce/hadoop-streaming.jar
/usr/lib/hadoop-0.20-mapreduce/hadoop-core-2.6.0-mr1-cdh5.12.0.jar
/usr/lib/hadoop-mapreduce/jackson-core-2.2.3.jar
/usr/lib/hadoop-mapreduce/jackson-databind-2.2.3.jar

SRDriver.java

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SRDriver {
    
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: Driver <input path> <output path>");
            System.exit(-1);
        }
        
        Configuration conf = new Configuration();
        Job job = new Job(conf, "Strike Rate Calculation"); // new Job(conf, name) is deprecated in Hadoop 2.x; Job.getInstance(conf, name) is preferred and also works on Java 1.7
        job.setJarByClass(SRDriver.class);
        job.setMapperClass(SRMapper.class);
        job.setReducerClass(SRReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

SRMapper.java

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class SRMapper extends Mapper<LongWritable, Text, Text, Text> {
    private final ObjectMapper mapper = new ObjectMapper();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        try {
            JsonNode rootNode = mapper.readTree(value.toString());
            
            for (JsonNode node : rootNode) {
                String name = node.get("name").asText();
                int runs = node.get("runs").asInt();
                int balls = node.get("balls").asInt();
                double strikeRate = ((double) runs / balls) * 100;
                
                context.write(new Text(name), new Text(String.valueOf(strikeRate)));
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

SRReducer.java

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SRReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        double totalStrikeRate = 0;
        int totalMatches = 0;

        for (Text value : values) {
            totalStrikeRate += Double.parseDouble(value.toString());
            totalMatches++;
        }

        double averageStrikeRate = totalStrikeRate / totalMatches;

        context.write(key, new Text(String.format("%.3f", averageStrikeRate)));
    }
}

I haven't worked on the output formatting part yet, but this is the intended output.

{"name": "Deepti", "strike_rate": 61.551} 
{"name": "Harmanpreet", "strike_rate": 87.124} 
{"name": "Ishan", "strike_rate": 85.077} 
{"name": "Jemimah", "strike_rate": 77.407} 
{"name": "Renuka", "strike_rate": 74.35} 
{"name": "Rohit", "strike_rate": 71.464} 
{"name": "Shubman", "strike_rate": 66.041} 
{"name": "Smriti", "strike_rate": 57.807} 
{"name": "VVS Laxman", "strike_rate": 64.078} 
{"name": "Virat", "strike_rate": 89.928}
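For the formatting step I have not done yet, my plan is roughly the following (plain string formatting, hypothetical helper name; in the actual reducer the resulting string would go through context.write):

```java
public class JsonLineSketch {
    // Hypothetical helper: render one reducer result as a JSON line,
    // with the average strike rate rounded to 3 decimal places.
    static String toJsonLine(String name, double strikeRate) {
        return String.format("{\"name\": \"%s\", \"strike_rate\": %.3f}", name, strikeRate);
    }

    public static void main(String[] args) {
        System.out.println(toJsonLine("Deepti", 61.551));
        // {"name": "Deepti", "strike_rate": 61.551}
    }
}
```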

Though the MapReduce job runs successfully, the output file is empty. The File System, Job, and Map-Reduce Framework counters below (in particular Map output records=0 and HDFS: Number of bytes written=0) suggest trouble with how the input file is read and with the mapper output.

Here are the logs.

[cloudera@quickstart workspace]$ hadoop fs -ls
Found 5 items
drwxr-xr-x   - cloudera cloudera          0 2024-02-13 07:38 StrikeRateOutput
-rw-r--r--   1 cloudera cloudera         28 2024-02-01 02:17 WCFile.txt
drwxr-xr-x   - cloudera cloudera          0 2024-02-01 02:18 WprdCountOutput
-rw-r--r--   1 cloudera cloudera          0 2023-12-28 02:37 dir1
-rw-r--r--   1 cloudera cloudera       5262 2024-02-09 18:08 sample_data.json
[cloudera@quickstart workspace]$ hadoop fs -ls StrikeRateOutput
Found 2 items
-rw-r--r--   1 cloudera cloudera          0 2024-02-13 07:38 StrikeRateOutput/_SUCCESS
-rw-r--r--   1 cloudera cloudera          0 2024-02-13 07:38 StrikeRateOutput/part-r-00000
[cloudera@quickstart workspace]$ hadoop fs -ls WprdCountOutput
Found 2 items
-rw-r--r--   1 cloudera cloudera          0 2024-02-01 02:18 WprdCountOutput/_SUCCESS
-rw-r--r--   1 cloudera cloudera         26 2024-02-01 02:18 WprdCountOutput/part-00000
[cloudera@quickstart workspace]$ hadoop jar StrikeRate.jar SRDriver sample_data.json StrikeRateOutput
24/02/13 08:33:44 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
24/02/13 08:33:44 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
24/02/13 08:33:44 WARN hdfs.DFSClient: Caught exception 
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1281)
    at java.lang.Thread.join(Thread.java:1355)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:952)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:690)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:879)
 24/02/13 08:33:45 INFO input.FileInputFormat: Total input paths to process : 1
 24/02/13 08:33:45 WARN hdfs.DFSClient: Caught exception 
 java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1281)
    at java.lang.Thread.join(Thread.java:1355)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:952)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:690)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:879)
 24/02/13 08:33:45 WARN hdfs.DFSClient: Caught exception 
 java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1281)
    at java.lang.Thread.join(Thread.java:1355)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:952)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:690)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:879)
 24/02/13 08:33:45 INFO mapreduce.JobSubmitter: number of splits:1
 24/02/13 08:33:45 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1706766150543_0016
 24/02/13 08:33:45 INFO impl.YarnClientImpl: Submitted application application_1706766150543_0016
 24/02/13 08:33:45 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1706766150543_0016/
 24/02/13 08:33:45 INFO mapreduce.Job: Running job: job_1706766150543_0016
 24/02/13 08:33:52 INFO mapreduce.Job: Job job_1706766150543_0016 running in uber mode : false
 24/02/13 08:33:52 INFO mapreduce.Job:  map 0% reduce 0%
 24/02/13 08:33:58 INFO mapreduce.Job:  map 100% reduce 0%
 24/02/13 08:34:05 INFO mapreduce.Job:  map 100% reduce 100%
 24/02/13 08:34:05 INFO mapreduce.Job: Job job_1706766150543_0016 completed successfully
 24/02/13 08:34:05 INFO mapreduce.Job: Counters: 49
    File System Counters
 FILE: Number of bytes read=6
 FILE: Number of bytes written=249691
 FILE: Number of read operations=0
 FILE: Number of large read operations=0
 FILE: Number of write operations=0
 HDFS: Number of bytes read=5389
 HDFS: Number of bytes written=0
 HDFS: Number of read operations=6
 HDFS: Number of large read operations=0
 HDFS: Number of write operations=2
    Job Counters 
 Launched map tasks=1
 Launched reduce tasks=1
 Data-local map tasks=1
 Total time spent by all maps in occupied slots (ms)=3679
 Total time spent by all reduces in occupied slots (ms)=4738
 Total time spent by all map tasks (ms)=3679
 Total time spent by all reduce tasks (ms)=4738
 Total vcore-milliseconds taken by all map tasks=3679
 Total vcore-milliseconds taken by all reduce tasks=4738
 Total megabyte-milliseconds taken by all map tasks=3767296
 Total megabyte-milliseconds taken by all reduce tasks=4851712
    Map-Reduce Framework
 Map input records=103
 Map output records=0
 Map output bytes=0
 Map output materialized bytes=6
 Input split bytes=127
 Combine input records=0
 Combine output records=0
 Reduce input groups=0
 Reduce shuffle bytes=6
 Reduce input records=0
 Reduce output records=0
 Spilled Records=0
 Shuffled Maps =1
 Failed Shuffles=0
 Merged Map outputs=1
 GC time elapsed (ms)=123
 CPU time spent (ms)=1000
 Physical memory (bytes) snapshot=352821248
 Virtual memory (bytes) snapshot=3015106560
 Total committed heap usage (bytes)=226365440
    Shuffle Errors
 BAD_ID=0
 CONNECTION=0
 IO_ERROR=0
 WRONG_LENGTH=0
 WRONG_MAP=0
 WRONG_REDUCE=0
    File Input Format Counters 
 Bytes Read=5262
    File Output Format Counters 
 Bytes Written=0

Despite several attempts at reviewing my code, I have failed to find the problem. Any help would be hugely appreciated; I am a student and relatively new to Hadoop.

Thank you very much for your time.
