Hadoop MapReduce WordLengthCount - Type mismatch in key from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.IntWritable

I was trying to create a MapReduce application for WordLengthCount, as in the code below:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordLengthCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, IntWritable, IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private IntWritable wordLength = new IntWritable();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      // Emit (token length, 1) for every token in the input line.
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        wordLength.set(itr.nextToken().length());
        context.write(wordLength, one);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<IntWritable, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();
    private Text wordLength = new Text();

    public void reduce(IntWritable key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        
        // Bucket the numeric word length into a size category.
        int keyValue = key.get();
        if (keyValue == 1) {
            wordLength.set("Tiny");
        } else if (keyValue >= 2 && keyValue <= 4) {
            wordLength.set("Small");
        } else if (keyValue >= 5 && keyValue <= 9) {
            wordLength.set("Medium");
        } else {
            wordLength.set("Big");
        }
        context.write(wordLength, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word length count");
    job.setJarByClass(WordLengthCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The log returns the error:

Type mismatch in key from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.IntWritable

I do not understand this, because I have checked all the input and output datatypes in both the Mapper and the Reducer to make sure they match.

Can someone please help me solve this problem? Thank you.

1 Answer

Ben Watson:

I like to get my MR jobs running without the combiner and then add that in once everything else is working as expected.

In your case the combiner (IntSumReducer) emits <Text, IntWritable> pairs, but the reducer expects <IntWritable, IntWritable> as input. A combiner's output is fed straight back into the shuffle, so its input and output types must both match the map output types.
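
If you do keep a combiner, it has to both accept and emit the map output types. A minimal sketch of a type-consistent combiner that could sit alongside your other inner classes (IntSumCombiner is a name I'm introducing, not something from your code):

  public static class IntSumCombiner
       extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(IntWritable key, Iterable<IntWritable> values,
                       Context context) throws IOException, InterruptedException {
      // Sum the partial counts for this word length. Summing is associative
      // and commutative, so it is safe to run as a combiner.
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result); // same key and value types in and out
    }
  }

You would then register it with job.setCombinerClass(IntSumCombiner.class) instead of IntSumReducer.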

Either remove the combiner or make sure that its input and output types are consistent with the map output, as in the sketch above.
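
There is a second problem that removing the combiner alone will not fix: your map output types (<IntWritable, IntWritable>) differ from the job's final output types (<Text, IntWritable>), and the driver never declares them. Hadoop then assumes the map emits the final output types, which is exactly the mismatch in your log. A sketch of the corrected driver, with the combiner left out until everything else works:

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word length count");
    job.setJarByClass(WordLengthCount.class);
    job.setMapperClass(TokenizerMapper.class);
    // Combiner omitted for now; add a type-consistent one back later.
    job.setReducerClass(IntSumReducer.class);

    // Declare the intermediate (map output) types explicitly, because
    // they differ from the final output types declared below.
    job.setMapOutputKeyClass(IntWritable.class);
    job.setMapOutputValueClass(IntWritable.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }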