Writable Classes in mapreduce

Question

707 Views Asked by codegeek21 At 09 November 2020 at 23:46

How can I pass the values from the HashSet (the docid and offset) to the reducer's writable, so that the map writable is connected to the reduce writable? The mapper (LineIndexMapper) works fine, but in the reducer (LineIndexReducer) I get an error saying it can't take a String as an argument when I write context.write(key, new IndexRecordWritable("some string")); even though I have public String toString() in the reduce writable too.
I suspect the HashSet in the reducer's writable (IndexRecordWritable.java) isn't receiving the values correctly. My code is below.

IndexMapRecordWritable.java
    
    

    
        import java.io.DataInput;
        import java.io.DataOutput;
        import java.io.IOException;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.io.Writable;
    
        public class IndexMapRecordWritable implements Writable {
    
            private LongWritable offset;
            private Text docid;
    
            public LongWritable getOffsetWritable() {
                return offset;
            }
    
            public Text getDocidWritable() {
                return docid;
            }
    
            public long getOffset() {
                return offset.get();
            }
    
            public String getDocid() {
                return docid.toString();
            }
    
            public IndexMapRecordWritable() {
                this.offset = new LongWritable();
                this.docid = new Text();
            }
          
            public IndexMapRecordWritable(long offset, String docid) {
                this.offset = new LongWritable(offset);
                this.docid = new Text(docid);
            }
            public IndexMapRecordWritable(IndexMapRecordWritable indexMapRecordWritable) {
                this.offset = indexMapRecordWritable.getOffsetWritable();
                this.docid = indexMapRecordWritable.getDocidWritable();
            }
            @Override
            public String toString() {
    
                StringBuilder output = new StringBuilder();
                output.append(docid);
                output.append(offset);
                
                return output.toString();
    
            }
    
            @Override
            public void write(DataOutput out) throws IOException {
 

            }
    
            @Override
            public void readFields(DataInput in) throws IOException {


            }
    
        }
    
    
    



    
IndexRecordWritable.java
    
    

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.util.HashSet;
    import org.apache.hadoop.io.Writable;
    
    public class IndexRecordWritable implements Writable {
    
        // Save each index record from maps
        private HashSet<IndexMapRecordWritable> tokens = new HashSet<IndexMapRecordWritable>();
    
        public IndexRecordWritable() {
        }
    
        public IndexRecordWritable(
                Iterable<IndexMapRecordWritable> indexMapRecordWritables) {
  
        }
    
        @Override
        public String toString() {
    
            StringBuilder output = new StringBuilder();

            return output.toString();
    
        }
    
        @Override
        public void write(DataOutput out) throws IOException {

        }
   
        @Override
        public void readFields(DataInput in) throws IOException {

        }
    
    }

Original Q&A

There is 1 best solution below

**Prateek** · Answer 1 · 2020-11-10T02:34:37.590000

Alright, here is my answer, based on a few assumptions. I assume the final output is a text file containing the key and the file names separated by commas, as described by the pre-condition and post-condition comments in your reducer class.

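In this case, you really don't need the IndexRecordWritable class. You can simply write to your context using

    context.write(key, new Text(valueBuilder.substring(0, valueBuilder.length() - 1)));

with the class declaration line as

    public class LineIndexReducer extends Reducer<Text, IndexMapRecordWritable, Text, Text>

Don't forget to set the matching output classes in the driver, for example job.setOutputValueClass(Text.class) for the reducer output and job.setMapOutputValueClass(IndexMapRecordWritable.class) for the mapper output (assuming your Job object is called job).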
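
As a rough sketch of the value-building step (plain Java, no Hadoop types; `CommaJoin` and `join` are made-up names for illustration), a `StringJoiner` gives the same comma-separated string without having to trim the trailing comma afterwards. In the reducer you would feed it `val.toString()` for each value:

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper, not part of the original code: joins record strings
// with commas without leaving a trailing separator.
class CommaJoin {
    static String join(Iterable<String> records) {
        StringJoiner joiner = new StringJoiner(",");
        for (String record : records) {
            joiner.add(record);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a.txt@3345", "b.txt@344", "c.txt@785");
        System.out.println(join(records)); // prints a.txt@3345,b.txt@344,c.txt@785
    }
}
```

Unlike the substring approach, this also returns an empty string instead of throwing when there are no records, although a reducer is never called with an empty value list.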

That should serve the purpose according to the post-condition in your reducer class. But if you really want to write a Text-IndexRecordWritable pair to your context, there are two ways to approach it:

1. with a String as the argument (based on your attempt to pass a string, even though your IndexRecordWritable constructor is not designed to accept strings), and
2. with a HashSet as the argument (based on the HashSet initialised in the IndexRecordWritable class).

Since no constructor of your IndexRecordWritable class accepts a String, you cannot pass one. That is why you get the error saying you can't use a string as an argument. PS: if you want to pass a String, you must add another constructor to your IndexRecordWritable class, as below:

    // Save each index record from maps
    private HashSet<IndexMapRecordWritable> tokens = new HashSet<IndexMapRecordWritable>();

    // to save the string
    private String value;

    public IndexRecordWritable() {
    }

    public IndexRecordWritable(
            HashSet<IndexMapRecordWritable> indexMapRecordWritables) {
        /***/
    }

    // to accept a string
    public IndexRecordWritable(String value) {
        this.value = value;
    }

But that won't help if you want to use the HashSet, so approach #1 is out: you shouldn't pass a string.

That leaves us with approach #2: passing a HashSet as the argument, since you want to make use of the HashSet. In this case, you must build the HashSet in your reducer before passing it to IndexRecordWritable in context.write.

To do this, your reducer must look like this:

    @Override
    protected void reduce(Text key, Iterable<IndexMapRecordWritable> values, Context context) throws IOException, InterruptedException {
        //StringBuilder valueBuilder = new StringBuilder();

        HashSet<IndexMapRecordWritable> set = new HashSet<>();

        for (IndexMapRecordWritable val : values) {
            // Hadoop reuses the same value object on every iteration,
            // so store a fresh copy rather than the reference itself
            set.add(new IndexMapRecordWritable(val.getOffset(), val.getDocid()));
            //valueBuilder.append(val);
            //valueBuilder.append(",");
        }

        //write the key and the adjusted value (removing the last comma)
        //context.write(key, new IndexRecordWritable(valueBuilder.substring(0, valueBuilder.length() - 1)));
        context.write(key, new IndexRecordWritable(set));
        //valueBuilder.setLength(0);
    }

and your IndexRecordWritable.java must have this:

    // Save each index record from maps
    private HashSet<IndexMapRecordWritable> tokens = new HashSet<IndexMapRecordWritable>();

    // to save the string
    //private String value;

    public IndexRecordWritable() {
    }

    public IndexRecordWritable(
            HashSet<IndexMapRecordWritable> indexMapRecordWritables) {
        /***/
        tokens.addAll(indexMapRecordWritables);
    }
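
One caveat when collecting values in a reducer loop: Hadoop hands `reduce()` the same value object on every iteration and just overwrites its contents, so each value should be copied before it is stored in a collection. The sketch below simulates that behaviour in plain Java. `ReuseDemo`, `Record` and `reusingValues` are made-up stand-ins, not Hadoop classes:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Plain-Java simulation of an iterable that hands back one reused, mutable
// object. All names here are hypothetical stand-ins, not Hadoop classes.
class ReuseDemo {
    static class Record {
        long offset;
        Record(long offset) { this.offset = offset; }
    }

    // Yields the same Record instance each time, overwritten with the next offset.
    static Iterable<Record> reusingValues(long... offsets) {
        Record shared = new Record(0);
        return () -> new Iterator<Record>() {
            int i = 0;
            public boolean hasNext() { return i < offsets.length; }
            public Record next() { shared.offset = offsets[i++]; return shared; }
        };
    }

    // Stores the reference: every stored element is the same object.
    static List<Long> collectByReference(Iterable<Record> values) {
        List<Record> kept = new ArrayList<>();
        for (Record r : values) kept.add(r);
        return offsetsOf(kept);
    }

    // Copies each value before storing it, so every record survives.
    static List<Long> collectByCopy(Iterable<Record> values) {
        List<Record> kept = new ArrayList<>();
        for (Record r : values) kept.add(new Record(r.offset));
        return offsetsOf(kept);
    }

    static List<Long> offsetsOf(List<Record> records) {
        List<Long> out = new ArrayList<>();
        for (Record r : records) out.add(r.offset);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collectByReference(reusingValues(3345, 344, 785))); // [785, 785, 785]
        System.out.println(collectByCopy(reusingValues(3345, 344, 785)));      // [3345, 344, 785]
    }
}
```

With the real classes, copying through the `IndexMapRecordWritable(long, String)` constructor creates fresh `LongWritable` and `Text` fields, whereas the copy constructor in the question only shares the existing ones.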

Remember, this is not the requirement according to the description of your reducer, which says:

POST-CONDITION: emit the output a single key-value where all the file names are separated by a comma ",".  <"marcello", "a.txt@3345,b.txt@344,c.txt@785">

If you still choose to emit (Text, IndexRecordWritable), remember to process the HashSet in IndexRecordWritable to get it in the desired format.
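
For reference, here is a minimal sketch of what that processing could look like, together with the `write`/`readFields` pair that is still empty in the question. It uses plain fields and `java.io` streams rather than Hadoop's `LongWritable`/`Text`, and `Rec` and `RecSet` are made-up stand-ins for `IndexMapRecordWritable` and `IndexRecordWritable`. The `docid@offset` format is taken from the post-condition; everything else is an assumption:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

class RecordSetSketch {
    // Stand-in for IndexMapRecordWritable (hypothetical, plain fields).
    static class Rec {
        long offset;
        String docid = "";
        Rec() {}
        Rec(long offset, String docid) { this.offset = offset; this.docid = docid; }

        void write(DataOutput out) throws IOException {
            out.writeUTF(docid);
            out.writeLong(offset);
        }
        void readFields(DataInput in) throws IOException {
            docid = in.readUTF();   // read in the same order as written
            offset = in.readLong();
        }
        // A HashSet only removes duplicates if equals/hashCode are defined.
        @Override public boolean equals(Object o) {
            return o instanceof Rec && ((Rec) o).offset == offset && ((Rec) o).docid.equals(docid);
        }
        @Override public int hashCode() { return Objects.hash(docid, offset); }
        @Override public String toString() { return docid + "@" + offset; }
    }

    // Stand-in for IndexRecordWritable (hypothetical).
    static class RecSet {
        final Set<Rec> tokens = new HashSet<>();

        void write(DataOutput out) throws IOException {
            out.writeInt(tokens.size());          // element count first
            for (Rec r : tokens) r.write(out);
        }
        void readFields(DataInput in) throws IOException {
            tokens.clear();                       // drop state from any earlier use
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                Rec r = new Rec();
                r.readFields(in);
                tokens.add(r);
            }
        }
        @Override public String toString() {
            List<String> parts = new ArrayList<>();
            for (Rec r : tokens) parts.add(r.toString());
            Collections.sort(parts);              // HashSet has no fixed order
            return String.join(",", parts);
        }
    }

    // Serializes to bytes and reads back into a fresh instance.
    static RecSet roundTrip(RecSet original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            RecSet copy = new RecSet();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        RecSet set = new RecSet();
        set.tokens.add(new Rec(3345, "a.txt"));
        set.tokens.add(new Rec(344, "b.txt"));
        System.out.println(roundTrip(set)); // prints a.txt@3345,b.txt@344
    }
}
```

Sorting in `toString` is only there to make the output stable. Note also that the `toString` in the question's IndexMapRecordWritable appends docid and offset with no separator, so it would need an "@" between them to match the post-condition.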
