How can i use the values from hashset (the docid and offset) to the reduce writable so as to connect map writable with reduce writable?
The mapper (LineIndexMapper) works fine but in the reducer (LineIndexReducer) i get the error that it can't get string as argument when i type this:
context.write(key, new IndexRecordWritable("some string");
although i have the public String toString() in the ReduceWritable too.
I believe the hashset in reducer's writable (IndexRecordWritable.java) maybe isn't taking the values correctly?
I have the below code.
IndexMapRecordWritable.java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
public class IndexMapRecordWritable implements Writable {
private LongWritable offset;
private Text docid;
public LongWritable getOffsetWritable() {
return offset;
}
public Text getDocidWritable() {
return docid;
}
public long getOffset() {
return offset.get();
}
public String getDocid() {
return docid.toString();
}
public IndexMapRecordWritable() {
this.offset = new LongWritable();
this.docid = new Text();
}
public IndexMapRecordWritable(long offset, String docid) {
this.offset = new LongWritable(offset);
this.docid = new Text(docid);
}
public IndexMapRecordWritable(IndexMapRecordWritable indexMapRecordWritable) {
this.offset = indexMapRecordWritable.getOffsetWritable();
this.docid = indexMapRecordWritable.getDocidWritable();
}
@Override
public String toString() {
StringBuilder output = new StringBuilder()
output.append(docid);
output.append(offset);
return output.toString();
}
@Override
public void write(DataOutput out) throws IOException {
}
@Override
public void readFields(DataInput in) throws IOException {
}
}
IndexRecordWritable.java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.HashSet;
import org.apache.hadoop.io.Writable;
public class IndexRecordWritable implements Writable {
// Save each index record from maps
private HashSet<IndexMapRecordWritable> tokens = new HashSet<IndexMapRecordWritable>();
public IndexRecordWritable() {
}
public IndexRecordWritable(
Iterable<IndexMapRecordWritable> indexMapRecordWritables) {
}
@Override
public String toString() {
StringBuilder output = new StringBuilder();
return output.toString();
}
@Override
public void write(DataOutput out) throws IOException {
}
@Override
public void readFields(DataInput in) throws IOException {
}
}
Alright, here is my answer based on a few assumptions. The final output is a text file containing the key and the file names separated by a comma based on the information in the reducer class's comments on the pre-condition and post-condition.
In this case, you really don't need IndexRecordWritable class. You can simply write to your context using
with the class declaration line as
Don't forget to set the correct output class in the driver.
That must serve the purpose according to the post-condition in your reducer class. But, if you really want to write a Text-IndexRecordWritable pair to your context, there are two ways approach it -
Since your constructor of IndexRecordWritable class is not designed to accept String as an input, you cannot pass a string. Hence the error you are getting that you can't use string as an argument. Ps: if you want your constructor to accept Strings, you must have another constructor in your IndexRecordWritable class as below:
but that won't be valid if you want to use the HashSet. So, approach #1 can't be used. You can't pass a string.
That leaves us with approach #2. Passing a HashSet as an argument since you want to make use of the HashSet. In this case, you must create a HashSet in your reducer before passing it as an argument to IndexRecordWritable in context.write.
To do this, your reducer must look like this.
and your IndexRecordWritable.java must have this.
Remember, this is not the requirement according to the description of your reducer where it says.
If you still choose to emit (Text, IndexRecordWritable), remember to process the HashSet in IndexRecordWritable to get it in the desired format.