How to subtract in Map Reduce paradigm

343 Views Asked by Vikram Garg At 25 October 2017 at 15:40

I have the following dataset

s1, s2, count
1, 2, x1
1, 3, x2
1, 4, x3
2, 1, y1
2, 3, y2
2, 4, y3
3, 1, z1
3, 2, z2

I want to get the following output

s1, s2, count
1, 2, x1-y1
1, 3, x2-z1
1, 4, x3
2, 3, y2-z2
2, 4, y3

The idea is that s1 is an entity which is favored over s2. And I have tuples such that s1 (say = 1) was favored over s2 (say = 2) by x1 times AND s1 (say = 2) was favored over s2 (say = 1) by y1 times. What I need is a sub O(n^2) algorithm to compute absolute number of times s1 was favored over s2 (or other way round). (x1-y1)

The problem is that there are 230 million such tuples and I can't have O(n^2) algorithm to compute this.

One observation is that the tuples are sorted on s1 since they are results of another MR output.

Please help me find a better solution.

Original Q&A

There are 1 best solutions below

OneCricketeer On 25 October 2017 at 21:33

I'm not sure I understand the "favoring." It looks as though you want to subtract values where the s1, s2 values are the same.

You can define a custom Comparable / Writable, let's call it S1S2Writable, that encapsulates (s1, s2) as a tuple and states the two tuples are equal when

((tuple1.s1 == tuple2.s1 && tuple2.s1 == tuple2.s2) ||
    (tuple1.s1 == tuple2.s2 && tuple2.s2 == tuple2.s1))

With that, you can define a process using Mapper<S1S2Writable, IntWritable, S1S2Writable, IntWritable> to read your input file, and pass it to a Reducer<S1S2Writable, IntWritable, KEYOUT, IntWritable>.

This will group the S1S2Writable with Iterable<IntWritable>, which you can perform a subtraction over.

How to subtract in Map Reduce paradigm

There are 1 best solutions below

Related Questions in HADOOP

Related Questions in MAPREDUCE

Related Questions in DISTRIBUTED-COMPUTING

Related Questions in EMR

Related Questions in ELASTIC-MAP-REDUCE

Trending Questions

Popular # Hahtags

Popular Questions