sorting - How to implement sort in Hadoop?

My problem is sorting by the values in a file. The keys and values are integers, and I need to keep each key attached to its sorted value.

key   value
1     24
3     4
4     12
5     23

Desired output (sorted by value, descending):

1     24
5     23
4     12
3     4

I am working with massive data and must run the code on a cluster of Hadoop machines. How can I do this with MapReduce?

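You can do this (I'm assuming you are using Java here).

From the maps, emit the pairs with value and key swapped:

    // emit (value, key): the value to sort by becomes the map output key
    context.write(new LongWritable(24), new LongWritable(1));
    context.write(new LongWritable(4), new LongWritable(3));
    context.write(new LongWritable(12), new LongWritable(4));
    context.write(new LongWritable(23), new LongWritable(5));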

So the values that need to be sorted should be the key in your MapReduce job. By default, Hadoop sorts in ascending order of key.
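The swap itself can be tried out without a cluster. Below is a minimal JDK-only sketch of what the map step does; it does not use the Hadoop API, and the class name `SwapMapSketch`, the method names, and the whitespace-separated "key value" line format are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** JDK-only sketch of the map step: parse "key value" lines and emit (value, key). */
class SwapMapSketch {

    /** Parses one whitespace-separated "key value" line and returns {value, key}. */
    static long[] mapLine(String line) {
        String[] fields = line.trim().split("\\s+");
        long key = Long.parseLong(fields[0]);
        long value = Long.parseLong(fields[1]);
        // The value goes first because it will act as the sort key.
        return new long[] {value, key};
    }

    /** Applies mapLine to every input line, preserving input order. */
    static List<long[]> mapAll(List<String> lines) {
        List<long[]> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.add(mapLine(line));
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("1 24", "3 4", "4 12", "5 23");
        for (long[] pair : mapAll(input)) {
            System.out.println(pair[0] + "\t" + pair[1]);
        }
    }
}
```

In a real job the same parsing would sit inside the mapper's map() method, with the two numbers wrapped in Writable types before being written to the context.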

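Hence, to sort in descending order, either set Hadoop's built-in decreasing comparator on the job, for example job.setSortComparatorClass(LongWritable.DecreasingComparator.class),

or set a custom descending sort comparator in your job, something like this: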

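    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;

    public static class DescendingKeyComparator extends WritableComparator {
        protected DescendingKeyComparator() {
            // The key class must match the cast in compare(), so LongWritable, not Text.
            super(LongWritable.class, true);
        }

        @SuppressWarnings("rawtypes")
        @Override
        public int compare(WritableComparable w1, WritableComparable w2) {
            LongWritable key1 = (LongWritable) w1;
            LongWritable key2 = (LongWritable) w2;
            // Negate the natural (ascending) comparison to get descending order.
            return -1 * key1.compareTo(key2);
        }
    }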

The shuffle and sort phase in Hadoop will then take care of sorting your keys in descending order: 24, 23, 12, 4.
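To see what the descending sort does to the swapped pairs, here is a small JDK-only simulation. It is only a sketch: a `TreeMap` with a reversed comparator stands in for Hadoop's shuffle and sort phase, and the class name `DescendingShuffleSketch` and its method are hypothetical, not part of any Hadoop API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** JDK-only simulation of a descending shuffle/sort followed by a reduce that swaps back. */
class DescendingShuffleSketch {

    /** Takes emitted {value, key} pairs and returns "key value" lines, largest value first. */
    static List<String> sortByValueDescending(long[][] emitted) {
        // The reversed comparator plays the role of the job's sort comparator.
        TreeMap<Long, List<Long>> grouped = new TreeMap<>(Comparator.reverseOrder());
        for (long[] pair : emitted) {
            // Equal sort keys are grouped together, as they would be for one reduce call.
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<Long, List<Long>> entry : grouped.entrySet()) {
            for (long originalKey : entry.getValue()) {
                // Swap back so each line reads "key value" again.
                output.add(originalKey + " " + entry.getKey());
            }
        }
        return output;
    }

    public static void main(String[] args) {
        long[][] emitted = {{24, 1}, {4, 3}, {12, 4}, {23, 5}};
        sortByValueDescending(emitted).forEach(System.out::println);
    }
}
```

For the sample data this yields the lines 1 24, 5 23, 4 12, 3 4, which is the order asked for in the question. The swap back to "key value" corresponds to what a reducer would do when writing its output.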

Update after the comment:

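If you require a descending IntWritable comparable, you can create one and register its comparator on the job with job.setSortComparatorClass(...), passing DescendingIntWritableComparable.DecreasingComparator.class.

If you are using JobConf, set it with setOutputKeyComparatorClass(...) instead.

Put the following code below your main() function: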

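    public static void main(String[] args) throws Exception {
        // YourDriver is a placeholder for your own Tool implementation.
        int exitCode = ToolRunner.run(new YourDriver(), args);
        System.exit(exitCode);
    }

    // This class is defined outside of main(), not inside it.
    public static class DescendingIntWritableComparable extends IntWritable {
        /** A decreasing comparator optimized for IntWritable. */
        public static class DecreasingComparator extends Comparator {
            // Comparator here is IntWritable.Comparator, inherited from IntWritable.
            @SuppressWarnings("rawtypes")
            public int compare(WritableComparable a, WritableComparable b) {
                return -super.compare(a, b);
            }

            // Compares the serialized bytes directly, without deserializing the keys.
            public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
                return -super.compare(b1, s1, l1, b2, s2, l2);
            }
        }
    }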
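The second compare() overload works on serialized bytes rather than on objects. As a rough illustration of that idea, the JDK-only sketch below compares two ints straight from their byte encoding and negates the result. It assumes keys are stored as 4-byte big-endian ints (the layout `DataOutput.writeInt` produces); the class name `RawIntDescendingSketch` and its methods are hypothetical, not Hadoop code.

```java
import java.nio.ByteBuffer;

/** JDK-only sketch of a descending comparison on serialized 4-byte ints. */
class RawIntDescendingSketch {

    /** Serializes an int as 4 big-endian bytes (ByteBuffer's default byte order). */
    static byte[] encode(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    /** Reads the big-endian int that starts at the given offset. */
    static int readInt(byte[] bytes, int start) {
        return ByteBuffer.wrap(bytes).getInt(start);
    }

    /** Ascending comparison of the ints stored at s1 in b1 and s2 in b2. */
    static int compareAscending(byte[] b1, int s1, byte[] b2, int s2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    /** Descending comparison: the ascending result with its sign flipped. */
    static int compareDescending(byte[] b1, int s1, byte[] b2, int s2) {
        return -compareAscending(b1, s1, b2, s2);
    }

    public static void main(String[] args) {
        byte[] big = encode(24);
        byte[] small = encode(4);
        // A negative result means the first argument sorts before the second.
        System.out.println(compareDescending(big, 0, small, 0));
    }
}
```

Comparing at the byte level avoids creating a key object for every comparison during the sort, which is why the answer's comparator overrides both compare() methods.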

