sorting - How to implement sort in Hadoop?


My problem is sorting the values in a file. The keys and values are integers, and I need to keep each key together with its value once the values are sorted.

key   value
1     24
3     4
4     12
5     23

Output:

1     24
5     23
4     12
3     4

I am working with massive data and must run the code on a cluster of Hadoop machines. How can I do this with MapReduce?

You can do it like this (I'm assuming you are using Java here).

From the mappers, emit each record with the value as the key and the key as the value:

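// Emit (value, key): the number to sort by goes in the key position.
context.write(new LongWritable(24), new LongWritable(1));
context.write(new LongWritable(4), new LongWritable(3));
context.write(new LongWritable(12), new LongWritable(4));
context.write(new LongWritable(23), new LongWritable(5));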
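To make the swap concrete, here is a minimal plain-Java sketch of what the map step does to one input line. It deliberately uses no Hadoop classes so it can be run on its own. `SwapMapperSketch` and `swap` are hypothetical names, and the whitespace-separated "key value" line format is an assumption based on the sample data in the question.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Plain-Java illustration only; SwapMapperSketch and swap() are hypothetical
// names and are not part of the Hadoop API.
class SwapMapperSketch {
    // Parses one "key<whitespace>value" record (assumed format) and returns it
    // as (value, key), so the number to sort by ends up in the key position.
    static Map.Entry<Long, Long> swap(String line) {
        String[] fields = line.trim().split("\\s+");
        long key = Long.parseLong(fields[0]);
        long value = Long.parseLong(fields[1]);
        return new SimpleEntry<>(value, key);
    }

    public static void main(String[] args) {
        String[] lines = {"1     24", "3     4", "4     12", "5     23"};
        for (String line : lines) {
            Map.Entry<Long, Long> swapped = swap(line);
            System.out.println(swapped.getKey() + "\t" + swapped.getValue());
        }
    }
}
```

In a real mapper the same parsing would sit inside map(), with the two numbers wrapped in LongWritable before calling context.write.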

So, the values that need to be sorted should be the keys in the MapReduce job. By default, Hadoop sorts in ascending order of key.

Hence, to sort in descending order, either do this,

job.setSortComparatorClass(LongWritable.DecreasingComparator.class);

or this:

You need to set a custom descending sort comparator, which goes something like this in your job.

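// Nested in the driver class. Needs LongWritable, WritableComparable and
// WritableComparator from org.apache.hadoop.io.
public static class DescendingKeyComparator extends WritableComparator {
    protected DescendingKeyComparator() {
        // Register the key type that compare() casts to (LongWritable, not Text).
        super(LongWritable.class, true);
    }

    @SuppressWarnings("rawtypes")
    @Override
    public int compare(WritableComparable w1, WritableComparable w2) {
        LongWritable key1 = (LongWritable) w1;
        LongWritable key2 = (LongWritable) w2;
        // Negate the natural (ascending) comparison to get descending order.
        return -1 * key1.compareTo(key2);
    }
}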

The shuffle and sort phase in Hadoop will then take care of sorting the keys in descending order: 24, 23, 12, 4.
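To see how the pieces fit together, here is a rough plain-Java simulation of the shuffle and reduce steps. It is not Hadoop code: a TreeMap with a reversed comparator stands in for the descending sort comparator, and the inner loop stands in for a reducer that swaps each pair back. All names (`DescendingShuffleSketch`, `sortAndReduce`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation only; no Hadoop classes are used and all names are
// hypothetical.
class DescendingShuffleSketch {
    // Each pairs[i] is {sortKey, originalKey}, i.e. a record as emitted by the
    // map step after the swap.
    static List<String> sortAndReduce(long[][] pairs) {
        // Stand-in for shuffle and sort: group by sort key, descending order.
        Map<Long, List<Long>> grouped = new TreeMap<>(Comparator.reverseOrder());
        for (long[] pair : pairs) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        // Stand-in for the reduce step: swap back to "originalKey<TAB>value".
        List<String> output = new ArrayList<>();
        for (Map.Entry<Long, List<Long>> entry : grouped.entrySet()) {
            for (long originalKey : entry.getValue()) {
                output.add(originalKey + "\t" + entry.getKey());
            }
        }
        return output;
    }

    public static void main(String[] args) {
        long[][] emitted = {{24, 1}, {4, 3}, {12, 4}, {23, 5}};
        sortAndReduce(emitted).forEach(System.out::println);
    }
}
```

Grouping into a list per sort key mirrors how a reducer receives all original keys that share the same value, so duplicates are not lost. On a real cluster the output is only globally ordered if all keys pass through a single reducer or the partitioner preserves key order. That is something to check for your job rather than something this sketch demonstrates.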

After comment:

If you require a descending IntWritable comparable, you can create one and use it like this:

job.setSortComparatorClass(DescendingIntWritableComparable.DecreasingComparator.class);

If you are using JobConf, set it with this instead:

jobConfObject.setOutputKeyComparatorClass(DescendingIntWritableComparable.DecreasingComparator.class);

Put the following code below your main() function:

public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new YourDriver(), args);
    System.exit(exitCode);
}

// This class is defined outside of main(), not inside it.
public static class DescendingIntWritableComparable extends IntWritable {
    /** A decreasing comparator optimized for IntWritable. */
    public static class DecreasingComparator extends Comparator {
        // Object-level comparison: negate the ascending result.
        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            return -super.compare(a, b);
        }

        // Raw-byte comparison used during the sort: negate the ascending result.
        @Override
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            return -super.compare(b1, s1, l1, b2, s2, l2);
        }
    }
}
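The byte-array overload above is what lets the sort compare keys in their serialized form without deserializing them into objects first. As a rough illustration of the idea, here is a plain-Java sketch with no Hadoop classes. It assumes a serialized int is 4 big-endian bytes, which is the layout DataOutput.writeInt produces. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Plain-Java illustration only; names are hypothetical and no Hadoop classes
// are used.
class RawDescendingIntSketch {
    // Serializes an int as 4 big-endian bytes (ByteBuffer's default byte order).
    static byte[] toBytes(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Ascending comparison of two serialized ints, read at offsets s1 and s2.
    static int compareAscending(byte[] b1, int s1, byte[] b2, int s2) {
        int left = ByteBuffer.wrap(b1, s1, 4).getInt();
        int right = ByteBuffer.wrap(b2, s2, 4).getInt();
        return Integer.compare(left, right);
    }

    // Descending order is the ascending result negated, the same trick the
    // DecreasingComparator uses.
    static int compareDescending(byte[] b1, int s1, byte[] b2, int s2) {
        return -compareAscending(b1, s1, b2, s2);
    }

    public static void main(String[] args) {
        byte[] a = toBytes(24);
        byte[] b = toBytes(4);
        // A negative result means 24 is ordered before 4 when sorting descending.
        System.out.println(compareDescending(a, 0, b, 0));
    }
}
```

Negating is safe here because Integer.compare only returns -1, 0 or 1, so there is no overflow to worry about.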
