Sorting - How to implement sort in Hadoop?
My problem is sorting the values in a file. The keys and values are integers, and I need each key to stay paired with its value after sorting by value.
Input:
key value
1   24
3   4
4   12
5   23
Output:
key value
1   24
5   23
4   12
3   4
I'm working with massive data, so the code must run on a cluster of Hadoop machines. How can I do this with MapReduce?
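You can do it like this (I'm assuming you're using Java here).
From the map, emit the value as the key and the original key as the value:
context.write(new LongWritable(24), new LongWritable(1));
context.write(new LongWritable(4), new LongWritable(3));
context.write(new LongWritable(12), new LongWritable(4));
context.write(new LongWritable(23), new LongWritable(5));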
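To make the swap concrete without a cluster, here is a small plain-Java sketch of what the map step does to each "key value" line: parse it and emit the value as the new key. It has no Hadoop dependency; the class SwapSketch and the method mapSwap are hypothetical names used only for illustration, and the int[] pair stands in for context.write(value, key).

```java
import java.util.ArrayList;
import java.util.List;

public class SwapSketch {
    // Hypothetical stand-in for a mapper: turns "key value" lines into
    // {value, key} pairs so that the value becomes the MapReduce key.
    static List<int[]> mapSwap(List<String> lines) {
        List<int[]> emitted = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            int key = Integer.parseInt(parts[0]);
            int value = Integer.parseInt(parts[1]);
            emitted.add(new int[] {value, key}); // plays the role of context.write(value, key)
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> input = List.of("1 24", "3 4", "4 12", "5 23");
        for (int[] pair : mapSwap(input)) {
            System.out.println(pair[0] + " " + pair[1]);
        }
    }
}
```

Running main prints the swapped pairs 24 1, 4 3, 12 4 and 23 5, one per line, which is exactly the set of pairs the four context.write calls emit.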
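So the values that need to be sorted should be the key in your MapReduce job. By default, Hadoop sorts keys in ascending order.
Hence, to sort in descending order, either set the built-in decreasing comparator (this works when the map output keys are LongWritable):
job.setSortComparatorClass(LongWritable.DecreasingComparator.class);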
Or write your own descending sort comparator and register it on the job with job.setSortComparatorClass(DescendingKeyComparator.class):
public static class DescendingKeyComparator extends WritableComparator {
    protected DescendingKeyComparator() {
        // Keys are LongWritable; true = create key instances for deserialization
        super(LongWritable.class, true);
    }

    @SuppressWarnings("rawtypes")
    @Override
    public int compare(WritableComparable w1, WritableComparable w2) {
        LongWritable key1 = (LongWritable) w1;
        LongWritable key2 = (LongWritable) w2;
        // Negate the natural ascending order to get descending order
        return -1 * key1.compareTo(key2);
    }
}
The shuffle and sort phase in Hadoop will then take care of sorting the keys in descending order: 24, 23, 12, 4.
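The map, shuffle/sort and reduce steps described above can be simulated locally to check the expected output. This is a Hadoop-free sketch under stated assumptions: a TreeMap with a negated comparator stands in for the shuffle and sort phase (mirroring DescendingKeyComparator), and the loop at the end stands in for a reducer that swaps the pair back to "key value". The class DescendingSortSketch and its method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DescendingSortSketch {
    // Mirrors DescendingKeyComparator: negate the natural ascending order.
    static final Comparator<Integer> DESCENDING = (a, b) -> -Integer.compare(a, b);

    // Local stand-in for map -> shuffle/sort -> reduce.
    static List<String> sortByValueDescending(Map<Integer, Integer> input) {
        // Map step emits (value, key); "shuffle and sort" groups by that key, descending.
        TreeMap<Integer, List<Integer>> shuffled = new TreeMap<>(DESCENDING);
        for (Map.Entry<Integer, Integer> e : input.entrySet()) {
            shuffled.computeIfAbsent(e.getValue(), v -> new ArrayList<>()).add(e.getKey());
        }
        // Reduce step: swap back and write "key value".
        List<String> output = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> group : shuffled.entrySet()) {
            for (int originalKey : group.getValue()) {
                output.add(originalKey + " " + group.getKey());
            }
        }
        return output;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> input = Map.of(1, 24, 3, 4, 4, 12, 5, 23);
        sortByValueDescending(input).forEach(System.out::println);
    }
}
```

For the question's input this prints 1 24, 5 23, 4 12 and 3 4, matching the desired output. As in a real reducer, original keys that share the same value end up grouped under one sort key.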
Update, after a comment:
If you require a descending comparator for IntWritable keys, you can create one (see below) and use it like this:
job.setSortComparatorClass(DescendingIntWritableComparable.DecreasingComparator.class);
If you are using JobConf (the old API), set it with:
jobConfObject.setOutputKeyComparatorClass(DescendingIntWritableComparable.DecreasingComparator.class);
Put the following code below your main() function:
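// Requires org.apache.hadoop.io.IntWritable, org.apache.hadoop.io.WritableComparable
// and org.apache.hadoop.util.ToolRunner to be imported at the top of the driver file.
public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new YourDriver(), args);
    System.exit(exitCode);
}

// This class is defined outside of main(), not inside it.
public static class DescendingIntWritableComparable extends IntWritable {
    /** A decreasing comparator optimized for IntWritable. */
    public static class DecreasingComparator extends IntWritable.Comparator {
        @SuppressWarnings("rawtypes")
        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            // Reverse the ascending object comparison
            return -super.compare(a, b);
        }

        @Override
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            // Reverse the ascending raw-bytes comparison (no deserialization)
            return -super.compare(b1, s1, l1, b2, s2, l2);
        }
    }
}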
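The byte-level compare method is what makes this comparator "optimized": keys are compared straight from their serialized form. IntWritable writes its value as 4 big-endian bytes (via DataOutput.writeInt), so the idea can be shown without Hadoop. The sketch below is only an illustration: ByteBuffer (big-endian by default) stands in for the serialization, and the class RawDescendingSketch and its methods are hypothetical names, not Hadoop APIs.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class RawDescendingSketch {
    // Serialize an int the way DataOutput.writeInt does: 4 bytes, big-endian.
    static byte[] serialize(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Ascending raw comparison: decode both ints straight from the byte arrays.
    static int rawCompare(byte[] b1, int s1, byte[] b2, int s2) {
        int x = ByteBuffer.wrap(b1, s1, 4).getInt();
        int y = ByteBuffer.wrap(b2, s2, 4).getInt();
        return Integer.compare(x, y);
    }

    // Descending variant: negate the ascending result, as DecreasingComparator does.
    static int rawCompareDescending(byte[] b1, int s1, byte[] b2, int s2) {
        return -rawCompare(b1, s1, b2, s2);
    }

    public static void main(String[] args) {
        List<byte[]> keys = new ArrayList<>();
        for (int v : new int[] {24, 4, 12, 23}) {
            keys.add(serialize(v));
        }
        keys.sort((a, b) -> rawCompareDescending(a, 0, b, 0));
        for (byte[] k : keys) {
            System.out.println(ByteBuffer.wrap(k).getInt());
        }
    }
}
```

This prints 24, 23, 12 and 4. Negating is safe here because Integer.compare only returns -1, 0 or 1, so there is no overflow when flipping the sign.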