-
-
Save idris75/d12aeb0f12d258ec5e21bfaafab8f188 to your computer and use it in GitHub Desktop.
Demystifying reduce and reduceByKey
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
val lrdd =sc.parallelize( List(1,2,3,4,5,3,5)) | |
lrdd.reduce((a,b) => a+b) | |
//short version of the syntax below | |
lrdd.reduce( _+_) | |
// try changing reduce to reduceByKey, which will fail | |
// as reduceByKey is applicable to key-value pairs | |
lrdd.reduceByKey( _+_) | |
//convert/map into key-value pair and try reduceByKey | |
lrdd.map(x => (x,1)).reduceByKey( _+_).collect | |
//try reduce on key-value pair, which will fail | |
lrdd.map(x => (x,1)).reduce( _+_).collect | |
BTW reduceByKey is a Transformation and reduce is an Action | |
//try countByKey | |
lrdd.map(x => (x,1)).countByKey | |
//countByKey and reduceByKey the same? no, reduceByKey more powerful | |
lrdd.map(x => (x,1)).reduceByKey( _*_).collect | |
//sort by key asc/desc | |
lrdd.map(x => (x,1)).sortByKey(false).collect | |
lrdd.map(x => (x,1)).sortByKey(true).collect |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment