Last active
March 2, 2020 13:37
Demystifying reduce and reduceByKey
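// run in spark-shell, where sc (the SparkContext) is already defined
val lrdd = sc.parallelize(List(1, 2, 3, 4, 5, 3, 5))
// reduce combines all elements pairwise into a single value
lrdd.reduce((a, b) => a + b)
// shorthand for the lambda above
lrdd.reduce(_ + _)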
// try changing reduce to reduceByKey: this does not compile,
// because reduceByKey is only available on RDDs of key-value pairs
lrdd.reduceByKey(_ + _)
// map each element to a key-value pair, then try reduceByKey
lrdd.map(x => (x, 1)).reduceByKey(_ + _).collect
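// try reduce on the key-value pairs: this does not compile,
// because two tuples cannot be added with +
// (reduce returns a plain value, not an RDD, so there is nothing to collect)
lrdd.map(x => (x, 1)).reduce(_ + _)
// BTW reduceByKey is a transformation (lazy, returns an RDD) and reduce is an action (runs a job, returns a value)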
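// try countByKey: an action that returns a Map of key -> count to the driver
lrdd.map(x => (x, 1)).countByKey
// are countByKey and reduceByKey the same? No: reduceByKey is more general,
// because it accepts any associative function, not just counting.
// Here it multiplies the values (every value is 1, so each product is 1)
lrdd.map(x => (x, 1)).reduceByKey(_ * _).collect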
// sort by key: descending (false), then ascending (true)
lrdd.map(x => (x, 1)).sortByKey(false).collect
lrdd.map(x => (x, 1)).sortByKey(true).collect
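
The results of the calls above can be checked without a cluster. What follows is a minimal sketch in plain Scala collections, not Spark's implementation: `ReduceDemo`, `reduceByKeyLocal` and `countByKeyLocal` are hypothetical names introduced here only to model what `reduceByKey` and `countByKey` return for the same input list.

```scala
// Plain-Scala model of the RDD calls above; no SparkContext is needed.
// ReduceDemo, reduceByKeyLocal and countByKeyLocal are hypothetical helpers
// for illustration, not part of Spark.
object ReduceDemo {
  val data: List[Int] = List(1, 2, 3, 4, 5, 3, 5)

  // Like reduceByKey: group the values by key, then combine each group with f.
  def reduceByKeyLocal[K, V](pairs: Seq[(K, V)])(f: (V, V) => V): Map[K, V] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }

  // Like countByKey: count the pairs that carry each key, ignoring the values.
  def countByKeyLocal[K, V](pairs: Seq[(K, V)]): Map[K, Long] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.size.toLong }

  // reduce collapses the whole list into one value.
  val total: Int = data.reduce(_ + _)

  // Same pairing as lrdd.map(x => (x, 1)).
  val pairs: List[(Int, Int)] = data.map(x => (x, 1))

  // Summing the 1s per key gives the number of occurrences of each element.
  val sums: Map[Int, Int] = reduceByKeyLocal(pairs)(_ + _)

  // Multiplying the 1s per key always gives 1.
  val products: Map[Int, Int] = reduceByKeyLocal(pairs)(_ * _)

  // Counting per key matches the sums, but as Long values.
  val counts: Map[Int, Long] = countByKeyLocal(pairs)
}
```

With the input list used throughout, `ReduceDemo.sums` and `ReduceDemo.counts` agree on every key (3 and 5 occur twice, the rest once), while `ReduceDemo.products` is 1 everywhere. Counting is one special case of reducing by key.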