Skip to content

Instantly share code, notes, and snippets.

@jamesrajendran
Last active March 2, 2020 13:37
Show Gist options
  • Save jamesrajendran/b48de4e63df3fdfe942b9a42021beefd to your computer and use it in GitHub Desktop.
Save jamesrajendran/b48de4e63df3fdfe942b9a42021beefd to your computer and use it in GitHub Desktop.
Demystifying reduce and reduceByKey
val lrdd =sc.parallelize( List(1,2,3,4,5,3,5))
lrdd.reduce((a,b) => a+b)
//short version of the syntax below
lrdd.reduce( _+_)
// try changing reduce to reduceByKey, which will fail
// as reduceByKey is applicable to key-value pairs
lrdd.reduceByKey( _+_)
//convert/map into key-value pair and try reduceByKey
lrdd.map(x => (x,1)).reduceByKey( _+_).collect
//try reduce on key-value pair, which will fail
lrdd.map(x => (x,1)).reduce( _+_).collect
BTW reduceByKey is a Transformation and reduce is an Action
//try countByKey
lrdd.map(x => (x,1)).countByKey
//countByKey and reduceByKey the same? no, reduceByKey more powerful
lrdd.map(x => (x,1)).reduceByKey( _*_).collect
//sort by key asc/desc
lrdd.map(x => (x,1)).sortByKey(false).collect
lrdd.map(x => (x,1)).sortByKey(true).collect
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment