@idris75
Forked from jamesrajendran/reduceVsreduceByKey
Created March 2, 2020 13:37
Demystifying reduce and reduceByKey
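// build an RDD[Int] from a local list (sc is the SparkContext that spark-shell provides)
val lrdd = sc.parallelize(List(1, 2, 3, 4, 5, 3, 5))
// reduce combines all elements with the given function and returns the result, here 23
lrdd.reduce((a, b) => a + b)
// the same call, written with the shorter placeholder syntax
lrdd.reduce(_ + _)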
// try changing reduce to reduceByKey: this does not compile,
// because reduceByKey is defined only on RDDs of key-value pairs
lrdd.reduceByKey(_ + _)
// map each element to a (key, value) pair, then reduceByKey sums the values of each key
lrdd.map(x => (x, 1)).reduceByKey(_ + _).collect
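// try reduce on the key-value pairs: this does not compile either, because + is not
// defined for tuples (and reduce returns a single value, so there is nothing to collect)
lrdd.map(x => (x, 1)).reduce(_ + _)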
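// For comparison, a minimal sketch of a reduce that does work on key-value pairs, assuming
// the same spark-shell session; the name pairTotals is illustrative, not from the original.
// The function has to take two tuples and return a tuple of the same type.
val pairTotals = lrdd.map(x => (x, 1)).reduce((a, b) => (a._1 + b._1, a._2 + b._2))
// pairTotals is (23, 7): the sum of the elements and the number of elements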
// BTW reduceByKey is a transformation (lazy, returns a new RDD) and reduce is an action
// (runs a job and returns a value to the driver)
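// A small sketch of what that means in practice, assuming the same spark-shell session;
// the names summed and total are illustrative, not from the original.
val summed = lrdd.map(x => (x, 1)).reduceByKey(_ + _) // RDD[(Int, Int)]; no job has run yet
val total = lrdd.reduce(_ + _)                        // Int; the job runs right away and gives 23
summed.collect()                                      // an action such as collect triggers the computation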
// try countByKey: an action that returns a Map of key -> number of occurrences to the driver
lrdd.map(x => (x, 1)).countByKey
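// are countByKey and reduceByKey the same? no: countByKey only counts, while reduceByKey
// accepts any associative, commutative function, e.g. the product of the values of each key
lrdd.map(x => (x, 1)).reduceByKey(_ * _).collect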
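// To make the comparison concrete, a sketch assuming the same spark-shell session;
// counts, sums and keyTotals are illustrative names, not from the original.
val counts = lrdd.map(x => (x, 1)).countByKey()                    // Map[Int, Long] on the driver
val sums = lrdd.map(x => (x, 1)).reduceByKey(_ + _).collectAsMap() // the same counts, as Map[Int, Int]
// with a different value per key, reduceByKey computes something countByKey cannot:
// here the sum of all occurrences of each key, so 3 -> 6 and 5 -> 10
val keyTotals = lrdd.map(x => (x, x)).reduceByKey(_ + _).collectAsMap()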
// sort by key: sortByKey(false) is descending, sortByKey(true) (the default) is ascending
lrdd.map(x => (x, 1)).sortByKey(false).collect
lrdd.map(x => (x, 1)).sortByKey(true).collect