Last active
April 18, 2017 22:20
-
-
Save cjauvin/68314066fa674c6c2550 to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
val pairs = sc.parallelize(List(("aa", 1), ("bb", 2), | |
("aa", 10), ("bb", 20), | |
("aa", 100), ("bb", 200))) | |
/* aggregateByKey takes an initial accumulator (here an empty list), | |
a first lambda function to merge a value to an accumulator, and a | |
second lambda function to merge two accumulators */ | |
pairs.aggregateByKey(List[Any]())( | |
(aggr, value) => aggr ::: (value :: Nil), | |
(aggr1, aggr2) => aggr1 ::: aggr2 | |
).collect().toMap | |
// scala.collection.immutable.Map[String,List[Any]] = | |
// Map(aa -> List(1, 10, 100), bb -> List(2, 20, 200)) | |
/* combineByKey is even more general in that it adds an initial lambda | |
function to create the initial accumulator */ | |
pairs.combineByKey( | |
(value) => List(value), | |
(aggr: List[Any], value) => aggr ::: (value :: Nil), | |
(aggr1: List[Any], aggr2: List[Any]) => aggr1 ::: aggr2 | |
).collect().toMap | |
// scala.collection.immutable.Map[String,List[Any]] = | |
// Map(aa -> List(1, 10, 100), bb -> List(2, 20, 200)) |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
I don't combineByKey is more general, it's more like a generalized version of reduce, while aggregateByKey is a generalized version of fold.