@prateek
Created October 29, 2015 01:17
// once upon a time, what did I see, but two RDDs
val tableA = sc.parallelize(List((1,2), (3,4), (5,6)))
val tableB = sc.parallelize(List((1,'A'), (3,'B'), (5,'C')))
// one of them small enough to fit into map
val mapTableB = tableB.collectAsMap
// which everyone could read
val broadcastB = sc.broadcast(mapTableB)
// and join for all eternity (get returns an Option, so an id missing from B becomes None)
val mapJoin = tableA.map { case (id, value) => (id, value, broadcastB.value.get(id)) }
// q.e.d.
mapJoin.collect.foreach(println)
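
The lookup above keeps every row of tableA and wraps the match in an Option, which amounts to a left outer join. The sketch below shows how the same per-record lookup could be turned into an inner join by using flatMap. It runs on plain Scala collections so it can be tried without a Spark context. The object name `LookupJoinSketch` and the unmatched key 7 are assumptions added for illustration and are not part of the gist.

```scala
// A minimal sketch on plain Scala collections (assumption: no Spark required).
// With Spark, the same function bodies would be passed to tableA.map / tableA.flatMap,
// reading from broadcastB.value instead of mapTableB.
object LookupJoinSketch {
  // Hypothetical data: key 7 is added so that one row has no match in B.
  val tableA: List[(Int, Int)] = List((1, 2), (3, 4), (7, 8))
  val mapTableB: Map[Int, Char] = Map(1 -> 'A', 3 -> 'B', 5 -> 'C')

  // Left outer join: every row of tableA survives; a missing key yields None.
  val leftOuter: List[(Int, Int, Option[Char])] =
    tableA.map { case (id, value) => (id, value, mapTableB.get(id)) }

  // Inner join: flatMap over the Option drops rows whose key is absent from B.
  val inner: List[(Int, Int, Char)] =
    tableA.flatMap { case (id, value) => mapTableB.get(id).map(c => (id, value, c)) }

  def main(args: Array[String]): Unit = {
    leftOuter.foreach(println)
    inner.foreach(println)
  }
}
```

Which variant fits depends on whether rows without a partner in the small table should be kept. Either way the small table is only ever read through the map, so no shuffle of tableA is needed.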