Skip to content

Instantly share code, notes, and snippets.

@zezutom
Created December 20, 2015 20:39
Show Gist options
  • Save zezutom/f4d214e92e8867a8814b to your computer and use it in GitHub Desktop.
Save zezutom/f4d214e92e8867a8814b to your computer and use it in GitHub Desktop.
Accumulators as counters
class TextAnalyser(val sc: SparkContext, ...) {
...
// Instance variables
val _totalChars = sc.accumulator(0, "Total Characters")
val _totalWords = sc.accumulator(0, "Total Words")
...
// In a worker thread
def analyse(rdd: RDD[String]): TextStats = {
...
// This limits serialization to the reference variables only
val totalChars = _totalChars
val totalWords = _totalWords
...
// Accumulators can be safely used in parallel computations
.map(x => {totalWords += 1; x})
...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment