@fwbrasil
Last active November 6, 2017 20:42
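import org.apache.spark.rdd.RDD

// Assumes a Tweet type with a `text: String` field is in scope.
def topHashtags(tweets: RDD[Tweet], n: Int): Array[(String, BigInt)] =
  tweets
    .flatMap(_.text.split("\\s+"))  // split each tweet's text into whitespace-separated words
    .filter(_.startsWith("#"))      // keep only the words that are hashtags
    .map(_.toLowerCase)             // normalize case so #Scala and #scala count together
    .map((_, BigInt(1)))            // pair each hashtag with an initial count of 1
    .reduceByKey((a, b) => a + b)   // sum the counts per hashtag
    .top(n)(Ordering.by(_._2))      // take the n most frequent hashtags, highest count first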