@fwbrasil
Last active November 6, 2017 20:42
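import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// The $"..." column syntax needs `import spark.implicits._` in scope
// (spark-shell provides it automatically).
def topHashtags(tweets: DataFrame, n: Int): DataFrame =
  tweets
    .select(explode(split($"text", "\\s+"))) // split the text into words, one row per word
    .select(lower($"col") as "word")         // normalize hashtags to lowercase ("col" is explode's default column name)
    .filter("word like '#%'")                // keep only words that start with '#'
    .groupBy($"word")                        // group by each hashtag
    .agg(count("*") as "count")              // count occurrences per hashtag
    .orderBy($"count".desc)                  // order by count, most frequent first
    .limit(n)                                // keep the top n results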
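
To see what the pipeline computes without a Spark session, here is a minimal sketch of the same steps on plain Scala collections. The name `topHashtagsLocal`, the sample texts, and the alphabetical tie-break are assumptions made for illustration; they are not part of the original snippet, and the Spark version does not guarantee any order among hashtags with equal counts.

```scala
// Plain-collections sketch of the same steps; no Spark required.
// `topHashtagsLocal` is a hypothetical helper, not part of the gist.
def topHashtagsLocal(texts: Seq[String], n: Int): Seq[(String, Int)] =
  texts
    .flatMap(_.split("\\s+"))                      // split each text into words
    .map(_.toLowerCase)                            // normalize hashtags to lowercase
    .filter(_.startsWith("#"))                     // keep only hashtag words
    .groupBy(identity)                             // group by each hashtag
    .map { case (tag, hits) => (tag, hits.size) }  // count occurrences per hashtag
    .toSeq
    .sortBy { case (tag, count) => (-count, tag) } // count descending; ties broken by tag (assumption)
    .take(n)                                       // keep the top n results

// Made-up sample data
val sample = Seq(
  "Learning #Scala and #Spark",
  "#spark makes #BigData easy",
  "More #SPARK less boilerplate"
)

val top = topHashtagsLocal(sample, 2)
// top == Seq(("#spark", 3), ("#bigdata", 1))
```

Because the text is split on whitespace only, a token such as `#spark,` (with trailing punctuation) would be counted as a separate hashtag in both versions.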