@fwbrasil
Last active November 6, 2017 20:43
import org.apache.spark.sql.Dataset
import io.getquill.QuillSparkContext._

// Tweet is assumed to be a case class with a `text: String` field.
def topHashtags(tweets: Dataset[Tweet], n: Int): Dataset[(String, Long)] =
  run {                                // produce a dataset from the Quill query
    liftQuery(tweets)                  // transform the dataset into a Quill query
      .concatMap(_.text.split(" "))    // split into words and unnest results
      .filter(_.startsWith("#"))       // filter hashtag words
      .map(_.toLowerCase)              // normalize hashtags
      .groupBy(word => word)           // group by each hashtag
      .map {                           // map word list to its count
        case (word, list) =>
          (word, list.size)
      }
      .sortBy {                        // sort by the count desc
        case (word, count) => -count
      }
      .take(lift(n))                   // limit to top results
  }
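
For comparison, the same steps can be sketched on an in-memory collection using only the Scala standard library. This is an illustrative sketch, not part of the gist: the `TopHashtagsSketch` object, the `topHashtagsLocal` name, and the one-field `Tweet` case class are hypothetical stand-ins, and `flatMap` plays the role of Quill's `concatMap`.

```scala
object TopHashtagsSketch {
  // Hypothetical stand-in for the gist's Tweet type; only `text` is assumed.
  final case class Tweet(text: String)

  // Same pipeline as topHashtags, evaluated eagerly on a local Seq.
  def topHashtagsLocal(tweets: Seq[Tweet], n: Int): Seq[(String, Long)] =
    tweets
      .flatMap(_.text.split(" "))               // split into words and flatten
      .filter(_.startsWith("#"))                // keep hashtag words
      .map(_.toLowerCase)                       // normalize hashtags
      .groupBy(identity)                        // group by each hashtag
      .map { case (word, list) => (word, list.size.toLong) }
      .toSeq
      .sortBy { case (_, count) => -count }     // sort by count, descending
      .take(n)                                  // limit to top results
}
```

The leading `#` is kept in each result, as in the query above, and the relative order of hashtags with equal counts is not specified.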