Last active November 16, 2017 19:32
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.functions.count
// assumes a SparkSession named `spark` is in scope, with `import spark.implicits._`

def topHashtags(tweets: Dataset[Tweet], n: Int): Dataset[(String, BigInt)] =
  tweets
    .select($"text".as[String])    // select the text column (DataFrame)
    .flatMap(_.split("\\s+"))      // split it into words (Dataset)
    .filter(_.startsWith("#"))     // keep only hashtag words (Dataset)
    .map(_.toLowerCase)            // normalize hashtags (Dataset)
    .groupBy($"value")             // group by each hashtag (DataFrame)
    .agg(count("*") as "count")    // aggregate the count (DataFrame)
    .orderBy($"count".desc)        // order by count, descending (DataFrame)
    .limit(n)                      // limit to top results (DataFrame)
    .as[(String, BigInt)]          // set the type again (Dataset)
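To check the expected result on a small input without a Spark session, the same steps can be mirrored on plain Scala collections. This is only a sketch: `Tweet` here is a hypothetical stand-in with a single `text` field (the gist does not show its definition), `HashtagSketch` and `topHashtagsLocal` are illustrative names, and the alphabetical tie-break is an addition for determinism that the Spark version does not have.

```scala
object HashtagSketch {
  // Hypothetical stand-in for the gist's Tweet type; only `text` is assumed.
  final case class Tweet(text: String)

  // Mirrors the Spark pipeline step by step on an in-memory Seq.
  def topHashtagsLocal(tweets: Seq[Tweet], n: Int): Seq[(String, Long)] =
    tweets
      .map(_.text)                    // select the text
      .flatMap(_.split("\\s+"))       // split it into words
      .filter(_.startsWith("#"))      // keep only hashtag words
      .map(_.toLowerCase)             // normalize hashtags
      .groupBy(identity)              // group by each hashtag
      .map { case (tag, hits) => (tag, hits.size.toLong) } // count per hashtag
      .toSeq
      .sortBy { case (tag, count) => (-count, tag) } // descending count, then tag (added tie-break)
      .take(n)                        // limit to top results

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      Tweet("Learning #Scala and #Spark"),
      Tweet("#spark makes big data fun"),
      Tweet("no hashtags here"),
      Tweet("#SPARK #scala #Datasets")
    )
    // Expected to contain ("#spark", 3) followed by ("#scala", 2).
    topHashtagsLocal(sample, 2).foreach(println)
  }
}
```

In the Spark version, hashtags with equal counts can come back in any order, so a comparison against this local sketch should only rely on the ordering of distinct counts.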
The comment on the `.orderBy` line should read `DataFrame`:
=~ s/Datafeame/DataFrame/