@stevekrouse
Last active December 17, 2015 15:29
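import com.twitter.scalding._
// Assumes Apache Commons Lang (2.x package name) is on the classpath for StringUtils.split.
import org.apache.commons.lang.StringUtils

// Scalding jobs extend Job, which supplies the implicit flow definition and mode that read/write need.
class ScaldingAnagrams(args: Args) extends Job(args) {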
val input = TextLine("data/nytime_1899-2012")
val output = TextLine("data/anagrams")
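// Mappers
// Splits a line on newline and tab characters (each character in "\n\t" is a separator).
def tokenizeWords(s: String): Array[String] = StringUtils.split(s, "\n\t")
// Anagrams contain the same letters, so sorting a word's characters gives them a shared key.
def makeAnagramHash(s: String): String = new String(s.toCharArray.sorted)
// Reducer
// Sorts the words in each group and joins them into a single comma-separated 'words field.
def combineAnagrams(gb: GroupBuilder): GroupBuilder = gb.sortBy('word).mkString('word -> 'words, ", ")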
input.
read.
flatMap('line -> 'word)(tokenizeWords).
map('word -> 'hash)(makeAnagramHash).
groupBy('hash)(combineAnagrams).
project('words).
write(output)
}
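
// A minimal local sketch, not part of the Scalding job above: the same
// sort-the-letters grouping applied to an in-memory list, using only the
// Scala standard library. AnagramSketch, anagramHash, and groupAnagrams are
// hypothetical names introduced here for illustration.
object AnagramSketch {
  // Words that are anagrams of each other share the same sorted-letter key.
  def anagramHash(word: String): String = word.sorted

  // Group words by key, then join each group alphabetically with ", ".
  def groupAnagrams(words: Seq[String]): Map[String, String] =
    words.groupBy(anagramHash).map { case (hash, group) =>
      hash -> group.sorted.mkString(", ")
    }
}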