@natbusa
Last active August 29, 2015 13:57
Word count in Scalding
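import com.twitter.scalding._

// Counts word occurrences in a text file using Scalding's fields-based API.
class WordCount(args: Args) extends Job(args) {
  TextLine(args("input"))                        // one tuple per input line, in field 'line
    .read
    .flatMap('line -> 'word) { line: String =>
      line.split("\\s+").filter(_.nonEmpty)      // split on runs of whitespace, drop empty tokens
    }
    .groupBy('word) { group => group.size }      // adds a 'size field: occurrences per word
    .write(Tsv(args("output")))                  // tab-separated output: word <TAB> count
}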
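To sanity-check the job's logic without a Hadoop cluster, the same flatMap/groupBy/size pipeline can be sketched with plain Scala collections. This is an illustrative stand-in, not Scalding itself; the object name `LocalWordCount` and the sample lines are hypothetical.

```scala
// Plain-Scala sketch of the same computation; no Hadoop or Scalding required.
object LocalWordCount {
  // Mirrors the job: flatMap lines into words, then group by word and count.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+").filter(_.nonEmpty))
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input.
    val counts = wordCount(Seq("lorem ipsum dolor", "ipsum  lorem", "lorem"))
    // Print in the same word<TAB>count shape the Tsv sink writes.
    counts.toSeq.sortBy(_._1).foreach { case (w, n) => println(s"$w\t$n") }
  }
}
```

Running `main` prints `dolor 1`, `ipsum 2` and `lorem 3` (tab-separated), one per line. Note that the double space in the second sample line does not produce an empty "word", which is why the split uses `\\s+` plus a non-empty filter.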
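# Assumes the assembly jar's manifest sets Main-Class to com.twitter.scalding.Tool,
# which takes the job class name (WordCount) as its first argument.
"$HADOOP_HOME"/bin/hadoop \
  jar target/scala-2.10/wordcount-scalding-assembly-0.8.11.jar WordCount \
  --hdfs \
  --input wordcount-input/lorem.txt \
  --output "$HADOOP_OUTPUT_DIR"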
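In HDFS mode the output directory typically holds one or more part files of `word<TAB>count` lines. A minimal sketch for loading those counts once they have been copied to a single local file (for example with `hadoop fs -getmerge`); the object name `ReadWordCounts` and the local file path passed as `args(0)` are hypothetical.

```scala
import scala.io.Source

object ReadWordCounts {
  // Parses lines of the form "word<TAB>count" into a map; blank lines are skipped.
  def parse(lines: Iterator[String]): Map[String, Long] =
    lines
      .filter(_.nonEmpty)
      .map { line =>
        line.split("\t", 2) match {
          case Array(word, count) => word -> count.trim.toLong
        }
      }
      .toMap

  def main(args: Array[String]): Unit = {
    // args(0): hypothetical local path to the merged job output.
    val source = Source.fromFile(args(0))
    try parse(source.getLines()).toSeq.sortBy(-_._2).take(10).foreach(println)
    finally source.close()
  }
}
```

Because the job groups by word before writing, each word should appear on exactly one line across all part files, so collecting into a `Map` loses nothing.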