@SriramKeerthi
Created January 11, 2016 15:31
Word count using Scala
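import org.apache.spark.{SparkConf, SparkContext}

// Counts word occurrences in a text file and prints them, most frequent first.
object WordCount {
  // Spark recommends an explicit main() over extending scala.App.
  def main(args: Array[String]): Unit = {
    require(args.nonEmpty, "Usage: WordCount <input-path>")
    // Run locally, using as many worker threads as there are cores.
    val conf = new SparkConf().setAppName("WordCount").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val counts = sc.textFile(args(0))
        // Split on spaces, underscores, hyphens, quotes, commas, angle brackets, and periods.
        .flatMap(line => line.split("[ _\\-'\",<>\\.]"))
        .filter(_.nonEmpty)
        .map(word => (word.toLowerCase, 1))
        .reduceByKey(_ + _)
        // Bring the results to the driver and sort by descending count.
        .collect().toList.sortBy(kv => -kv._2)
      println(counts)
    } finally {
      sc.stop() // Release Spark resources even if the job fails.
    }
  }
}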