
@chrislewis
Created March 2, 2010 01:37
/**
 * Given a sequence of words (Array, List, etc.), generate a Map[String, Int]
 * from each word to the number of times it appears in the sequence, skipping
 * any stop words.
 */
def countWords(words: Seq[String], stopWords: Seq[String]): Map[String, Int] =
  words.filterNot(stopWords.contains(_)).foldLeft(Map[String, Int]()) { (counts, word) =>
    // Increment the running count, starting from 0 for a word not seen yet.
    counts + (word -> (counts.getOrElse(word, 0) + 1))
  }
val words = "web designer to the max web me sideways i love web i am a designer".split(" ")
val stopWords = "and the to a it is i am me".split(" ")
val totals = countWords(words, stopWords)
//totals: scala.collection.immutable.Map[String,Int] = Map(web -> 3, sideways -> 1, love -> 1, max -> 1, designer -> 2)
// If you want to sort them by appearance frequency:
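// (List.sort was deprecated in Scala 2.8 and later removed; sortWith takes the same predicate.)
val frequencyList = totals.toList.sortWith(_._2 > _._2)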
// If you want to create a string of the 3 most popular terms:
// (mkString avoids the leading space that folding with " " + word would produce.)
frequencyList.take(3).map(_._1).mkString(" ")
//res0: java.lang.String = web designer max
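
The same counts can also be built with `groupBy` instead of a fold. The following is only a sketch of that alternative, not part of the original gist: the names `countWordsGrouped`, `sampleWords`, `sampleStopWords`, `groupedTotals` and `top3` are made up for illustration, and it assumes Scala 2.12 or later. Ties in frequency are broken alphabetically here, so the third term may differ from the output shown above, where the order of tied words depends on the Map's iteration order.

```scala
// Hypothetical alternative to countWords (name not from the original gist).
def countWordsGrouped(words: Seq[String], stopWords: Seq[String]): Map[String, Int] = {
  // Assumption: a Set is used for membership checks instead of scanning the Seq.
  val stop = stopWords.toSet
  words
    .filterNot(stop.contains(_))
    .groupBy(identity) // Map[String, Seq[String]]: each word mapped to its occurrences
    .map { case (word, occurrences) => word -> occurrences.size }
}

val sampleWords =
  "web designer to the max web me sideways i love web i am a designer".split(" ").toSeq
val sampleStopWords = "and the to a it is i am me".split(" ").toSeq

val groupedTotals = countWordsGrouped(sampleWords, sampleStopWords)

// Sort by descending count, then alphabetically so that ties are deterministic.
val top3 = groupedTotals.toList
  .sortBy { case (word, count) => (-count, word) }
  .take(3)
  .map(_._1)
  .mkString(" ")
```

Grouping first keeps the counting step declarative, at the cost of materializing every occurrence before counting; for very long inputs the fold above allocates less.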