@v6ak
Created April 8, 2013 07:51
Trigram frequencies in Scala. Can you make it even shorter?
io.Source.fromFile("/tmp/text").mkString.split("""[\?\.\!",\- \s]+""").sliding(3).toSeq.groupBy(_.toSeq).toSeq.map{x=>x._2.size->x._1.mkString(" ")}.sorted.reverse
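// Read the whole file into one string.
val s = io.Source.fromFile("/tmp/text").mkString
// Split on runs of punctuation and whitespace to get the words.
val words = s.split("""[\?\.\!",\- \s]+""")
// Every window of three consecutive words is one trigram.
val ngrams = words.sliding(3).toSeq
// Group equal trigrams (compared as Seq, since arrays compare by reference) and count each group.
val frequencies = for((words, occurrences) <- ngrams.groupBy(_.toSeq).toSeq) yield occurrences.size -> words.mkString(" ")
// Sort by count only, most frequent first.
val sortedFrequencies = frequencies.sorted(Ordering.by((_: (Int, String))._1).reverse)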
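To try the same pipeline without a file at /tmp/text, here is a minimal sketch that runs on an in-memory string. The names `TrigramDemo` and `ngramFrequencies`, the parameter `n`, and the sample sentence are all made up for illustration and are not part of the gist. It also assumes two small changes that the snippets above do not make: empty tokens are dropped (a text that starts with a delimiter would otherwise yield a leading empty word), and ties in the count are broken alphabetically so the output order is deterministic.

```scala
object TrigramDemo {
  // Hypothetical helper (not from the gist): count word n-grams in a string.
  // Returns (count, n-gram) pairs, most frequent first, ties in alphabetical order.
  def ngramFrequencies(text: String, n: Int = 3): Seq[(Int, String)] = {
    // Same delimiter set as the gist; empty tokens are dropped (an added assumption).
    val words = text.split("""[\?\.\!",\- \s]+""").filter(_.nonEmpty)
    words
      .sliding(n)
      .filter(_.length == n) // skip the short window produced when there are fewer than n words
      .toSeq
      .groupBy(_.toSeq)      // arrays compare by reference, so group on Seq instead
      .toSeq
      .map { case (gram, occurrences) => occurrences.size -> gram.mkString(" ") }
      .sortBy { case (count, gram) => (-count, gram) }
  }

  def main(args: Array[String]): Unit =
    ngramFrequencies("the cat sat. the cat sat, the cat ran!").foreach(println)
}
```

For the sample sentence this should print (2,cat sat the), (2,sat the cat), (2,the cat sat) and (1,the cat ran), one pair per line.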