@kaja47
Created August 9, 2012 00:29
Statistical text generator
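// requires Scala 2.10 or later
// replace ??? with the source text before running
val text: String = ???

// we are interested in bi-grams
val n = 2

// map from each n-gram to the sequence of words that follow it in the text
val map = collection.mutable.Map[Seq[String], Seq[String]]() withDefaultValue Seq()

// lowercase, split on runs of non-word characters ((?U) makes \W Unicode-aware),
// and drop the empty token produced when the text starts with a separator
val words = text.toLowerCase.split("(?U)\\W+").filter(_.nonEmpty).toVector

// each window of n+1 words is split into the leading n-gram and the word after it
for (ngram :+ next <- words sliding n+1)
  map(ngram) = map(ngram) :+ next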
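
// generate text, starting from a seed n-gram (it must occur in the text,
// otherwise nothing is generated)
val firstNgram = Seq("anomalocaris", "detrimentum", "něco") take n
var lastNgram = firstNgram

val generatedWords = for (i <- 1 to 5000) yield {
  map get lastNgram match {
    case Some(nextWords) =>
      // pick one of the recorded successors uniformly at random
      val idx = util.Random.nextInt(nextWords.size)
      val next = nextWords(idx)
      // shift the window: drop the oldest word, append the new one
      lastNgram = lastNgram.tail :+ next
      Some(next)
    case None =>
      // dead end: this n-gram has no recorded successor
      None
  }
}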

// print the result, starting a new line once a line would exceed 140 characters
var length = 0
for (w <- firstNgram ++ generatedWords.flatten) {
  length += w.length + 1
  if (length > 140) {
    length = w.length + 1
    println()
  }
  print(w+" ")
}
println()
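
The same idea can also be sketched without mutable state, which makes it easier to test on a small input. This is only a sketch under assumptions, not the gist's method: the names `NgramSketch`, `buildModel`, `generate` and `wrap` are hypothetical. Unlike the loop above, `generate` stops at the first dead end instead of running out its remaining iterations, and `wrap` returns the lines rather than printing them.

```scala
import scala.util.Random

// Hypothetical helper object; none of these names appear in the original script.
object NgramSketch {
  type Model = Map[Vector[String], Vector[String]]

  // Map every n-word window to the words observed right after it.
  def buildModel(words: Vector[String], n: Int): Model =
    words.sliding(n + 1)
      .filter(_.length == n + 1) // a too-short input yields one undersized window
      .toVector
      .groupBy(_.take(n))
      .map { case (ngram, windows) => ngram -> windows.map(_.last) }

  // Walk the model from a seed n-gram; stop after maxWords or at a dead end.
  def generate(model: Model, seed: Vector[String], maxWords: Int, rnd: Random): Vector[String] = {
    val out = Vector.newBuilder[String]
    var current = seed
    var steps = 0
    var stuck = false
    while (steps < maxWords && !stuck) {
      model.get(current) match {
        case Some(nextWords) =>
          val next = nextWords(rnd.nextInt(nextWords.size))
          out += next
          current = current.tail :+ next // slide the window by one word
          steps += 1
        case None =>
          stuck = true // no successor recorded for this n-gram
      }
    }
    seed ++ out.result()
  }

  // Greedy word wrap: join words with single spaces, keeping each line within width
  // (a single word longer than width still gets a line of its own).
  def wrap(words: Seq[String], width: Int): Vector[String] =
    words.foldLeft(Vector.empty[String]) { (lines, w) =>
      lines.lastOption match {
        case Some(line) if line.length + 1 + w.length <= width =>
          lines.init :+ (line + " " + w)
        case _ =>
          lines :+ w
      }
    }
}
```

For example, `NgramSketch.buildModel(Vector("one", "two", "three", "four"), 2)` records `three` after `one two` and `four` after `two three`, so generating from the seed `Vector("one", "two")` gives `one two three four` and then stops, because `three four` has no recorded successor.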