@cigrainger
Created May 19, 2015 15:30
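
import scala.collection.mutable
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

// Assumes `tokenized: RDD[Seq[String]]` (one token list per document) and
// `vocab: Map[String, Int]` (term -> column index) are defined earlier.
// Builds (docId, term-count vector) pairs, the input type of MLlib's LDA.run.
val documents: RDD[(Long, Vector)] =
  tokenized.zipWithIndex.map { case (tokens, id) =>
    // Count in-vocabulary terms, keyed by their vocabulary index.
    val counts = new mutable.HashMap[Int, Double]()
    tokens.foreach { term =>
      if (vocab.contains(term)) {
        val idx = vocab(term)
        counts(idx) = counts.getOrElse(idx, 0.0) + 1.0
      }
    }
    // Out-of-vocabulary terms are skipped; vector length = vocabulary size.
    (id, Vectors.sparse(vocab.size, counts.toSeq))
  }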
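To check the counting step without a Spark cluster, the same logic can be sketched on plain Scala collections. This is an illustrative sketch, not part of the original gist: `TermCounts`, `countTerms`, and the sample vocabulary are hypothetical names. It returns the vector size and the (index, count) pairs that would be passed to `Vectors.sparse`, uses `vocab.get` (one lookup) instead of `contains` followed by `apply`, and sorts the pairs only so the output is deterministic. Local `zipWithIndex` yields an `Int`, so it is widened to `Long` to match the RDD version.

```scala
import scala.collection.mutable

// Hypothetical Spark-free version of the term-counting step above.
object TermCounts {
  // For each document: (docId, vocabSize, (index, count) pairs sorted by index).
  // Terms not present in `vocab` are ignored, as in the RDD code.
  def countTerms(
      docs: Seq[Seq[String]],
      vocab: Map[String, Int]): Seq[(Long, Int, Seq[(Int, Double)])] =
    docs.zipWithIndex.map { case (tokens, i) =>
      val counts = mutable.HashMap.empty[Int, Double]
      tokens.foreach { term =>
        // Single map lookup; skips out-of-vocabulary terms.
        vocab.get(term).foreach { idx =>
          counts(idx) = counts.getOrElse(idx, 0.0) + 1.0
        }
      }
      (i.toLong, vocab.size, counts.toSeq.sortBy(_._1))
    }

  def main(args: Array[String]): Unit = {
    // Sample data for illustration only.
    val vocab = Map("spark" -> 0, "lda" -> 1, "topic" -> 2)
    val docs = Seq(Seq("spark", "lda", "spark", "the"), Seq("topic"))
    countTerms(docs, vocab).foreach(println)
  }
}
```

With the sample input, the first document counts "spark" twice and "lda" once, and "the" is dropped because it is not in the vocabulary.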