Skip to content

Instantly share code, notes, and snippets.

View tnb115's full-sized avatar

Thomas Bredillet tnb115

View GitHub Profile
@mathiasbynens
mathiasbynens / README.md
Last active March 4, 2025 11:35
Superfish certificate
@waleking
waleking / SparkGibbsLDA.scala
Last active January 31, 2020 11:15
We implement gibbs sampling for LDA by Spark. This version performs much better than alpha version, and now can handle 3196204 words, 100 topics, 1000 sample iterations on server in 161.7 minutes. To solve the long time consuming in collect() process in alpha version, we utilize the cache() method as line 261 and line 262. We also solve a pile o…
package topic
import spark.broadcast._
import spark.SparkContext
import spark.SparkContext._
import spark.RDD
import spark.storage.StorageLevel
import scala.util.Random
import scala.math.{ sqrt, log, pow, abs, exp, min, max }
import scala.collection.mutable.HashMap