Toshiaki Toyama manboubird

// transcribed from an Apache Spark 1.0 spark-shell session
// using data from http://chriswhong.com/open-data/foil_nyc_taxi/
// and the QTree algorithm for approximate quantiles over large datasets
// each of the distanceRange and minutesRange calculations below takes about 15 minutes on my four-core, SSD-based MacBook Pro
import com.twitter.algebird._
import com.twitter.algebird.Operators._
implicit val qtSemigroupD = new QTreeSemigroup[Double](6)
val in = sc.textFile("trip_data") // a directory containing all the trip_data*.csv files downloaded from the above link
#!/usr/bin/ruby
class IPGenerator
  public
  def initialize(session_count, session_length)
    @session_count = session_count
    @session_length = session_length
    @sessions = {}
  end