Skip to content

Instantly share code, notes, and snippets.

View MLnick's full-sized avatar

Nick Pentreath MLnick

  • Cape Town, South Africa
  • X @MLnick
View GitHub Profile
@MLnick
MLnick / StreamingCMS.scala
Created February 13, 2013 15:00
Spark Streaming with CountMinSketch from Twitter Algebird
import spark.streaming.{Seconds, StreamingContext}
import spark.storage.StorageLevel
import spark.streaming.examples.twitter.TwitterInputDStream
import com.twitter.algebird._
import spark.streaming.StreamingContext._
import spark.SparkContext._
/**
* Example of using CountMinSketch monoid from Twitter's Algebird together with Spark Streaming's
* TwitterInputDStream
@MLnick
MLnick / StreamingHLL.scala
Last active January 24, 2024 19:39
Spark Streaming meets Algebird's HyperLogLog Monoid
import spark.streaming.StreamingContext._
import spark.streaming.{Seconds, StreamingContext}
import spark.SparkContext._
import spark.storage.StorageLevel
import spark.streaming.examples.twitter.TwitterInputDStream
import com.twitter.algebird.HyperLogLog._
import com.twitter.algebird._
/**
* Example of using HyperLogLog monoid from Twitter's Algebird together with Spark Streaming's
@MLnick
MLnick / sklearn-lr-spark.py
Created February 4, 2013 14:29
SGD in Spark using Scikit-learn
import sys
from pyspark.context import SparkContext
from numpy import array, random as np_random
from sklearn import linear_model as lm
from sklearn.base import copy
N = 10000 # Number of data points
D = 10 # Numer of dimensions
ITERATIONS = 5